WO2009097714A1 - Depth searching method and depth estimating method for multi-viewing angle video image - Google Patents


Info

Publication number
WO2009097714A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
search
pixel
view
value
Prior art date
Application number
PCT/CN2008/072141
Other languages
French (fr)
Chinese (zh)
Inventor
Xiaoyun Zhang
George L Yang
Original Assignee
Panovasic Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panovasic Technology Co., Ltd.
Publication of WO2009097714A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images

Definitions

  • TECHNICAL FIELD The present invention relates to multi-view video image processing techniques.
  • BACKGROUND OF THE INVENTION In recent years, researchers have recognized that advanced three-dimensional television and free viewpoint video systems (FVV, Free Viewpoint Video System) should make use of computer vision, video processing, and depth-image-based scene synthesis. These techniques separate the acquisition and display settings of the video, so that the viewing angle and the orientation of the acquiring cameras do not restrict each other, providing a high degree of flexibility, interactivity, and operability.
  • European stereoscopic television projects adopted a video-plus-depth data format, in which each image pixel carries a depth value (C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, CA, U.S.A., Jan. 2004, pp. 93-104). With depth-image-based rendering (DIBR), the decoder at the receiving end generates stereo image pairs according to the display settings and viewing angle, so that the viewing angle and the camera orientation are mutually unrestricted.
  • An April 2007 JVT proposal (A. Smolic and K. Mueller, et al., "Multi-View Video plus Depth (MVD) Format for Advanced 3D Video Systems", ISO/IEC JTC1/SC29/WG11, Doc. JVT-W100, San Jose, USA, April 2007) extended video plus depth to multi-view video as the MVD (multi-view video plus depth) coding format. Because MVD meets an essential requirement of advanced 3D or free-viewpoint video applications, namely generating continuous views of arbitrary angle within a range at the decoder rather than a limited number of discrete views, the video-plus-depth MVD scheme has been adopted by JVT and identified as the direction of future development. How to obtain the depth information of a scene from two or more views of different viewpoints has therefore become one of the important problems in multi-view video processing.
  • The current depth search approach performs the search with a fixed step size (a uniform depth grid) within a fixed search range. With a fixed step, if the step corresponds to an offset of 1 pixel at a smaller depth value, then at a larger depth value the pixel offset corresponding to that step is less than 1 pixel. If projection at a given depth value lands on a non-integer pixel and the nearest-neighbor pixel is taken as the projection point, the same pixel is then searched at several different depth values, i.e. a repeated search occurs.
  • Conversely, if the given step corresponds to a 1-pixel offset at a larger depth value, then at a smaller depth value the pixel offset corresponding to that step is greater than 1 pixel, i.e. two adjacent depth values map to two non-adjacent pixels, so some pixels are skipped and the search is incomplete. One therefore expects to search N pixels over the range [z_min, z_max], but because of repeated and missed searches the number of effective search points is less than N.
  • To guarantee that the search range covers all possible values of the true scene depth, the range is usually set large, and to guarantee a certain search accuracy the step is set small; this greatly increases the number of searches and the corresponding computation, and because of missed and repeated searches the search results are still poor.
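To make the repeated/missed search behaviour concrete, the following is an illustrative sketch under an assumed parallel-camera model (disparity d = f·B/z); the constant f·B is an assumption chosen so that a 10 mm step corresponds to about a 1-pixel offset near a mid-range depth of 3170 mm, matching the working example given later in this document.

```python
# Assumed parallel-camera model, not from the patent text:
# disparity d = f*B / z (pixels); f*B chosen so that a 10 mm depth step
# moves the projection by ~1 pixel near z = 3170 mm.
fB = 3170.0 ** 2 / 10.0                               # pixels * mm

z_values = list(range(2000, 4501, 10))                # fixed 10 mm depth grid
pixels = [round(fB / z) for z in z_values]            # nearest-neighbor pixel hit

print(len(z_values), "depth values searched")         # 251 candidate depths
print(len(set(pixels)), "distinct pixels reached")    # fewer: repeats at large z
                                                      # and skipped pixels at small z
```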
  • To date there has been much research on depth estimation, but most algorithms first estimate disparity on rectified, parallel stereo image pairs and then compute depth information from the relation between disparity and depth. For example, in a parallel camera system only horizontal disparity exists between two images; disparity is first estimated by feature-based or block-matching methods, and depth is then computed from the inverse relation between depth and disparity. For non-parallel camera systems, a series of steps (image rectification, disparity matching, depth computation, and de-rectification) is required to obtain the depth map of the original view, so this class of depth estimation is essentially disparity estimation and its performance is determined mainly by the disparity estimation algorithm. Disparity estimation, or stereo matching, is a classic problem in computer vision; despite a large body of work and results, the matching ambiguity and uncertainty caused by lack of texture or by occlusion keep disparity matching a difficult and active research topic in computer vision.
  • In 2006, a JVT proposal (S. Yea, J. Oh, S. Ince, E. Martinian and A. Vetro, "Report on Core Experiment CE3 of Multiview Coding", ISO/IEC JTC1/SC29/WG11, Doc. JVT-T123, Klagenfurt, Austria, July 2006) proposed using the internal and external camera parameters together with depth-based view synthesis: within a specified depth search range, with a given search step, the depth that minimizes the error between the synthesized view and the actual view is searched for and taken as the estimate.
  • M. Okutomi et al. proposed the multiple-baseline stereo matching method. It uses the inverse relation between depth and disparity to convert disparity estimation into a depth-solving problem and eliminates the ambiguity of disparity matching (M. Okutomi and T. Kanade, "A multiple-baseline stereo", IEEE Trans. on Pattern Analysis and Machine Intelligence 15(4): 353-363, 1993).
  • N. Kim et al. proposed carrying out the depth search, matching, and view synthesis operations directly in range/depth space (N. Kim, M. Trivedi and H. Ishiguro, "Generalized multiple baseline stereo and direct view synthesis using range-space search, match, and render", International Journal of Computer Vision 47(1/2/3): 131-148, 2002): the depth search is performed directly in depth space with no disparity matching, image rectification is handled within the search process, and the depth value is continuous, so its precision is not limited by the image pixel resolution as a disparity vector is. In practice, however, a depth search range and step size must be specified and an optimum found under some cost function, and whether these values are suitable is critical to the estimation performance.
  • In disparity matching, the disparity search range is usually determined intuitively from the nature of the images. In depth search, especially with non-parallel camera systems, the relation between depth variation and image pixel offset is not obvious, so the search range is difficult to determine reasonably. How to determine a suitable depth search interval and step size for a given set of multi-view images is therefore the key to estimating depth information effectively.
  • JVT-W059 (S. Yea and A. Vetro, "Report of CE6 on View Synthesis Prediction", ISO/IEC JTC1/SC29/WG11, Doc. JVT-W059, San Jose, USA, April 2007) proposed using matched feature-point pairs of two views and selecting, from several candidate combinations of minimum depth, maximum depth, and step size, the one minimizing the error between the matched feature points as the depth search range and step. The method requires feature extraction and matching with the KLT (Kanade-Lucas-Tomasi) algorithm (C. Tomasi and T. Kanade, "Detection and tracking of point features", Technical Report CMU-CS-91-132, Carnegie Mellon University, 1991), so its performance depends on the correctness of the feature matching.
  • M. Okutomi, N. Kim, et al. take as search step the depth variation corresponding to a 1-pixel offset in the reference view with the longest baseline, which guarantees that the pixel offset in all other reference views is less than 1 pixel.
  • Both of the above methods use a fixed search step; the step is not adapted to changes in the image content or the scene.
  • SUMMARY OF THE INVENTION The technical problem to be solved by the present invention is to provide an adaptive method for determining the search step size that avoids repeated and missed searches of pixel points.
  • In addition, the present invention proposes a depth estimation method based on the adaptive search step size.
  • The technical solution adopted by the present invention is a multi-view video image depth search method in which the search step of each step within the depth search range is adjusted dynamically according to the current depth value: the smaller the current depth value, the smaller the step; the larger the current depth value, the larger the step, so that every step corresponds to the same pixel search precision.
  • Using the relation between the depth variation value and the pixel offset vector, the determination of the depth search range and step size is converted into the determination of a pixel search range and a pixel search precision. The pixel search precision equals the length of the pixel offset vector of each search step; it may be sub-pixel (e.g. one-half or one-quarter pixel) or integer-pixel (e.g. one or two pixels). The search step equals the depth variation corresponding to that pixel offset vector.
  • The search step of the target view is determined by the current depth value in the target view, the pixel offset vector in the reference view, and the internal and external camera parameters of the two views; each step in the target view corresponds to a pixel offset vector of the same length in the reference view. The target view is the image whose depth currently needs to be estimated; the reference views are the other images in the multi-view video system. The reference view can be selected automatically during the depth search or specified by the user.
  • A depth estimation method for multi-view video images: using depth-based view synthesis and block-matching-based depth search, the depth search range and the search step of the target view are determined by the pixel search range and the pixel search precision of the reference view.
  • Within the search range, the step of each search step is adjusted dynamically according to the current depth value (the smaller the current depth value, the smaller the step; the larger the current depth value, the larger the step), so that every step corresponds to the same pixel search precision. The depth search step is determined by the current depth value in the target view, the pixel offset vector in the reference view, and the camera parameters of the two views; each step in the target view corresponds to a pixel offset vector of the same length in the reference view.
  • Depth-based view synthesis means: given a pixel of the target view and its depth value, the pixel is back-projected to a point in three-dimensional scene space according to the internal and external camera parameters of the target and reference viewpoints, and that space point is then re-projected onto the image plane of the reference viewpoint, yielding a synthesized view of the target view at the reference viewpoint.
  • The depth-based view synthesis and block-matching-based depth search proceed as follows: the view is synthesized at the current depth value, the error between the pixel block of the synthesized view and the corresponding pixel block of the reference view is computed, and the depth value with the minimum error is taken as the depth estimate of the target view. The method comprises the following steps:
  • Step 1: estimate the initial depth search value z_0 in the target view and set k = 0.
  • Step 2: determine the pixel search range and pixel search precision in the reference view to which the depth search corresponds, and obtain the pixel offset vector ΔP_r in the reference view from the pixel search precision.
  • Step 3: from the current depth value z_k and the pixel offset vector ΔP_r, obtain the corresponding depth variation Δz_k; this Δz_k is the next search step.
  • Step 4: synthesize the view at the current depth value and compute the error e_k between the pixel block of the synthesized view and the pixel block of the reference view; update z_{k+1} = z_k + Δz_k and k = k + 1.
  • Step 5: determine whether the given pixel search range has been exceeded; if so, go to Step 6, otherwise go to Step 3.
  • Step 6: take as the estimate the depth value corresponding to the minimum of the errors e_k (k = 0, ..., N-1, where N is the total number of search steps).
  • The search step is obtained from a closed-form relation (reproduced in the original as an equation image) among the following quantities: P is the pixel of the target view whose depth is being estimated; z is the current depth value of P; Δz is the depth variation of P, i.e. the search step; ΔP_r is the pixel offset vector in reference view r corresponding to the depth change Δz, with ‖ΔP_r‖² = ΔP_r^T·ΔP_r; B_r = A_r R_r⁻¹ R A⁻¹ and C_r = A_r R_r⁻¹ are 3×3 matrices and Δt_r = t − t_r is a three-dimensional vector, where R is the three-dimensional rotation matrix of the target-view camera coordinate system relative to the world coordinate system, t is its translation vector, A is the internal parameter matrix of the target-view camera, R_r, t_r, A_r are the corresponding quantities for the reference view, and b_3 and c_3 are the third row vectors of B_r and C_r respectively.
  • The pixel offset vector in the reference view satisfies the epipolar constraint of the target and reference views, ΔP_r^T ((C_r Δt_r) × (B_r P)) = 0, where P is the pixel in the target view. Two offset vectors of opposite direction satisfy this constraint, corresponding to the directions of increasing and decreasing depth; the depth variation of the increasing direction is larger than that of the decreasing direction. For a parallel camera system, the depth variation is proportional to the square of the current depth value.
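The closed-form step-size expression appears in the source only as an embedded equation image. The following is a reconstruction of the derivation's structure from equations (1) and (2) and the notation above; it is a sketch, and the published arrangement of the formula may differ.

```latex
% Combining (1) and (2) for the target view (R, t, A) and a reference view
% (R_r, t_r, A_r) gives the projection of target pixel P at depth z:
%   z_r P_r = B_r P z + C_r \Delta t_r ,
% with B_r = A_r R_r^{-1} R A^{-1}, \; C_r = A_r R_r^{-1}, \; \Delta t_r = t - t_r .
% Since z_r = b_3 P z + c_3 \Delta t_r (third components), differentiating the
% inhomogeneous pixel coordinates of P_r with respect to z yields
\Delta P_r \approx \frac{\Delta z}{z_r^{2}}
  \bigl[\,(B_r P)(c_3 \Delta t_r) - (C_r \Delta t_r)(b_3 P)\,\bigr]_{1,2}
\quad\Longrightarrow\quad
\Delta z \approx \frac{z_r^{2}\,\lVert \Delta P_r \rVert}
  {\bigl\lVert\,(B_r P)(c_3 \Delta t_r) - (C_r \Delta t_r)(b_3 P)\,\bigr\rVert_{1,2}}
```

This makes both stated properties explicit: for a fixed pixel offset the depth variation grows with the squared depth, and the direction of ΔP_r lies along the epipolar line, consistent with the constraint above.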
  • The beneficial effects of the invention are that the depth search with the adaptive step produces no missed or repeated pixel searches, the absolute difference between the synthesized image block and the reference image block in depth estimation is small, erroneous estimates are few, and the amount of computation (the number of depth searches) is low.
  • BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a schematic diagram of the coordinate systems in a multi-view video system;
  • FIG. 2 is a schematic diagram of depth-based view synthesis;
  • FIG. 3 (a) and (b) are views at the initial moment of the video sequence of the 7th camera in the Uli test sequence;
  • FIG. 3 (c) is a close-up of FIG. 3 (a); the 16 marked points indicate the image region from pixel [527, 430] to [590, 493];
  • FIG. 4 shows the relation between the depth variation value and the square of the depth value;
  • FIG. 5 is a schematic diagram of the depth variation value and the pixel offset vector of the present invention;
  • FIG. 6 illustrates missed pixel searches when the depth value is small;
  • FIG. 7 illustrates repeated pixel searches when the depth value is large;
  • FIG. 8 illustrates the adaptive adjustment of the depth search step size according to the present invention;
  • FIG. 9 shows the distribution of the pixels searched using the adaptive variable-length search step of the present invention;
  • FIG. 10 compares the depth search performance of fixed search steps and of the adaptive step of the present invention.
  • DETAILED DESCRIPTION The present invention proposes an adaptive method for determining the depth search step. Using the internal and external camera parameters and the perspective projection relation, it first derives the relation among a pixel's depth value, its depth variation, and the pixel offset that the depth change produces at the projected point in the synthesized view. From this relation, the determination of the depth search range is converted into the determination of a pixel search range; a pixel offset has an intuitive meaning in the image and is easy to set reasonably. Because the larger the depth value, the smaller the pixel offset caused by the same depth change, the search step is adjusted dynamically so that every step corresponds to the same pixel search precision, avoiding repeated and missed pixel searches and improving search efficiency and performance.
  • The present invention also proposes a simple and effective initial depth estimation method: it solves for the convergence point of the camera optical axes in a convergent camera system and treats that point as a condensed representative point of the scene, obtaining a rough estimate of the scene depth.
  • In multi-view video, three types of coordinate system are usually needed to describe the scene and its image position information: the world coordinate system in which the scene lies, the camera coordinate systems, and the pixel coordinate systems, as shown in FIG. 1.
  • A camera coordinate system takes the camera center as origin and the optical axis as z-axis, with the xy plane parallel to the image plane; a pixel coordinate system takes the upper-left corner of the image as origin, with horizontal and vertical coordinates u, v.
  • Let the position of the camera coordinate system o_i-x_i y_i z_i of camera C_i (i = 1, ..., m, where m is the number of cameras) relative to the world coordinate system o-xyz be given by a three-dimensional rotation matrix R_i and a translation vector t_i. If a scene point has coordinates p = [x, y, z] in the world coordinate system and p_i = [x_i, y_i, z_i] in the camera coordinate system, then by spatial geometry and coordinate transformation p = R_i p_i + t_i (1). By the perspective projection principle of computer vision, the camera coordinates p_i and the homogeneous pixel coordinates P_i = [u_i, v_i, 1] on the image plane satisfy z_i P_i = A_i p_i (2), where A_i is the internal parameter matrix of camera C_i, mainly comprising the focal length, principal point, and distortion coefficients.
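A minimal code sketch of relations (1) and (2); the calibration data R_i, t_i, A_i are assumed given, A_i is assumed to have the usual intrinsic form with third row [0, 0, 1], and lens distortion is ignored.

```python
import numpy as np

def cam_to_world(p_i, R_i, t_i):
    """Eq. (1): world coordinates from camera-i coordinates."""
    return R_i @ p_i + t_i

def world_to_cam(p, R_i, t_i):
    """Inverse of eq. (1); R_i is a rotation, so R_i^-1 = R_i^T."""
    return R_i.T @ (p - t_i)

def cam_to_pixel(p_i, A_i):
    """Eq. (2): homogeneous pixel coordinates [u, v, 1] and depth z_i."""
    q = A_i @ p_i
    z_i = q[2]            # third row of A_i assumed [0, 0, 1]
    return q / z_i, z_i

def pixel_to_cam(P_i, z_i, A_i):
    """Back-projection: camera-i coordinates of pixel [u, v, 1] at depth z_i."""
    return z_i * np.linalg.inv(A_i) @ P_i
```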
  • The present invention performs block-based depth search in depth space: using the camera parameters and depth-based view synthesis, it searches within the depth search range, advancing by the search step, for the depth value that minimizes the error between a pixel block of the synthesized view and the corresponding block of the actual reference view, and takes that depth value as the depth estimate of the pixel of the target view.
  • The target view and target viewpoint are the image and viewpoint whose depth currently needs to be estimated; the reference views and reference viewpoints are the other images and viewpoints in the multi-view video system. The reference view and viewpoint can be selected automatically during the depth search or specified by the user.
  • When the depth value of a pixel in a view is given, the pixel can be back-projected into scene space according to the internal and external camera parameters to obtain a space point, and that space point can be re-projected onto the image plane of any required viewpoint, giving a synthesized view at that viewpoint; this is the depth-based view synthesis technique, shown in FIG. 2.
  • Let a pixel P_1 of view 1 have depth value z_1 in the coordinate system of camera C_1, and let its corresponding pixel in view 2 be P_2 with depth value z_2 in the coordinate system of camera C_2. From equations (1) and (2) the position of P_2 can be derived, and the luminance-chrominance value of the synthesized view 2 at pixel P_2 for depth z is the value of view 1 at P_1: Synthesized_I_2(P_2) = I_1(P_1), where I_1 is view 1, I_2 is view 2, and Synthesized_I_2 is the synthesized view 2 of view 1 at reference camera viewpoint 2.
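Building on the helper functions sketched above (and reusing their assumptions), a minimal sketch of depth-based view synthesis for one pixel and for a pixel block; nearest-neighbor rounding of the projected position is an assumption made for simplicity.

```python
import numpy as np  # helpers cam_to_world, world_to_cam, cam_to_pixel,
                    # pixel_to_cam are assumed defined as sketched earlier

def synthesize_pixel(P1, z1, R1, t1, A1, R2, t2, A2):
    """Back-project pixel P1 = [u, v, 1] of view 1 at depth z1 to a scene
    point, then re-project it into view 2 (FIG. 2)."""
    p1 = pixel_to_cam(np.asarray(P1, float), z1, A1)   # camera-1 coordinates
    p = cam_to_world(p1, R1, t1)                       # world coordinates
    p2 = world_to_cam(p, R2, t2)                       # camera-2 coordinates
    return cam_to_pixel(p2, A2)                        # (P2, z2)

def synthesize_block(img1, block_pixels, z, cams):
    """Synthesized_I2(P2) = I1(P1) for each pixel of a block at depth z."""
    R1, t1, A1, R2, t2, A2 = cams
    out = {}
    for (u, v) in block_pixels:
        P2, _ = synthesize_pixel([u, v, 1.0], z, R1, t1, A1, R2, t2, A2)
        out[(int(round(P2[0])), int(round(P2[1])))] = img1[v, u]
    return out
```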
  • The description above takes a system of two cameras as an example; the same principle extends to a camera system of m cameras. Assuming the pixels within a local window W centered on pixel P share the same scene depth, the absolute difference within W between the synthesized view 2 of view 1 and the reference view 2 actually captured by the camera at viewpoint 2 can be formed as a block error.
  • Under the true scene depth, the synthesized view 2 in theory has the same luminance-chrominance values as the reference view 2, so solving for the depth of view 1 at pixel P can be transformed into the following problem: within the given depth range [z_min, z_max], find the depth z that minimizes the absolute difference between the synthesized view and the reference view; that z is the final depth estimate.
  • This way of searching directly in depth space requires no disparity matching, the image rectification is handled directly within the depth search, and the depth value is continuous, so its precision is not limited by the image pixel resolution as a disparity vector is. From equation (7) it follows that, with the camera parameters known, the pixels of the synthesized view 2 are a function of the pixels of view 1 and their depth values.
  • A change Δz of the depth value of pixel P_1 in view 1 causes a pixel offset of its projection in the synthesized view 2. From equation (12), the relation between the depth change of the pixel in view 1 and the corresponding pixel offset vector ΔP_2 in the synthesized view 2 can be derived; equation (17) is a homogeneous linear equation in the two components Δu and Δv of the pixel offset vector ΔP_2.
  • The pixel offset is proportional to the depth change and inversely proportional to the square of the depth value; equivalently, for a fixed pixel offset the depth change is proportional to the square of the depth value: the larger the depth value, the larger the corresponding depth change; the smaller the depth value, the smaller the corresponding depth change.
  • For a scene point in the Uli view (the button on the right side of the shirt collar, whose true depth value is about 3172 mm), the depth variation values corresponding to different depth values were computed according to equation (14); their relation is shown in FIG. 4, with the square of the depth value on the abscissa and the depth variation value on the ordinate.
  • FIG. 4 shows that, for a given pixel offset in the synthesized view, the depth variation is approximately linear in the square of the depth value, which means that the same depth change of a pixel of view 1 causes different pixel offsets in the synthesized view at different depth values.
  • Since equation (17) is a homogeneous linear equation in the pixel offset vector ΔP, two offset vectors of opposite direction, ΔP+ and ΔP-, exist; substituting them into equation (14) yields the corresponding depth variations. Because, for a given offset, the depth variation is proportional to the square of the depth value, the depth variations corresponding to the two equal-length, oppositely directed offset vectors differ: the depth decrease |Δz-| is smaller than the depth increase |Δz+|, as shown in FIG. 5.
  • If the search step corresponds to a 1-pixel offset at a smaller depth value, then at a larger depth value the pixel offset corresponding to that step is less than 1 pixel; taking the nearest-neighbor pixel as the projection point when a given depth projects to a non-integer pixel, the same pixel is searched at several different depth values, i.e. a repeated search occurs.
  • When the depth value is small, pixels are skipped instead: for example, the pixel coordinate u searched at depth 2090 is 661, while that searched at depth 2080 is 663, so a pixel is skipped in between. When the depth value is large, two different depth values, 4450 and 4460, find the same pixel with u-coordinate 437, i.e. that pixel is searched repeatedly. Since a search step of 10 mm corresponds to a search precision of 1 pixel only in the local range around the true depth value 3170, one would expect to search 250 different pixels in the range [2000, 4500]; but because of missed and repeated pixel searches, computation shows that only 200 pixels are actually searched.
  • To make the depth search correspond to the same pixel search precision in the reference view throughout, i.e. so that the search step always corresponds to a fixed pixel offset in the reference view, the search step must be adjusted dynamically according to the relation between the depth variation and the depth value, and the corresponding search range determined accordingly.
  • Suppose the initial search depth of pixel P_1 in view 1 is z_0; the depth variation Δz in view 1 corresponding to a pixel offset ΔP in reference view 2 is conveniently obtained from equation (14). At the initial depth value z_0, the pixel offset between the pixel in reference view 2 corresponding to P_1 and the true corresponding pixel found at depth z is usually limited to a certain range N. Given the pixel search range N, the problem is how to adaptively determine the search step according to the depth value so that the step always corresponds to a fixed pixel offset.
  • Given the pixel P_1 and the camera parameters, the two offset vectors ΔP± of magnitude ‖ΔP‖ are easily solved from the epipolar constraint equation (16) of the pixel offset vector; the corresponding depth variations are then computed from equation (14) and used as the search steps of the next step in the directions of decreasing and increasing depth respectively, as shown in FIG. 8.
  • Starting from z_0^± = z_0, the search depth and step of the n-th step are obtained recursively as z_{n+1}^± = z_n^± + Δz_n^±, where Δz_n^± is the depth variation computed at z_n^± for the offset vector ΔP±. The number of search steps n is determined by the search range N and the search precision, i.e. n satisfies n·‖ΔP‖ ≤ N. Therefore, once the search range and the initial depth value are determined, variable-length search steps that adapt to the depth value are obtained by the above method, the same pixel search precision is maintained throughout the depth search, and the defect of repeated or missed pixel searches under a fixed step is overcome. Since the depth search range is accumulated from the search steps, it too adapts as the depth value changes.
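A sketch of the adaptive search loop just described. The function standing in for equation (14) is here a first-order numerical approximation (`numeric_depth_step`), an assumption rather than the patent's closed form; `project_fn(z)` is a hypothetical helper returning the reference-view position [u, v] of the target pixel at depth z (e.g. built from the synthesis sketch above).

```python
import numpy as np

def numeric_depth_step(z, dP, project_fn, eps=1e-3):
    """Signed depth change Δz moving the reference-view projection by ~dP
    (first-order stand-in for eq. (14))."""
    g = (project_fn(z + eps) - project_fn(z)) / eps    # dP_r/dz at depth z
    return float(dP @ g) / float(g @ g)                # least-squares Δz

def adaptive_depth_candidates(z0, dP_unit, depth_step_fn, N):
    """Candidate depths z_n^±, each step worth one pixel-offset unit:
    z_{n+1}^± = z_n^± + Δz_n^±, n = 0..N-1 (N = pixel search range)."""
    z_plus = z_minus = z0
    depths = [z0]
    for _ in range(N):
        z_plus += depth_step_fn(z_plus, dP_unit)       # increasing-depth branch
        z_minus += depth_step_fn(z_minus, -dP_unit)    # decreasing-depth branch
        depths += [z_plus, z_minus]
    return sorted(depths)

def estimate_depth(z0, dP_unit, depth_step_fn, block_error_fn, N):
    """Steps 1-6: evaluate the synthesis error e_k at every candidate depth
    and return the minimizing depth."""
    return min(adaptive_depth_candidates(z0, dP_unit, depth_step_fn, N),
               key=block_error_fn)

# Usage sketch:
#   step_fn = lambda z, dP: numeric_depth_step(z, dP, project_fn)
#   z_hat = estimate_depth(z0=2817, dP_unit=np.array([1.0, 0.0]),
#                          depth_step_fn=step_fn,
#                          block_error_fn=my_block_sad, N=32)
```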
  • At larger depth values, the depth search range corresponding to the same pixel offset ‖ΔP‖ becomes correspondingly larger; at smaller depth values, it becomes correspondingly smaller. The corresponding depth search step can therefore be determined by fixing the pixel search precision, and the determination of the depth search range is likewise converted into the determination of the corresponding pixel offset. Determining the pixel offset and search precision is similar to determining the search range and precision in disparity estimation: it is intuitive and easy, and can be adjusted dynamically according to the image content or the application requirements.
  • Besides the depth search range and step size, the depth estimation process needs an initial depth value z_0, whose quality affects the search performance and results. When z_0 deviates little from the true depth, a smaller pixel offset (i.e. a smaller search range) can be used, reducing the amount of search; when the deviation is large, a relatively large pixel offset must be used to ensure that the true depth value is reached, at the cost of a larger amount of computation.
  • A poor initial depth value must be compensated by a wide search range and a fine, high-precision step, while a good initial value allows a small search range and a suitable step, improving both the efficiency and the performance of the depth search. The estimation and determination of the initial depth value is therefore also very important.
  • The determination of the initial depth value for the images of a video sequence falls into two situations: the image at the initial moment, and subsequent images. For the initial-moment image there are again two cases: the first pixel and the other pixels. For the first pixel, no depth search has yet been performed for any pixel, so no scene depth information is known; for the other pixels, the initial depth can be determined from the depth estimates of neighboring pixels within the image.
  • Since the depth values of video images of the same viewpoint are strongly correlated (the depth of the stationary background remains unchanged and only a small number of motion regions change in depth), the depth value at the same pixel position of the previous-moment image can be taken as the initial value for subsequent images. In determining the initial depth value, the key is therefore to obtain the scene depth information of the initial-moment image and to provide a good initial depth value for the first pixel.
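A hypothetical helper illustrating the initial-value policy just described; the names, the Manhattan-distance neighbor choice, and the data structures are illustrative assumptions, not from the patent.

```python
def initial_depth(pixel, frame_idx, prev_depth_map, cur_estimates, z_scene):
    """Pick z_0: previous frame at the same position for later frames,
    a nearby already-estimated pixel within the initial frame, or the
    coarse scene depth for the very first pixel."""
    if frame_idx > 0:                            # subsequent image
        return prev_depth_map[pixel]             # same position, previous moment
    if cur_estimates:                            # initial image, not first pixel
        u, v = pixel
        q = min(cur_estimates, key=lambda p: abs(p[0] - u) + abs(p[1] - v))
        return cur_estimates[q]                  # nearest estimated neighbor
    return z_scene                               # first pixel: coarse scene depth
```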
  • The differences between images of different viewpoints, and the position information of the cameras, usually contain information about the scene depth. An initial estimation of the scene depth based only on camera parameters or image information, without any known depth information, is given below.
  • The main goal of multi-view video is to capture information of the same scene at multiple angles, so the cameras are usually placed along an arc with their optical axes converging at one point: a convergent system. Even if the cameras do not converge strictly at one point, a point closest to the optical axes of all the cameras can always be found and regarded as the convergence point. The convergence point usually lies where the scene is located and can be regarded as a condensed representative point of the scene; by computing the position of the convergence point, an approximate value of the scene depth is obtained and used as the initial value of the depth search.
  • Let the convergence point be M_c = [x_c, y_c, z_c]. Requiring M_c to be the point nearest to the optical axes of all the cameras leads to equation (25), a system of 3(n-1) linear equations in the depths z_1, z_2, ..., z_n along the optical axes (n being the number of cameras), which can be solved in the least-squares sense.
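A sketch of the convergence-point computation, written as the equivalent least-squares problem of finding the point minimizing the summed squared distance to the optical axes; this standard formulation is an assumption and need not match the exact arrangement of system (25). Under equation (1), camera center i is t_i and its optical-axis direction is R_i·[0, 0, 1]^T.

```python
import numpy as np

def convergence_point(centers, directions):
    """Least-squares point nearest to all camera optical axes
    (centers[i]: camera center; directions[i]: axis direction)."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)   # projector orthogonal to the axis
        A += M
        b += M @ c
    return np.linalg.solve(A, b)

def scene_depth(M_c, R_i, t_i):
    """Initial depth for camera i: z-coordinate of the convergence point
    M_c expressed in camera-i coordinates (inverse of eq. (1))."""
    return float((R_i.T @ (M_c - t_i))[2])
```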
  • Alternatively, since disparity has the simple inverse relation (18) with depth, depth information can be obtained by computing the global disparity between two views. The global disparity can be defined as the pixel offset that minimizes the absolute difference between the two views, i.e. it is found by the minimization in equation (26), where R is the number of pixels in the overlapping area of views 1 and 2.
  • Since high estimation accuracy is not required for the global disparity, the search unit of the pixel offset x in equation (26) can be set large, e.g. 8 or 16 pixels, greatly reducing the amount of computation. The initial depth value is then obtained from relation (18), depth being inversely proportional to disparity.
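A sketch of the coarse global-disparity search of equation (26), assuming a purely horizontal shift between the two views and a coarse search unit; converting the disparity to an initial depth via relation (18) uses assumed constants f (focal length in pixels) and B (baseline).

```python
import numpy as np

def global_disparity(img1, img2, max_shift, unit=16):
    """Horizontal pixel shift x (in multiples of `unit`) minimizing the
    mean absolute difference over the overlap of views 1 and 2."""
    h, w = img1.shape[:2]
    best_x, best_err = 0, float("inf")
    for x in range(0, max_shift + 1, unit):
        a = img1[:, x:].astype(np.float64)       # overlap region of view 1
        b = img2[:, :w - x].astype(np.float64)   # overlap region of view 2
        err = np.abs(a - b).mean()               # sum / R (overlap pixel count)
        if err < best_err:
            best_x, best_err = x, err
    return best_x

def initial_depth_from_disparity(d_global, f, B):
    """Relation (18): depth inversely proportional to disparity."""
    return f * B / d_global
```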
  • Although the depth of field of the Uli scene does not vary much, the initial depth estimates in Table 1 differ little from the true depth information of the scene points, indicating that the initial depth estimation is effective and reasonable and provides a good initial value for the depth estimation.
  • Table 1. The Uli view shown in FIG. 3 (c) covers the 64x64 image region from pixel [527, 430] to [590, 493]. Taking one pixel every 15 pixels in the region, 16 pixels in total, depth search was performed with fixed step sizes and with the adaptive step size. Three fixed-step searches were run over the fixed search range [2000, 5000] with steps of 20, 35, and 50. For the adaptive search step, the initial depth was 2817, the pixel offset was set to 32 pixels with a search precision of 1 pixel, and the initial depth value of each subsequent pixel was set to the depth estimate of its neighboring pixel.
  • The search steps corresponding to a search precision of one pixel within the 32-pixel search range around the initial search pixel [527, 430] are listed in Table 2; the pixels searched with these steps are shown in FIG. 9.
  • Table 2 shows that the steps along the direction of decreasing depth are negative, and their absolute value gradually decreases as the pixel offset increases and the depth value decreases; the steps along the direction of increasing depth are positive, and their absolute value gradually increases as the pixel offset and the depth value increase. FIG. 9 shows that when the depth search is performed with the variable-length steps of Table 2, the corresponding pixel search precision remains constant, always 1 pixel.
  • Table 2 lists, for each pixel offset, the step size in the direction of increasing depth and the step size in the direction of decreasing depth (the numerical entries of the table are not reproduced here).
  • Depth estimation was then performed by the block-matching-based method; the depth search results are shown in FIG. 10. Each point in the figure represents the absolute difference between the synthesized block and the actual block at the depth value found by the search; the smaller the value, the more accurate the depth estimate. The absolute differences at the depth values found with a step of 20 mm are smaller than those with 35 mm, and those with 35 mm are smaller than those with 50 mm; the depth values found with the adaptive search step are the best, with the smallest corresponding absolute differences.
  • FIG. 3 (c) shows the 16 pixels used for depth estimation in the image region from pixel [527, 430] to [590, 493] of FIG. 3 (a); they were searched with the adaptive search step and with fixed steps of 20, 35, and 50.
  • Table 3 shows that with the adaptive search step all 16 pixels are found with the correct depth value, while the fixed steps produce erroneous depth estimates. This is because these pixels lie in a region lacking texture: within a wide fixed search range, the minimum point of the absolute difference in the search does not necessarily correspond to the correct pixel. With the adaptive method, the pixel offset can be set small, i.e. the search confined to a relatively small local range, which reduces the probability of matching a wrong pixel and also guarantees a certain smoothness of the depth.
  • Table 3 lists the depth estimation results, the number of depth searches, and the number of erroneous estimates with the adaptive search step and with the fixed search steps; the bordered entries in the table are erroneous data. The results in Table 3 show that the adaptive-step search requires few searches and produces no erroneous estimates, whereas the fixed-step searches require many more searches and still produce erroneous estimates. For example, an adaptive depth search with a 32-pixel offset searches only 64 depth values, while a fixed step of 20 mm must search 150 depth values over the range [2000, 5000].

Abstract

A depth searching method for multi-view video images is disclosed. The search step sizes within a depth search range are dynamically adjusted according to the current depth value, so that each step size corresponds to the same pixel search precision. A depth estimating method for multi-view video images is also disclosed, which dynamically adjusts each search step size within the depth search range while using depth-based view synthesis and block-matching-based depth search.

Description

多视角视频图像深度搜索方法及深度估计方法  Multi-view video image depth search method and depth estimation method
技术领域 本发明涉及多视角视频图像处理技术。 背景技术 近年来, 研究者们逐渐认识到, 未说来先进三维电视和任意视角视频应用系统(FVV, Free Viewpoint Video System)中应该利用计算机视觉、视频处理和基于深度图像的场景 合成等技术, 把视频的获取和显示设置分离开来, 即观看视角与获取视频的照相机方位 相互不受限制, 从而提供高度的灵活性、 交互性书和可操作性。 欧洲的立体电视项目采用 了视频加深度的数据格式 ( "基于深度图像的合成、 压缩和传输的三维电视新方法" , 立体显示和虚拟现实系统 SPIE会议, 2004.; C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, CA, U.S.A., Jan. 2004, pp. 93-104. ) , 即图像的每个像素 对应一个深度值; 利用基于深度图像的视图合成方法 (DIBR: Depth Image Based Rendering): 接收端解码器根据显示设置和观看视角生成立体图像对, 从而使观看视角 与获取视频的照相机方位相互不受限制。 2007年 4月 JVT会议提案 ( "先进三维视频 系统的多视角视频加深度的数据格式"; A. Smolic and K. Mueller, et al., "Multi-View Video plus Depth (MVD) Format for Advanced 3D Video Systems", ISO IEC JTC1/SC29AVG11 , Doc. JVT-W100, San Jose, USA, Apnl 2007. )把视频加深度推广到多视角视频, 提出了视频加深度的多视角 编码数据格式 MVD (Multi-view video plus depth) 。 由于 MVD能够满足先进三维视频 或任意视角视频应用的一个本质需求, 即能够在解码端生成一定范围内的连续的任意视 角的视图, 而不是数量有限的离散的视图, 所以视频加深度的 MVD方案已经被 JVT采 纳, 被确定为今后的发展方向。 所以, 如何从不同视角的两幅或多幅视图获取场景的深度信息成为多视角视频处理 中的重要问题之一。 目前的深度搜索方式为: 在固定搜索范围内采用固定搜索步长 (uniform depth-grid) 进行深度搜索。 使用固定搜索步长时, 若在较小深度值处给定的搜索步长对应于 1个像 素的偏移量, 则在较大深度值处, 该搜索步长对应的像素偏移量将小于 1个像素。 假设 在给定的深度值下投影到非整数像素时, 取最近邻的像素点作为投影点, 则深度搜索时 将在多个不同的深度值处搜索到同一像素点, 即出现了重复搜索。 反过来, 若给定的搜 索步长在较大深度值处对应于 1个像素的偏移量, 则在较小深度值处该搜索步长对应的 像素偏移量将大于 1个像素, 即相邻两个深度值将搜索到两个非相邻的像素点, 从而使 得有些像素点漏检, 产生搜索不全。 所以, 本来期望在搜索范围 [Zmm,zmax]内搜索 N个像 素点, 但由于产生了像素点重复搜索或漏搜索, 实际搜索到的有效搜索点要少于 N。 为 了保证搜索范围包含场景真实深度值的所有可能取值, 通常把搜索范围设得足够大, 而 为了保证一定的搜索精度, 把搜索步长设得较小, 这大大增加了搜索次数和相应的计算 量, 并且由于漏搜索和重复搜索的存在, 搜索效果并不好。 TECHNICAL FIELD The present invention relates to multi-view video image processing techniques. BACKGROUND OF THE INVENTION In recent years, researchers have come to realize that technologies such as computer vision, video processing, and scene synthesis based on depth images should be utilized in advanced 3D television and Free Viewpoint Video System (FVV). Separating the acquisition and display settings of the video, that is, the viewing angle and the camera orientation of the acquired video are mutually unrestricted, thereby providing a high degree of flexibility, interactive book and operability. European stereoscopic TV projects use video plus depth data formats ("Deep Image Based Synthesis, Compression and Transmission of 3D TV New Methods", Stereoscopic Display and Virtual Reality Systems SPIE Conference, 2004.; C. Fehn, "Depth- Image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, CA, USA, Jan. 2004, pp. 93 -104.), that is, each pixel of the image corresponds to a depth value; using a depth image based view synthesis method (DIBR: Depth Image Based Rendering): the receiver decoder generates a stereo image pair according to the display setting and the viewing angle, thereby The viewing angle and the camera orientation for acquiring the video are not limited to each other. Proposal for the JVT meeting in April 2007 ("Multi-view video and depth data format for advanced 3D video systems"; A. Smolic and K. Mueller, et al., "Multi-View Video plus Depth (MVD) Format for Advanced 3D Video Systems", ISO IEC JTC1/SC29AVG11, Doc. JVT-W100, San Jose, USA, Apnl 2007. ) Extends video depth to multi-view video, and proposes video plus depth multi-view encoded data format MVD (Multi- View video plus depth) . 
Since MVD can meet the essential requirement of advanced 3D video or arbitrary viewing video applications, that is, it can generate a continuous view of arbitrary angles of view within a certain range, instead of a limited number of discrete views, so the video plus depth MVD scheme It has been adopted by JVT and has been identified as the future direction of development. Therefore, how to obtain the depth information of a scene from two or more views from different perspectives becomes one of the important issues in multi-view video processing. The current deep search method is: Use a fixed depth step (uniform depth-grid) for deep search within a fixed search range. When using a fixed search step, if the given search step at a smaller depth value corresponds to an offset of 1 pixel, then at a larger depth value, the pixel offset corresponding to the search step will be less than 1 pixel. Hypothesis When projecting to a non-integer pixel at a given depth value, taking the nearest neighbor pixel as a projection point, the same pixel will be searched at a plurality of different depth values in the depth search, that is, a repeated search occurs. Conversely, if the given search step corresponds to an offset of 1 pixel at a larger depth value, the pixel offset corresponding to the search step will be greater than 1 pixel at a smaller depth value, ie Two adjacent depth values will search for two non-adjacent pixel points, causing some pixels to miss detection and incomplete search. Therefore, it is originally expected to search for N pixel points in the search range [ Zmm , zmax ], but since the pixel point repeated search or the miss search is generated, the actual searched effective search point is less than N. In order to ensure that the search range contains all possible values of the true depth value of the scene, the search range is usually set to be large enough, and in order to ensure a certain search accuracy, the search step size is set to be small, which greatly increases the number of searches and corresponding The amount of calculation, and due to the existence of miss search and repeated search, the search effect is not good.
迄今, 已有很多与深度估计相关的研究和估计算法, 但大多数通过对校正的、 平行 立体图像对先进行视差估计, 再根据视差与深度的关系计算深度信息。 例如, 平行相机 系统中两幅图像之间只存在水平视差, 利用基于特征或块匹配的方法先估计视差, 然后 根据深度与视差成反比的关系计算出深度信息; 而对于非平行相机系统, 则要经过图像 对校正、 视差匹配、 深度计算和反校正等一系列处理才能得到原始视图对应的深度图。 该类深度估计问题本质上就是进行视差估计, 其性能主要由视差估计算法决定。 众所周 知, 视差估计或立体匹配是计算机视觉中的经典问题, 虽然至今已有大量的研究工作和 成果,但纹理信息缺乏或遮挡所引起的匹配模糊性或不确定性使得视差匹配问题仍旧是 计算机视觉中的研究热点和难点。  So far, there have been many research and estimation algorithms related to depth estimation, but most of them first calculate the disparity of the corrected, parallel stereo image pairs, and then calculate the depth information according to the relationship between parallax and depth. For example, there is only horizontal disparity between two images in a parallel camera system, the disparity is estimated first by feature or block matching, and then the depth information is calculated according to the inverse relationship between the depth and the parallax; for non-parallel camera systems, A series of processes such as image pair correction, parallax matching, depth calculation, and inverse correction are required to obtain the depth map corresponding to the original view. This type of depth estimation problem is essentially the estimation of disparity, and its performance is mainly determined by the disparity estimation algorithm. As we all know, disparity estimation or stereo matching is a classic problem in computer vision. Although there has been a lot of research work and results so far, the lack of matching or ambiguity caused by ambiguity or uncertainty makes the parallax matching problem still computer vision. Research hotspots and difficulties in the research.
2006年, JVT会议提案 ( "多视角视频编码核心实验 3报告" ; S. Yea, J. Oh, S. Ince, E. Martinian and A. Vetro, "Report on Core Experiment CE3 of Multiview Coding", ISO IEC JTC1/SC29AVG11 , Doc. JVT-T123, Klagenfurt, Austria, July 2006.)提出了利用照相机内外部参数 和基于深度的视图合成, 在某一指定的深度搜索范围内用给定的搜索步长, 搜索使得合 成视图与实际视图之间的误差最小的深度作为估计值。 M.Okutom等人提出了多基线立 体系统的立体匹配方法 (A multiple-baseline stereo), 该方法利用深度与视差的反比关系, 把视差估计转化为深度求解问题, 并且消除了视差匹配中的不确定性难题( "多基线立 体系统" , 模式识别与机器智能 IEEE学报; M. Okutomi and K. Kanade, "A multiple-baseline stereo", IEEE Trans, on Pattern Analysis and Machine Intelligence 15 (4): 353 - 363, 1993. ) 。 N.Kim等 人提出了在距离 /深度空间直接进行深度搜索、 匹配和视图合成操作( "一般的多基线立 体系统和利用深度空间搜索、匹配和合成的直接视图合成",计算机视觉国际期刊; (N. Kim, M. Trivedi and H. Ishiguro, "Generalized multiple baseline stereo and direct view synthesis using range-space search, match, and render", International Journal of Computer Vision 47 (1/2/3): 131 - 148, 2002. ) : 在深度空间直接进行深度搜索, 不需要视差匹配, 图像校正处理直接在深度搜 索过程中完成, 并且深度值是连续值, 其精度不像视差向量那样受图像像素分辨率的限 制。但是实际求解中,需要指定深度搜索范围和搜索步长,根据某一代价函数求最优解, 而搜索范围和步长的取值是否合适对估计性能至关重要。 视差匹配中, 视差搜索范围通常根据图像性质直观确定, 而深度搜索中, 特别是在 非平行相机系统中, 由于深度变化与图像像素偏移的关系并不显而易见, 所以其搜索范 围难以合理确定。 所以,对给定的多视角视图如何确定合适的深度搜索区间和步长成为有效估计深度 信息的关键。 2006, JVT Conference Proposal ("Multi-View Video Coding Core Experiment 3 Report"; S. Yea, J. Oh, S. Ince, E. Martinian and A. Vetro, "Report on Core Experiment CE3 of Multiview Coding", ISO IEC JTC1/SC29AVG11, Doc. JVT-T123, Klagenfurt, Austria, July 2006.) proposes the use of camera internal and external parameters and depth-based view synthesis, with a given search step within a specified depth search range, The depth that minimizes the error between the composite view and the actual view is searched as an estimate. M. Okutom et al. proposed a multiple-baseline stereo method for multi-baseline stereo systems. This method uses the inverse relationship between depth and parallax to convert the disparity estimation into a deep solution problem and eliminates the parallax matching. Deterministic Problems ("Multi-Baseline Stereo System", IEEE Transactions on Pattern Recognition and Machine Intelligence; M. Okutomi and K. Kanade, "A multiple-baseline stereo", IEEE Trans, on Pattern Analysis and Machine Intelligence 15 (4): 353 - 363, 1993. ). N. Kim et al. proposed direct depth search, matching, and view synthesis operations in distance/depth space ("general multi-baseline stereo systems and direct view synthesis using deep space search, matching, and synthesis", International Journal of Computer Vision; (N. Kim, M. Trivedi and H. Ishiguro, "Generalized multiple baseline stereo and direct view synthesis using Range-space search, match, and render", International Journal of Computer Vision 47 (1/2/3): 131 - 148, 2002. ) : Direct depth search in depth space, no parallax matching required, image correction processing directly Completed in the depth search process, and the depth value is a continuous value, the accuracy is not limited by the image pixel resolution as the disparity vector. However, in the actual solution, the depth search range and the search step size need to be specified, and the cost function is determined according to a certain cost function. The optimal solution, and whether the search range and the step value are appropriate is critical to the estimation performance. 
In parallax matching, the parallax search range is usually determined intuitively according to the nature of the image, while in deep search, especially in non-parallel camera systems, Since the relationship between depth variation and image pixel offset is not obvious, its search range is difficult to determine reasonably. Therefore, how to determine the appropriate depth search interval and step size for a given multi-view view becomes the key to effectively estimate depth information.
JVT-W059 '视图合成预测核心实验 6报告 "; S. Yea and A. Vetro, "Report of CE6 on View Synthesis Prediction", ISO IEC JTC1/SC29AVG11, Doc. JVT-W059, San Jose, USA, April 2007. )提出 利用两幅视图的匹配特征点对, 从若干组备选的深度搜索的最小值、 最大值和搜索步长 中选取使得匹配特征点对之间的误差最小的一组作为深度搜索范围和步长, 该方法需要 用 KLT (Kanade-Lucas-Tomasi)算法( "特征点的检测和跟踪", 卡内基梅隆大学技术报 告; C. Tomasi, and T. Kanade, "Detection and tracking of point features", Technical Report CMU-CS-91-132, Carnegie Mellon University , 1991.)进行特征提取匹配,性能依赖于特征 匹配的正确性。 JVT-W059 'View Synthesis Prediction Core Experiment 6 Report'; S. Yea and A. Vetro, "Report of CE6 on View Synthesis Prediction", ISO IEC JTC1/SC29AVG11, Doc. JVT-W059, San Jose, USA, April 2007 . ) Using a matching feature point pair of two views, selecting a minimum of the difference between the pair of matching feature points from the minimum value, the maximum value and the search step size of the selected group of depth searches as the depth search range And the step size, this method requires the KLT (Kanade-Lucas-Tomasi) algorithm ("Feature Point Detection and Tracking", Carnegie Mellon University Technical Report; C. Tomasi, and T. Kanade, "Detection and tracking of Point features", Technical Report CMU-CS-91-132, Carnegie Mellon University, 1991.) Feature extraction matching, performance depends on the correctness of feature matching.
M.Okutom和 N.Kim等人提到用最长基线的参考视图的 1个像素偏移量所对应的深 度变化值作为搜索步长, 从而保证在所有其他参考视图中的像素偏移量小于 1个像素。 上述两种方法都是使用固定的搜索步长, 没有根据图像内容或场景的变化自适应地 调整步长。 发明内容 本发明所要解决的技术问题是, 提出一种搜索步长的自适应确定方法, 能避免像素 点重复搜索或漏搜索。另外,本发明还提出了一种基于自适应搜索步长的深度估计方法。 本发明为解决上述技术问题所采用的技术方案是, 多视角视频图像深度搜索方法, 其特征在于, 在深度搜索范围内每一步的搜索步长根据当前深度值动态调整, 当前深度 值越小, 采用的搜索步长越小; 当前深度值越大, 采用的搜索步长越大, 使得每一步的 搜索步长对应于相同的像素搜索精度; 根据深度变化值和像素偏移向量的关系,把所述深度搜索范围和搜索步长的确定转 化为像素搜索范围和像素搜索精度的确定; 所述像素搜索精度等于搜索中每一次像素偏 移向量的长度; 所述像素搜索精度可以为分像素精度如二分之一个像素, 四分之一个像 素, 或整像素精度, 如一个像素, 两个像素; 所述搜索步长等于搜索中每一次像素偏移 向量所对应的深度变化值; 目标视图的搜索步长由目标视图中当前深度值、参考视图中的像素偏移向量和视图 对应的照相机内外部参数确定, 目标视图中每一步的搜索步长在参考视图中对应于相同 长度的像素偏移向量。 所述的目标视图是指当前需要估计深度的图像, 所述的参考视图 是指多视角视频系统中的其他图像。参考视图可以在深度搜索过程中自动选择或由用户 指定; 搜索步长由以下公式得到:
Figure imgf000006_0001
其中: P是目标视图中待深度估计的像素点, z为像素点 P的当前深度值, Δζ为像素 点 Ρ的深度变化值即搜索步长, APr为目标视图中像素点 P的深度变化值 Δζ在参考视图 r 中对应的像素偏移向量, II ΔΡΓ II 2= ΔΡΓ Τ · ΔΡΓ, ; Br =4 — ^^― 1禾 =4 —1是 3 x 3的 矩阵, △ = t _tT是三维向量; 其中, R为目标视角的相机坐标系相对于世界坐标系的 三维旋转矩阵; t为目标视角的相机坐标系相对于世界坐标系的平移向量; A为目标视 角的相机内部参数矩阵; ¾为参考视角的相机坐标系相对于世界坐标系的三维旋转矩 阵; ^为参考视角的相机坐标系相对于世界坐标系的平移向量; Ατ为参考视角的相机内 部参数矩阵; b3和 c3分别是矩阵 B 的第三行向量。 对于平行相机系统, 所述深度变 化值与当前深度值的平方成正比。 所述参考视图中的像素偏移向量满足目标视角和参考视角的极线约束方程:
M. Okutom and N. Kim et al. refer to the depth variation value corresponding to the 1 pixel offset of the reference view of the longest baseline as the search step size, thereby ensuring that the pixel offset in all other reference views is less than 1 pixel. Both of the above methods use a fixed search step size, and the step size is not adaptively adjusted according to changes in image content or scene. SUMMARY OF THE INVENTION The technical problem to be solved by the present invention is to provide an adaptive determination method for a search step size, which can avoid repeated or missing search of pixel points. In addition, the present invention also proposes a depth estimation method based on an adaptive search step size. The technical solution adopted by the present invention to solve the above technical problem is a multi-view video image depth search method, which is characterized in that the search step length of each step in the depth search range is dynamically adjusted according to the current depth value, and the current depth value is smaller. The smaller the search step size is; the larger the current depth value is, the larger the search step size is, so that the search step size of each step corresponds to the same pixel search accuracy; Determining the depth search range and the search step length into a determination of a pixel search range and a pixel search accuracy according to a relationship between the depth change value and the pixel offset vector; the pixel search precision is equal to each pixel offset vector in the search The pixel search accuracy may be sub-pixel precision such as one-half pixel, one-quarter pixel, or integer pixel precision, such as one pixel, two pixels; the search step size is equal to each search The depth change value corresponding to the primary pixel offset vector; the search step size of the target view is determined by the current depth value in the target view, the pixel offset vector in the reference view, and the camera internal and external parameters corresponding to the view, each step in the target view The search step size corresponds to a pixel offset vector of the same length in the reference view. The target view refers to an image that currently needs to be estimated, and the reference view refers to other images in a multi-view video system. The reference view can be automatically selected during the deep search or specified by the user; the search step is obtained by the following formula:
Figure imgf000006_0001
Where: P is the pixel to be depth estimated in the target view, z is the current depth value of the pixel point P, Δζ is the depth change value of the pixel point 即, that is, the search step size, and AP r is the depth change of the pixel point P in the target view value of the pixel corresponding to the offset vector Δζ reference view r,, II ΔΡ Γ II 2 = ΔΡ Γ Τ · ΔΡ Γ,; B r = 4 - ^^ - 1 = Wo 4--1 is a 3 x 3 matrix, △ = t _t T is a three-dimensional vector; where R is the three-dimensional rotation matrix of the camera coordinate system of the target perspective relative to the world coordinate system; t is the translation vector of the camera coordinate system of the target perspective relative to the world coordinate system; A is the target perspective Camera internal parameter matrix; 3⁄4 is the three-dimensional rotation matrix of the camera coordinate system with reference to the world coordinate system; ^ is the translation vector of the camera coordinate system with reference to the world coordinate system; Α τ is the camera internal parameter matrix of the reference angle of view ; b 3 and c 3 are the third row vectors of matrix B, respectively. For a parallel camera system, the depth change value is proportional to the square of the current depth value. The pixel offset vector in the reference view satisfies the polar constraint equation of the target view and the reference view:
APr T (C Atr x Br)P = 0 , 其中, P是目标视图中的像素点, △ ^为参考视图中的像素偏移向 量。 存在两个互为相反方向的所述像素偏移向量 ΔΡΤ满足所述的极线约束方程, 所述 2 个像素偏移向量分别对应深度值增大方向、 深度值减小方向; 深度值增大方向的偏移向 量所对应的深度变化值大于深度值减小方向的偏移向量所对应的深度变化值。 多视角视频图像的深度估计方法,在利用基于深度的视图合成和基于块配的深度搜 索中, 目标视图的深度搜索范围和搜索步长由参考视图的像素搜索范围和像素搜索精度 决定; 在深度搜索范围内, 每一步的搜索步长根据当前深度值动态调整, 当前深度值越 小, 采用的搜索步长越小; 当前深度值越大, 采用的搜索步长越大, 使得每一步的搜索 步长对应于相同的像素搜索精度; 所述深度搜索步长由目标视图中当前深度值、参考视图中的像素偏移向量和视图对 应的照相机内外部参数确定, 目标视图中每一步的搜索步长在参考视图中对应于相同长 度的像素偏移向量; 所述的基于深度的视图合成, 是指给定目标视图的像素点和深度值, 根据目标视角 和参考视角的相机内外部参数, 把该像素点反投影到三维场景空间点, 再把该空间点重 新投影到参考视角的图像平面的方法, 得到目标视图在该参考视角的合成视图; 所述基于深度的视图合成和基于块配的深度搜索具体为,利用当前深度值进行视图 合成, 并计算合成视图的像素块与参考视图的像素块之间的误差; 采用最小误差对应的 深度值为目标视图的深度估计值; 多视角视频图像的深度估计方法具体包括以下步骤: 步骤 1 估计目标视图中的深度搜索初始值 ¾=。; 步骤 2 确定深度搜索对应于参考视图中的像素搜索范围和像素搜索精度,根据像素 搜索精度得到参考视图中像素偏移向量 ΔΡ^ 步骤 3 根据当前深度值 ¾和像素偏移向量 Δ , 得到对应的深度变化值 Δ¾, 所述 深度变化值△¾即为下一步搜索步长; 步骤 4 利用当前深度值 进行视图合成, 并计算合成视图的像素块与参考视图的像 素块之间的误差 ek; 步骤 4 更新当前深度值 ¾=¾+△¾; k=k+l ; 步骤 5 判断是否超过给定的像素搜索范围, 如是进入步骤 6, 如否, 进入步骤 3 ; 步骤 6 以误差 ek(k=0,…… ,N-1, N为搜索总步数)中最小误差对应的深度值为估计 值。 所述搜索步长由以下公式得到:
Figure imgf000007_0001
其中: P是目标视图中待深度估计的像素点, z为像素点 P的当前深度值, Δζ为像素点 P 的深度变化值即搜索步长, 为目标视图中像素点 Ρ的深度变化值 Δζ在参考视图 r中 对应的像素偏移向量, II ΔΡΓ II 2= ΔΡΓ τ · ΔΡΓ, ; Br = 4 J - 1禾 P C =4A- 1是 3x3的 矩阵, △ = t _tT是三维向量; 其中, R为目标视角的相机坐标系相对于世界坐标系的 三维旋转矩阵; t为目标视角的相机坐标系相对于世界坐标系的平移向量; A为目标视 角的相机内部参数矩阵; ¾为参考视角的相机坐标系相对于世界坐标系的三维旋转矩 阵; ^为参考视角的相机坐标系相对于世界坐标系的平移向量; Ατ为参考视角的相机内 部参数矩阵; b3和 c3分别是矩阵 B 的第三行向量。 对于平行相机系统, 所述当前深 度值的平方与深度变化值成正比。所述参考视图中的像素偏移向量满足目标视角和参考 视角的极线约束方程:
Figure imgf000008_0001
χΑ)Ρ = 0, 其中, Ρ是目标视图中的像素点, APr为参 考视图中的像素偏移向量。 本发明的有益效果是, 自适应搜索步长的深度搜索不会出现像素漏搜索与重复搜 索, 深度估计中合成的图像块与参考图像块的绝对差小, 错误估计少, 且计算量或深度 搜索次数少。
AP r T (C At r x B r )P = 0 , where P is the pixel point in the target view and Δ ^ is the pixel offset vector in the reference view. There are two pixel offset vectors ΔΡ Τ in mutually opposite directions satisfying the polar line constraint equation, and the two pixel offset vectors respectively correspond to a depth value increasing direction and a depth value decreasing direction; The depth change value corresponding to the offset vector in the large direction is larger than the depth change value corresponding to the offset vector in the direction in which the depth value decreases. A depth estimation method for multi-view video images, in depth-based view synthesis and block-based depth search, the depth search range and search step size of the target view are determined by the pixel search range and pixel search precision of the reference view In the depth search range, the search step size of each step is dynamically adjusted according to the current depth value. The smaller the current depth value is, the smaller the search step size is. The larger the current depth value is, the larger the search step size is. The search step size of each step corresponds to the same pixel search precision; the depth search step size is determined by the current depth value in the target view, the pixel offset vector in the reference view, and the camera internal and external parameters corresponding to the view, each in the target view The search step of one step corresponds to a pixel offset vector of the same length in the reference view; the depth-based view synthesis refers to the pixel point and depth value of a given target view, according to the target angle of view and the reference angle of view within the camera An external parameter, a method of backprojecting the pixel point to a three-dimensional scene space point, and then re-projecting the spatial point to an image plane of the reference perspective, obtaining a composite view of the target view at the reference view; the depth-based view synthesis and The depth search based on the block configuration is specifically, using the current depth value for view synthesis, and calculating the synthesis The error between the pixel block of the figure and the pixel block of the reference view; the depth value corresponding to the minimum error is the depth estimation value of the target view; the depth estimation method of the multi-view video image specifically includes the following steps: Step 1 Estimating the target view The depth search initial value is 3⁄4=. Step 2 Determine that the depth search corresponds to the pixel search range and the pixel search precision in the reference view, and obtain the pixel offset vector ΔΡ^ in the reference view according to the pixel search precision. Step 3 According to the current depth value 3⁄4 and the pixel offset vector Δ, the corresponding correspondence is obtained. The depth change value Δ3⁄4, the depth change value Δ3⁄4 is the next search step; Step 4 uses the current depth value for view synthesis, and calculates the error e k between the pixel block of the composite view and the pixel block of the reference view ; step 4 to update the current depth value ¾ = ¾ + △ ¾; k = k + l; step 5 determining whether more than a given pixel search range, and if so proceeds to step 6, if not, proceeds to step 3; step 6 error EK ( The depth value corresponding to the smallest error in k=0, ..., N-1, N is the total number of search steps) is an estimated value. 
The search step is obtained by the following formula:
Figure imgf000007_0001
Where: P is the pixel point to be depth estimated in the target view, z is the current depth value of the pixel point P, and Δζ is the depth change value of the pixel point P, that is, the search step length, which is the depth change value Δζ of the pixel point 目标 in the target view. In the reference view r, the corresponding pixel offset vector, II ΔΡ Γ II 2 = ΔΡ Γ τ · ΔΡ Γ , ; B r = 4 J - 1 and PC = 4A - 1 is a matrix of 3x3, Δ = t _t T is a three-dimensional vector; wherein, R is a three-dimensional rotation matrix of the camera coordinate system of the target perspective relative to the world coordinate system; t is a translation vector of the camera coordinate system of the target perspective relative to the world coordinate system; A is a camera internal parameter matrix of the target perspective; 3⁄4 is the three-dimensional rotation matrix of the camera coordinate system with reference to the world coordinate system; ^ is the translation vector of the camera coordinate system with reference to the world coordinate system; Α τ is the camera internal parameter matrix of the reference angle of view; b 3 and c 3 is the third row vector of matrix B, respectively. For a parallel camera system, the square of the current depth value is proportional to the depth change value. The pixel offset vector in the reference view satisfies the polar constraint equation of the target view and the reference view:
Figure imgf000008_0001
χΑ)Ρ = 0, where Ρ is the pixel in the target view and AP r is the pixel offset vector in the reference view. The invention has the beneficial effects that the depth search of the adaptive search step does not cause the pixel leak search and the repeated search, the absolute difference between the synthesized image block and the reference image block in the depth estimation is small, the error estimation is small, and the calculation amount or depth Less searches.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the coordinate systems in a multi-view video system; FIG. 2 is a schematic diagram of depth-based view synthesis; FIG. 3(a) is a view at the initial time of the video sequence of the 7th camera in the Uli test sequence; FIG. 3(b) is a view at the initial time of the video sequence of the 8th camera in the Uli test sequence; FIG. 3(c) is a partial view of FIG. 3(a), in which 16 marked points indicate the image region from pixel [527, 430] to pixel [590, 493]; FIG. 4 is a schematic diagram of the relationship between the depth change and the square of the depth value; FIG. 5 is a schematic diagram of the depth change and the pixel offset vector according to the invention; FIG. 6 is a schematic diagram of missed pixel searches at small depth values; FIG. 7 is a schematic diagram of repeated pixel searches at large depth values; FIG. 8 is a schematic diagram of the adaptive adjustment of the depth search step according to the invention; FIG. 9 is a schematic diagram of the distribution of the pixels reached with the adaptive, variable-length search step of the invention; FIG. 10 is a schematic diagram of depth search performance with a fixed search step and with the adaptive step of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention proposes an adaptive method for determining the depth search step. Using the internal and external camera parameters and the perspective projection relationship, it first derives the relationship between the depth value of a pixel, the depth change, and the pixel offset of the projected point in the synthesized view caused by that depth change. Based on the derived relation between the depth change and the corresponding pixel offset, the determination of the depth search range is converted into the determination of a pixel search range; a pixel offset has an intuitive meaning in the image and is easy to set reasonably. Moreover, exploiting the relationship between pixel offset and depth value (the larger the depth value, the smaller the pixel offset caused by the same depth change), the search step is adjusted dynamically so that every search step corresponds to the same pixel search precision, avoiding repeated or missed pixel searches and thereby improving search efficiency and performance. In addition, the invention proposes a simple and effective initial depth estimation method, which solves for the convergence point of the camera optical axes in a convergent camera system and treats this point as a representative point of the scene, thus obtaining a rough estimate of the scene depth.

In multi-view video, three types of coordinate systems are usually needed to describe the scene and its image position information: the world coordinate system in which the scene lies, the camera coordinate systems, and the pixel coordinate systems, as shown in FIG. 1. A camera coordinate system takes the camera center as the origin and the optical axis as the z-axis, with the xy-plane parallel to the image plane; a pixel coordinate system takes the upper-left corner of the image as the origin, with horizontal and vertical coordinates u, v.

Let the position of the camera coordinate system o_i-x_i y_i z_i of camera c_i (i = 1, ..., m) relative to the world coordinate system o-xyz be represented by a three-dimensional rotation matrix R_i and a translation vector t_i, where m is the number of cameras. The coordinates of a scene point in the world coordinate system are written as the vector p = [x, y, z]ᵀ, and its coordinates in the camera coordinate system o_i-x_i y_i z_i as the vector p_i = [x_i, y_i, z_i]ᵀ. From spatial geometry and coordinate transformation:

p = R_i·p_i + t_i   (1)

According to the perspective projection principle of computer vision, the coordinates p_i in the camera coordinate system and the homogeneous pixel coordinates P_i = [u_i, v_i, 1]ᵀ in the image plane satisfy:

z_i·P_i = A_i·p_i   (2)

where A_i is the internal parameter matrix of camera c_i, mainly comprising the focal length, the principal point and the distortion coefficients.
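The two transformations can be written directly as code. The sketch below, with illustrative parameter values (not the Uli calibration data), maps a world point into a camera frame via equation (1) and projects it to homogeneous pixel coordinates via equation (2):

```python
import numpy as np

def world_to_camera(p, R_i, t_i):
    """Invert equation (1): p = R_i p_i + t_i  =>  p_i = R_i^-1 (p - t_i)."""
    return np.linalg.inv(R_i) @ (p - t_i)

def project(p_i, A_i):
    """Equation (2): z_i P_i = A_i p_i; returns (P_i, z_i) with P_i = [u, v, 1]."""
    q = A_i @ p_i
    return q / q[2], q[2]

# Illustrative parameters: identity rotation, a camera 100 mm behind the
# world origin, focal length 1000 px, principal point (512, 384).
R = np.eye(3)
t = np.array([0.0, 0.0, -100.0])
A = np.array([[1000.0, 0.0, 512.0],
              [0.0, 1000.0, 384.0],
              [0.0, 0.0, 1.0]])
P, z = project(world_to_camera(np.array([35.07, 433.93, 1189.78]), R, t), A)
```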
The present invention performs block-matching-based depth search in depth space: using the internal and external camera parameters and depth-based view synthesis, it searches, with the adaptive step size and within the depth search range, for the depth value that minimizes the error between the pixel block of the synthesized view and the corresponding pixel block of the actual reference view, and takes this depth value as the depth estimate of the pixel in the target view. The target view and target viewpoint are the image whose depth is currently being estimated and its viewpoint; the reference views and reference viewpoints are the other images and viewpoints in the multi-view video system. The reference view and reference viewpoint can be selected automatically during the depth search or specified by the user.
When the depth value of a pixel in a view is given, the pixel can be back-projected into the scene space according to the internal and external camera parameters to obtain a space point, and this space point can then be re-projected onto the image plane of the desired viewpoint, yielding a synthesized view for that viewpoint. This is the depth-based view synthesis technique, shown in FIG. 2. Consider the case of two views, with view 1 as the target view and view 2 as the reference view. A pixel P₁ in view 1 has depth value z₁ in the coordinate system of its camera c₁; its corresponding pixel in view 2 is P₂, with depth value z₂ in the coordinate system of camera c₂.
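Before deriving the closed form, a sketch of this back-project/re-project chain for a single pixel can be composed from the primitives of equations (1) and (2); the function below is an illustrative pipeline under the notation of this description, not the literal procedure of the disclosure:

```python
import numpy as np

def warp_pixel(P1, z1, A1, R1, t1, A2, R2, t2):
    """Back-project pixel P1 (homogeneous [u, v, 1]) at depth z1 from
    camera 1 into world space, then re-project it into camera 2."""
    p1 = z1 * np.linalg.inv(A1) @ P1   # camera-1 coordinates, eq. (2)
    p = R1 @ p1 + t1                   # world coordinates, eq. (1)
    p2 = np.linalg.inv(R2) @ (p - t2)  # camera-2 coordinates, eq. (1)
    q = A2 @ p2                        # projection, eq. (2)
    return q / q[2], q[2]              # pixel P2 and its depth z2
```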
From equations (1) and (2) one derives:

z₁·R₁·A₁⁻¹·P₁ + t₁ = z₂·R₂·A₂⁻¹·P₂ + t₂   (3)

From equation (3):

z₂·P₂ = z₁·A₂·R₂⁻¹·R₁·A₁⁻¹·P₁ + A₂·R₂⁻¹·(t₁ - t₂)   (4)

For convenience of description, write:

B = A₂·R₂⁻¹·R₁·A₁⁻¹,  C = A₂·R₂⁻¹,  Δt = t₁ - t₂   (5)

Then (4) becomes:

z₁·B·P₁ + C·Δt = z₂·P₂   (6)

where B and C are 3x3 matrices and Δt is the translation vector between the two cameras. Since P₁ and P₂ are homogeneous coordinates, z₂ can be eliminated from (6), giving the homogeneous pixel coordinates of pixel P₁ in view 2 as:

P₂ = ( z₁·B·P₁ + C·Δt ) / ( z₁·b₃ᵀP₁ + c₃ᵀΔt )   (7)

where b₃ᵀ and c₃ᵀ are the third row vectors of the matrices B and C, respectively. Equation (7) shows that, when the internal and external parameters of cameras c₁ and c₂ are known, the pixel position in view 2 is a function of the pixel position in view 1 and its depth value. Equation (7) is used to synthesize view 1 at the reference viewpoint 2: a pixel P₁ of view 1 at a given depth z is back-projected and re-projected to obtain the pixel P₂ of the synthesized view 2 at the viewpoint of camera c₂,

P₂ = f₂(z, P₁)

According to a common assumption in computer vision, the corresponding pixels of the same scene point in views from different viewpoints have the same luminance and chrominance values. Hence, at depth value z, the luminance/chrominance value at pixel P₂ of the synthesized view 2 of pixel P₁ of view 1 is:

Synthesized_I₂(P₂) = Synthesized_I₂(f₂(z, P₁)) = I₁(P₁)   (8)
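Since B, C and Δt depend only on the calibration, they can be precomputed once and equation (7) evaluated per pixel and per candidate depth. A minimal sketch, with hypothetical calibration values chosen only for illustration:

```python
import numpy as np

def f2(z, P1, B, C, dt):
    """Equation (7): closed-form reprojection of P1 at depth z into view 2."""
    num = z * (B @ P1) + C @ dt   # z1*B*P1 + C*dt, numerator of eq. (6)
    return num / num[2]           # divide by z1*b3'P1 + c3'dt

# B, C, dt assembled as in equation (5); parameter values are placeholders.
A1 = A2 = np.array([[1000.0, 0, 512.0], [0, 1000.0, 384.0], [0, 0, 1.0]])
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([100.0, 0.0, 0.0])   # 100 mm baseline
C_ = A2 @ np.linalg.inv(R2)
B_ = C_ @ R1 @ np.linalg.inv(A1)
P2 = f2(3000.0, np.array([526.0, 429.0, 1.0]), B_, C_, t1 - t2)
```

With these placeholder values the parallax of about 33 pixels agrees with the parallel-camera relation d = f·B/z discussed later, which is a useful sanity check on the matrices.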
Here I₁ is view 1, I₂ is view 2, and Synthesized_I₂ is the synthesized view 2 of view 1 at the viewpoint of reference camera 2. The description above takes a two-camera system as an example; the same principle extends to a camera system composed of m cameras. Assuming that the pixels P_j within a local window W centered on pixel P share the same scene depth, the absolute difference, within window W, between the synthesized view 2 of view 1 and the reference view 2 actually captured by the camera at viewpoint 2 is:

SAD(z, P) = Σ_{P_j ∈ W} | Synthesized_I₂(f(z, P_j)) - I₂(f(z, P_j)) | = Σ_{P_j ∈ W} | I₁(P_j) - I₂(f(z, P_j)) |   (9)
Since the synthesized view 2 is computed with the camera parameters of reference view 2, the synthesized view 2 at the true scene depth theoretically has the same luminance and chrominance values as reference view 2. Therefore, solving for the depth of view 1 at pixel P can be cast as the following problem:

ẑ = arg min_{z ∈ [z_min, z_max]} SAD(z, P)   (10)

That is, within a given depth search range, the depth z that minimizes the absolute difference between the synthesized view and the reference view is taken as the final depth estimate.
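The matching cost of equations (9)-(10) is a plain block SAD. The sketch below evaluates it on NumPy image arrays, with `warp` standing in for f(z, P_j) from the earlier snippets; nearest-neighbour rounding is an assumption made here for simplicity:

```python
import numpy as np

def sad(z, P, I1, I2, warp, half=2):
    """Equation (9): SAD over a (2*half+1)^2 window centered on pixel P.

    warp -- callable (z, u, v) -> (u2, v2): position of (u, v) in view 2
            when its depth is z (equation (7))
    """
    u0, v0 = P
    total = 0.0
    for v in range(v0 - half, v0 + half + 1):
        for u in range(u0 - half, u0 + half + 1):
            u2, v2 = warp(z, u, v)
            u2, v2 = int(round(u2)), int(round(v2))   # nearest-neighbour pixel
            if 0 <= v2 < I2.shape[0] and 0 <= u2 < I2.shape[1]:
                total += abs(float(I1[v, u]) - float(I2[v2, u2]))
    return total

# Equation (10): the estimate is the candidate depth with the smallest SAD.
def best_depth(candidates, P, I1, I2, warp):
    costs = [sad(z, P, I1, I2, warp) for z in candidates]
    return candidates[int(np.argmin(costs))]
```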
This method of searching directly in depth space requires no disparity matching, image rectification is handled within the depth search itself, and the depth value is continuous, so its precision is not limited by the image pixel resolution the way a disparity vector is.

From equation (7), when the internal and external camera parameters are known, the pixel position in the synthesized view 2 is a function of the pixel position in view 1 and its depth value. If the depth value of pixel P₁ in view 1 changes by Δz, its pixel coordinates in the synthesized view 2 become:

P₂' = f₂(z + Δz, P₁) = ( (z + Δz)·B·P₁ + C·Δt ) / ( (z + Δz)·b₃ᵀP₁ + c₃ᵀΔt )   (11)

Hence the depth change Δz of pixel P₁ in view 1 causes the pixel offset vector in the synthesized view 2:

ΔP = P₂' - P₂ = f₂(z + Δz, P₁) - f₂(z, P₁)   (12)

From (12), the relation between the depth change of a pixel of view 1 and the corresponding pixel offset vector ΔP in the synthesized view 2 is derived as:

Δz·[ (c₃ᵀΔt)·B·P₁ - (b₃ᵀP₁)·C·Δt ] = (z₁·b₃ᵀP₁ + c₃ᵀΔt)·( (z₁ + Δz)·b₃ᵀP₁ + c₃ᵀΔt )·ΔP   (13)

Left-multiplying both sides of (13) by ΔPᵀ and solving for Δz gives:

Δz = ( (z₁·b₃ᵀP₁ + c₃ᵀΔt)²·‖ΔP‖² ) / ( ΔPᵀ·[ (c₃ᵀΔt)·B·P₁ - (b₃ᵀP₁)·C·Δt ] - (z₁·b₃ᵀP₁ + c₃ᵀΔt)·(b₃ᵀP₁)·‖ΔP‖² )   (14)

where ‖ΔP‖² = ΔPᵀ·ΔP is the squared modulus of the pixel offset vector ΔP. Thus, when the camera parameters are known, (14) yields the depth change Δz corresponding to the pixel offset vector ΔP at depth z₁.

In addition, from equation (6) the corresponding pixels of the two views must satisfy the following epipolar constraint equations, for P₂ and P₂' respectively:

P₂ᵀ·(C·Δt × B·P₁) = 0   (15)

P₂'ᵀ·(C·Δt × B·P₁) = 0   (16)

where × is the vector cross product. Subtracting (16) from (15) shows that the pixel offset vector ΔP must also satisfy the epipolar constraint equation:

ΔPᵀ·(C·Δt × B)·P₁ = 0   (17)

Given the camera parameters and the pixel P₁, equation (17) is a homogeneous linear equation in the two components Δu and Δv of the pixel offset vector ΔP.

For a parallel camera system, the disparity d of a scene point in the two views is inversely proportional to its depth:

d = f·B / z   (18)

where d and z are the disparity and the depth, and f and B are the focal length and the baseline length of the cameras, respectively. When the depth value of pixel P₁ in view 1 changes from z₁ to z₂, the pixel offset of the corresponding projected point in the synthesized view 2 is:

Δd = f·B·(z₂ - z₁) / (z₁·z₂) ≈ f·B·Δz / z²   (19)
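Equation (17) pins down the direction of ΔP up to sign; a unit-length offset along the epipolar line and its opposite can be computed as in the sketch below, a straightforward reading of (17) with the same B, C and Δt as in the earlier snippets:

```python
import numpy as np

def unit_offsets(P1, B, C, dt):
    """Solve eq. (17): dP^T (C dt x B P1) = 0 for unit-length [du, dv, 0].

    Writing L = (C dt) x (B P1) = [l1, l2, l3], the constraint reads
    du*l1 + dv*l2 = 0, so (du, dv) ~ (l2, -l1); normalizing gives the two
    opposite unit offset vectors dP+ and dP-.
    """
    L = np.cross(C @ dt, B @ P1)
    d = np.array([L[1], -L[0], 0.0])
    d /= np.linalg.norm(d[:2])
    # The sign of dz from eq. (14) tells which direction increases depth.
    return d, -d
```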
From (19), the depth change is proportional to the pixel offset and to the square of the depth value (equivalently, for a fixed depth change, the induced pixel offset is inversely proportional to the square of the depth). For the same pixel offset, the larger the depth value, the larger the corresponding depth change; the smaller the depth value, the smaller the corresponding depth change. For a convergent camera system, when the angle between the two cameras is not very large, equation (12) shows an approximately analogous relationship between depth change, pixel offset and depth value.

To verify this conclusion, we use the parameters of the 7th and 8th cameras of the Uli test sequence shown in FIG. 3. This multi-view video data is provided by the Heinrich-Hertz-Institut (HHI), Germany, and can be downloaded from https://www.3dtv-research.org/3dav_CfP_FhG_HHI/; the sequence is captured by 8 cameras arranged in a convergent configuration, video format 1024x768, 25 fps, and this description uses the views at the initial time of the video sequences of the 7th and 8th cameras. According to equations (14) and (17), we compute, at pixel P = [526, 429] of view 7 (FIG. 3(a); the pixel corresponds to the button to the right of the shirt collar), the relationship between the depth change, the square of the depth value, and the pixel offset. Given a unit pixel offset vector satisfying the epipolar constraint (17), i.e. |ΔP| = 1, the depth changes corresponding to different depth values are computed from (14); their relationship is shown in FIG. 4, where the abscissa is the square of the depth value and the ordinate is the depth change. FIG. 4 shows that, for a given pixel offset in the synthesized view, the depth change is approximately linear in the square of the depth value, which means that at different depth values the same depth change of a pixel of view 1 causes different pixel offsets in the synthesized view.

It is worth noting that, since (17) is a homogeneous linear equation in the pixel offset vector ΔP, there are two solutions ΔP+ and ΔP- of mutually opposite directions. Substituting them into (14) yields one positive and one negative depth change Δz+ and Δz-; that is, ΔP+ and ΔP- correspond to the pixel offsets caused by a depth increase and a depth decrease, respectively. From the preceding analysis, for a given pixel offset the depth change is approximately proportional to the square of the depth value, so the depth changes corresponding to two pixel offset vectors ΔP of the same length but opposite directions are not equal: the depth decrease |Δz-| is smaller than the depth increase |Δz+|, as shown in FIG. 5. For example, taking pixel P = [526, 429] of Uli view 7 (FIG. 3(a)), depth value 3172 mm and pixel offset of 64 pixels, i.e. ‖ΔP‖ = 64, equations (14) and (17) give the depth changes corresponding to the two opposite offset vectors as Δz+ = 930 and Δz- = -593.

From the above analysis, for the same pixel offset the depth change is approximately proportional to the square of the depth value. Hence, with a fixed search step, if the step corresponds to an offset of 1 pixel at a small depth value, then at a larger depth value the pixel offset corresponding to that step is less than 1 pixel. If, when the projection at a given depth falls on a non-integer pixel, the nearest pixel is taken as the projection point, the depth search will reach the same pixel at several different depth values, i.e. repeated searches occur. Conversely, if the given step corresponds to an offset of 1 pixel at a large depth value, then at a smaller depth value the pixel offset corresponding to that step exceeds 1 pixel, i.e. two adjacent depth values reach two non-adjacent pixels, so some pixels are missed and the search is incomplete. Thus, although one expects to search N pixels within the range [z_min, z_max], the number of effective search points actually reached is less than N because of repeated or missed pixel searches.

For example, for pixel P = [526, 429] of Uli view 7 we perform a depth search over [2000, 4500] with a fixed step of 10 mm. As shown in FIG. 6, when the depth value is small, the u-coordinate of the pixel reached at depth 2090 is 661 while that reached at depth 2080 is 663, so a pixel in between is skipped and never searched; as shown in FIG. 7, when the depth value is large, two different depth values 4450 and 4460 reach the same pixel with u-coordinate 437, i.e. the pixel is searched repeatedly. Since the 10 mm step corresponds to a search precision of 1 pixel in the neighborhood of the true depth 3170, one would expect 250 distinct pixels to be searched over [2000, 4500], but because of missed and repeated searches the actual computation finds that only 200 pixels are searched.

For the search step to correspond to the same pixel search precision in the reference view throughout the depth search, i.e. for each step to always correspond to a fixed pixel offset in the reference view, the step must be adjusted dynamically according to the relation between depth change and depth value, and the corresponding search range determined accordingly. Suppose the initial search depth of pixel P₁ in view 1 is z₀. Equation (14) then readily gives the depth change Δz in view 1 corresponding to a pixel offset ΔP in reference view 2 at depth z₀. When the initial depth z₀ does not differ too much from the true depth, the pixel offset between the true corresponding pixel of P₁ in reference view 2 and the pixel obtained at depth z₀ is usually confined to a certain range. The following shows how, within a pixel search range N, the search step is determined adaptively from the depth value so that each step always corresponds to a fixed pixel offset.

Given pixel P₁ and the camera parameters, the epipolar constraint equation (17) for the pixel offset vector is easily solved to obtain the two mutually opposite offset vectors ΔP+ and ΔP- corresponding to a pixel offset ‖ΔP‖; the corresponding depth changes Δz+1 and Δz-1 are then computed from (14) and used as the search steps of the next step in the depth-increasing and depth-decreasing directions, as shown in FIG. 8:

z₋₁ = z₀ + Δz₋₁
z₁ = z₀ + Δz₊₁   (20)

Next, at depth z₋₁ with offset vector ΔP-, (14) gives the corresponding depth change Δz₋₂, and at depth z₁ with offset vector ΔP+, (14) gives the corresponding depth change Δz₊₂; these are taken as the search steps of the following step:

z₋₂ = z₋₁ + Δz₋₂
z₂ = z₁ + Δz₊₂   (21)

and so on; the search depth and step of the n-th step are:

z₋ₙ = z₋₍ₙ₋₁₎ + Δz₋ₙ
zₙ = zₙ₋₁ + Δz₊ₙ   (22)

where the number of search steps n is determined by the search range N and the search precision Δ, i.e. n satisfies n·Δ ≤ N.

Therefore, once the search range and the initial depth are determined, the above procedure yields a variable search step that adapts to the depth value, so that the same pixel search precision is maintained throughout the depth search, overcoming the repeated and missed pixel searches of a fixed step. Since the depth search range is obtained by accumulating the search steps, it also adapts to the depth value: when the depth value grows, the depth search range corresponding to the same pixel offset ‖ΔP‖ grows accordingly; when the depth value shrinks, it shrinks accordingly. Moreover, the depth search precision is conveniently controlled through the pixel precision Δ: Δ = 1 corresponds to a search precision of one pixel, and Δ = 1/2 to half-pixel precision.

Hence, with the relation between depth change and pixel offset vector, equation (14), the depth search step is determined by choosing the pixel search precision, and the determination of the depth search range likewise reduces to choosing a pixel offset. Choosing the pixel offset and search precision is analogous to choosing the search range and precision in disparity estimation: it is intuitive and easy to do, and the depth search range and step can be determined dynamically by adjusting the pixel offset and search precision according to the image content or the application requirements.

In the depth estimation process an initial depth value z₀ must be given, and its quality affects the performance and result of the depth search. When z₀ deviates little from the true depth, a small pixel offset, i.e. a small search range, can be used, reducing the amount of search and increasing speed; when z₀ deviates substantially from the true depth, a relatively large pixel offset must be used to guarantee that the true depth is reached, so the computation grows. Although a poor initial depth can be compensated by a wide search range and a high-precision step, a good initial depth allows a small search range and a suitable step, improving the efficiency and performance of the depth search. The estimation and determination of the initial depth is therefore also very important.

The determination of the initial depth of a video sequence splits into two cases: the image at the initial time and the subsequent images. For the image at the initial time there are again two cases, the first pixel and the other pixels. For the first pixel, no pixel has been depth-searched yet, so no scene depth information is known; one must consider how to obtain a rough scene depth from information such as the image features and camera parameters to use as the initial value. For the subsequent pixels, the initial depth can be set from the depth estimates of neighboring pixels in the image. For the subsequent images, since the depth values of the video sequence of a single viewpoint are strongly correlated (the depth of the static background remains unchanged while only a few moving regions change in depth), the depth at the same pixel position in the previous image can be used as the initial value. The key in determining the initial depth is therefore to obtain the scene depth of the image at the initial time, providing a good initial depth for the first pixel.
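Stepping back to the recursion (20)-(22), it translates directly into code. The sketch below builds the two-sided depth schedule, reusing the hypothetical `depth_step` and `unit_offsets` helpers from the earlier sketches; n is chosen so that n·Δ ≤ N as stated above:

```python
import numpy as np

def depth_schedule(z0, n, delta, P1, B, C, dt, depth_step, unit_offsets):
    """Equations (20)-(22): adaptive depth grids in both directions.

    z0    -- initial depth estimate
    n     -- number of steps per direction, with n * delta <= N
    delta -- pixel search precision (1 = one pixel, 0.5 = half pixel)
    """
    d_pos, d_neg = unit_offsets(P1, B, C, dt)
    # Assumption: d_pos is the direction for which depth_step returns a
    # positive dz; swap the two directions otherwise.
    z_up, z_down = [z0], [z0]
    for _ in range(n):
        # each step spans the same pixel offset delta in the reference view
        z_up.append(z_up[-1] + depth_step(z_up[-1], P1, delta * d_pos, B, C, dt))
        z_down.append(z_down[-1] + depth_step(z_down[-1], P1, delta * d_neg, B, C, dt))
    return z_down[::-1] + z_up[1:]   # ordered grid z_-n ... z_0 ... z_n
```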
In multi-view video, the differences between the images of different views, or the position information of the cameras, usually contain information about the scene depth. For the two cases of a convergent camera system and a parallel camera system, an initial estimate of the scene depth from the camera parameters or the image information, without any known depth information, is given below.

The main goal of multi-view video is to capture the same scene from several angles, so the cameras are usually placed along an arc with their optical axes converging at one point, i.e. a convergent system. In practice the cameras may not converge strictly at one point, but one can always find the point closest to all camera optical axes, and this point is regarded as the convergence point. The convergence point usually lies where the scene is and can be regarded as a representative point of the scene, so solving for the position of the convergence point yields a rough value of the scene depth, which is used as the initial value of the depth search.

Let the coordinates of the convergence point in the world coordinate system be M_c = [x_c, y_c, z_c]ᵀ. This point lies on the optical axis of every camera, so in the camera coordinate system whose z-axis is the optical axis it can be written as:

M_ci = [0, 0, z_ci]ᵀ   (23)

where z_ci is the depth of the convergence point in the coordinate system of camera c_i. From the relation between world coordinates and camera coordinates:

M_c = R₁·M_c1 + t₁
M_c = R₂·M_c2 + t₂
...
M_c = R_m·M_cm + t_m   (24)

Eliminating M_c gives:

R₁·[0, 0, z_c1]ᵀ + t₁ = R₂·[0, 0, z_c2]ᵀ + t₂
R₁·[0, 0, z_c1]ᵀ + t₁ = R₃·[0, 0, z_c3]ᵀ + t₃
...
R₁·[0, 0, z_c1]ᵀ + t₁ = R_m·[0, 0, z_cm]ᵀ + t_m   (25)

Equation (25) is a system of 3(m-1) linear equations in the depths z_c1, z_c2, ..., z_cm. Solving (25) by linear least squares yields the depth of the convergence point in each camera coordinate system; these depths are a rough value of the scene depth and can serve as the initial value of the depth search.

A parallel camera system has no convergence point, so depth information cannot be obtained by the above method; but in this case disparity and depth obey the simple inverse relation (18), so depth information can be obtained by computing the global disparity between two views. The global disparity can be defined as the pixel offset that minimizes the absolute difference of the two views, i.e. it is obtained as:

d_g = arg min_x Σ_{P ∈ R} | I₁(P) - I₂(P + x) |   (26)

where R is the overlapping region of views 1 and 2, over whose pixels the sum runs. Since high precision is not required of the global disparity estimate, the search unit of the pixel offset x in (26) can be set fairly large, for example 8 or 16 pixels, which greatly reduces the computation. Once the global disparity is obtained, the initial depth follows from the inverse relation (18) between depth and disparity.

Using a scene point provided in the Uli video sequence parameter document, the real-world coordinates [35.07, 433.93, -1189.78] (in mm) of the high-brightness point to the left of the glasses, and the relation (1) between world and camera coordinates, the coordinates and true depth of this scene point in the camera coordinate systems are obtained. The above method for the convergence point of two cameras, i.e. solving the linear system (25), then gives the depths of the convergence point in the coordinate systems of camera 7 and camera 8; the results are shown in Table 1. By human visual inspection the depth of field of the Uli scene varies little, and the initial depth estimates in Table 1 differ little from the true depth of the scene point, showing that the initial depth estimates are effective and reasonable and provide a good initial value for the depth estimation.
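A sketch of the least-squares solution of (25): stacking the m-1 vector equations gives a 3(m-1) x m linear system in the unknown depths, solvable with a standard solver. The camera poses here are placeholders for real calibration data:

```python
import numpy as np

def convergence_depths(Rs, ts):
    """Solve eq. (25) in least squares for the depths z_c1 .. z_cm.

    Rs, ts -- lists of rotation matrices and translation vectors of the
              m cameras (world pose of each camera coordinate system)
    """
    m = len(Rs)
    r = [R[:, 2] for R in Rs]   # R_i [0,0,z]^T = z * (third column of R_i)
    A = np.zeros((3 * (m - 1), m))
    b = np.zeros(3 * (m - 1))
    for i in range(1, m):
        rows = slice(3 * (i - 1), 3 * i)
        A[rows, 0] = r[0]         # + z_c1 * r_1
        A[rows, i] = -r[i]        # - z_ci * r_i
        b[rows] = ts[i] - ts[0]   # t_i - t_1
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z                      # rough scene depth seen from each camera
```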
[Table 1. Initial depth estimates of the convergence point in the coordinate systems of cameras 7 and 8 compared with the true depth of the scene point; the table was embedded as an image in the source and its values are not recoverable here.]

Consider the 64x64 image region of the Uli view shown in FIG. 3(c), from pixel [527, 430] to pixel [590, 493]. For the pixels taken every 15 pixels in this region, 16 pixels in total, depth searches are carried out with a fixed step and with the adaptive step. Within the fixed search range [2000, 5000] three searches with fixed steps are performed, with steps 20, 35 and 50 respectively. For the adaptive search step, the initial depth is 2817, the pixel offset is set to 32 pixels, the search precision is 1 pixel, and the initial depth of each subsequent pixel is set to the depth estimate of a neighboring pixel. With the method of the invention for determining the adaptive search step, the search steps corresponding to a search precision of one pixel are obtained for pixel [527, 430] within a search range of 32 pixels around the initial search pixel, as listed in Table 2; the pixels reached with these search steps are shown in FIG. 9. Table 2 shows that the steps along the depth-decreasing direction are negative, and their absolute value gradually decreases as the pixel offset grows and the depth value decreases; the steps along the depth-increasing direction are positive, and their absolute value gradually increases as the pixel offset grows and the depth value increases. FIG. 9 shows that when the depth search uses the variable steps of Table 2, the corresponding pixel search precision indeed remains constant, always one pixel.
Pixel offset | Step (depth increasing) | Step (depth decreasing) | Pixel offset | Step (depth increasing) | Step (depth decreasing)
1  | 11.4877 | -11.2503 | 17 | 12.8909 | -10.1000
2  | 11.5686 | -11.1728 | 18 | 12.9870 | -10.0340
3  | 11.6502 | -11.0960 | 19 | 13.0842 | -9.9687
4  | 11.7328 | -11.0201 | 20 | 13.1824 | -9.9041
5  | 11.8162 | -10.9450 | 21 | 13.2818 | -9.8400
6  | 11.9005 | -10.8706 | 22 | 13.3823 | -9.7766
7  | 11.9858 | -10.7969 | 23 | 13.4840 | -9.7138
8  | 12.0719 | -10.7240 | 24 | 13.5868 | -9.6515
9  | 12.1590 | -10.6519 | 25 | 13.6908 | -9.5899
10 | 12.2470 | -10.5805 | 26 | 13.7961 | -9.5289
11 | 12.3360 | -10.5098 | 27 | 13.9025 | -9.4684
12 | 12.4260 | -10.4397 | 28 | 14.0101 | -9.4086
13 | 12.5169 | -10.3704 | 29 | 14.1191 | -9.3493
14 | 12.6089 | -10.3018 | 30 | 14.2293 | -9.2905
15 | 12.7018 | -10.2339 | 31 | 14.3407 | -9.2323
16 | 12.7958 | -10.1666 | 32 | 14.4535 | -9.1746

Table 2

Depth estimation is carried out with the block-matching method; the depth search results are shown in FIG. 10, where each point represents the absolute difference between the synthesized block and the actual block at the depth found by the search. The smaller this value, the more accurate the depth estimate usually is. With a fixed search step, a smaller step means a higher search precision, so the depth estimation improves: the absolute difference at the depth found with the 20 mm step is smaller than that with the 35 mm step, and that with 35 mm is smaller than that with 50 mm. However, the depths found with the adaptive search step are the best, with the smallest corresponding absolute difference.

FIG. 3(c) shows the 16 pixels of the image region from [527, 430] to [590, 493] of view 7 for which depth is estimated, using the adaptive search step and the fixed steps 20, 35 and 50. Table 3 shows that with the adaptive search step all 16 pixels find the correct depth, whereas with a fixed step there are erroneous depth estimates. The reason is that these pixels lie in a texture-poor area: within a large fixed search range, the point of minimum absolute difference found by the search does not correspond to the correct pixel. With the adaptive search step, since the initial value is determined from neighboring information, the pixel offset can be set small, i.e. the search is confined to a relatively small local range, which lowers the probability of reaching a wrong pixel and guarantees a certain smoothness of the depth. Table 3 lists the depth estimation results, the number of depth searches and the number of erroneous estimates with the adaptive search step and with the fixed search steps; the boxed entries of the original table are the erroneous values. The results of Table 3 show that the adaptive-step search needs fewer searches and produces no erroneous estimates, whereas the fixed-step searches need more searches and still produce erroneous estimates. For example, the adaptive depth search with a pixel offset of 32 only searches 64 depth values, whereas the 20 mm fixed step must search 150 depth values over the range [2000, 5000].
[Table 3. Depth estimation results for the 16 pixels, number of depth searches, and number of erroneous estimates. The adaptive-step row was embedded as an image and is not recoverable here; per the text, the adaptive step finds the correct depth for all 16 pixels with 64 searches and no erroneous estimate. The recovered fixed-step rows follow; the clearly wrong values (e.g. 2300, 2350, 2380) are the entries boxed as errors in the original.]

Fixed search step (20 mm), 150 searches:
3160 3180 3180 3200
3160 3160 3160 2380
3160 3160 3160 2300
3140 3160 3160 3160

Fixed search step (35 mm), 86 searches:
3155 3190 3155 3190
3155 3155 3155 3190
3155 3155 3155 2350
3120 3155 3155 3120

Fixed search step (50 mm), 60 searches:
3150 3200 3150 3230
3150 3150 3150 2350
3150 3150 3150 2300
3150 3150 3150 3150

From the results of Table 3 and FIG. 10 it is concluded that the depth search performance of the adaptive search step is higher than that of the fixed search step: the absolute difference between the image block synthesized with the estimated depth and the reference image block is small, there are few erroneous estimates, and the number of depth searches is small.

Claims

1. A depth search method for multi-view video images, characterized in that the search step of every step within the depth search range is dynamically adjusted according to the current depth value: the smaller the current depth value, the smaller the search step used; the larger the current depth value, the larger the search step used, so that the search step of every step corresponds to the same pixel search precision.
2. The depth search method for multi-view video images according to claim 1, characterized in that, according to the relation between the depth change and the pixel offset vector, the determination of the depth search range and search step is converted into the determination of a pixel search range and a pixel search precision.
3. The depth search method for multi-view video images according to claim 2, characterized in that the pixel search precision equals the length of the pixel offset vector of each step of the search, the pixel search precision being sub-pixel precision or integer-pixel precision.
4. The depth search method for multi-view video images according to claim 2, characterized in that the search step equals the depth change corresponding to the pixel offset vector of each step of the search.
5. The depth search method for multi-view video images according to claim 2, characterized in that the search step is determined by the current depth value, the pixel offset vector, and the internal and external camera parameters.
6. The depth search method for multi-view video images according to claim 5, characterized in that the search step is obtained by the following formula:

Δz = ( (z·b₃ᵀP + c₃ᵀΔt_r)² · ‖ΔP_r‖² ) / ( ΔP_rᵀ·[ (c₃ᵀΔt_r)·B_r·P - (b₃ᵀP)·C_r·Δt_r ] - (z·b₃ᵀP + c₃ᵀΔt_r)·(b₃ᵀP)·‖ΔP_r‖² )

where P is the pixel point in the target view whose depth is being estimated, z is the current depth value of pixel P, Δz is the depth change of pixel P, i.e. the search step, and ΔP_r is the pixel offset vector in the reference view r corresponding to the depth change Δz of pixel P in the target view, with ‖ΔP_r‖² = ΔP_rᵀ·ΔP_r; B_r = A_r·R_r⁻¹·R·A⁻¹ and C_r = A_r·R_r⁻¹ are 3x3 matrices, and Δt_r = t - t_r is a three-dimensional vector; R is the three-dimensional rotation matrix of the camera coordinate system of the target view relative to the world coordinate system; t is the translation vector of the camera coordinate system of the target view relative to the world coordinate system; A is the internal camera parameter matrix of the target view; R_r is the three-dimensional rotation matrix of the camera coordinate system of the reference view relative to the world coordinate system; t_r is the translation vector of the camera coordinate system of the reference view relative to the world coordinate system; A_r is the internal camera parameter matrix of the reference view; and b₃ᵀ and c₃ᵀ are the third row vectors of the matrices B_r and C_r, respectively.
7. The depth search method for multi-view video images according to claim 6, characterized in that the pixel offset vector in the reference view satisfies the epipolar constraint equation of the target view and the reference view:

ΔP_rᵀ·(C_r·Δt_r × B_r)·P = 0

where P is the pixel point in the target view and ΔP_r is the pixel offset vector in the reference view.
8. The depth search method for multi-view video images according to claim 7, characterized in that there exist two pixel offset vectors of mutually opposite directions satisfying said epipolar constraint equation, the two pixel offset vectors corresponding to the depth-increasing direction and the depth-decreasing direction respectively; the depth change corresponding to the offset vector of the depth-increasing direction is larger in magnitude than the depth change corresponding to the offset vector of the depth-decreasing direction.
9. The depth search method for multi-view video images according to claim 6, characterized in that, in a parallel camera system, the depth change is proportional to the square of the current depth value.
10. A depth estimation method for multi-view video images, characterized in that, in depth-based view synthesis and block-matching-based depth search, the depth search range and search step of the target view are determined by the pixel search range and pixel search precision of the reference view; within the depth search range, the search step of every step is dynamically adjusted according to the current depth value: the smaller the current depth value, the smaller the search step used; the larger the current depth value, the larger the search step used, so that the search step of every step corresponds to the same pixel search precision.
11. The depth estimation method for multi-view video images according to claim 10, characterized in that the pixel search precision equals the length of the pixel offset vector of each step of the search, and the search step equals the depth change corresponding to the pixel offset vector of each step of the search.
12. The depth estimation method for multi-view video images according to claim 10, characterized in that the depth-based view synthesis and block-matching-based depth search specifically comprise: performing view synthesis with the current depth value and computing the error between the pixel block of the synthesized view and the pixel block of the reference view; the depth value corresponding to the minimum error is taken as the depth estimate of the target view.
13. The depth estimation method for multi-view video images according to claim 12, characterized by comprising the following steps: Step 1: estimate the initial depth value z_k = z_0 of the depth search in the target view; Step 2: determine the pixel search range and pixel search precision in the reference view corresponding to the depth search, and obtain the pixel offset vector ΔP_r in the reference view from the pixel search precision; Step 3: from the current depth value z_k and the pixel offset vector ΔP_r, obtain the corresponding depth change Δz_k, the depth change Δz_k being the next search step; Step 4: perform view synthesis with the current depth value z_k, compute the error e_k between the pixel block of the synthesized view and the pixel block of the reference view, and update the current depth value z_{k+1} = z_k + Δz_k, k = k + 1; Step 5: judge whether the given pixel search range is exceeded; if so, go to Step 6; if not, go to Step 3; Step 6: the depth value corresponding to the minimum of the errors e_k (k = 0, ..., N-1, N being the total number of search steps) is the estimate.
14. The depth estimation method for multi-view video images according to claim 13, characterized in that the error e_k is the absolute difference or the squared difference between the pixel block of the synthesized view and the pixel block of the reference view.
15. The depth estimation method for multi-view video images according to claim 13, characterized in that, for a convergent camera system, in Step 1 the depth of the convergence point of the convergent camera system is taken as the initial depth value z_0 of the depth search in the target view.
16. The depth estimation method for multi-view video images according to claim 15, characterized in that the convergence point of the convergent camera system is obtained by solving the following system of linear equations:

R·[0, 0, z₀]ᵀ + t = R₁·[0, 0, z_r1]ᵀ + t₁
R·[0, 0, z₀]ᵀ + t = R₂·[0, 0, z_r2]ᵀ + t₂
...
R·[0, 0, z₀]ᵀ + t = R_m·[0, 0, z_rm]ᵀ + t_m

where z₀ is the depth of the convergence point in the camera coordinate system of the target view, z_ri (i = 1, ..., m) is the depth of the convergence point in the camera coordinate system of reference view i, and m is the number of reference views.
17. The depth estimation method for multi-view video images according to claim 13, characterized in that, for a parallel camera system, in Step 1 the initial depth value z₀ of the depth search is obtained from the inverse proportionality between the global disparity and the depth:

z₀ = f·B / d

where z₀ is the initial depth value, d is the global disparity, f is the focal length of the camera, and B is the baseline length of the camera.
18. The depth estimation method for multi-view video images according to claim 17, characterized in that the global disparity is the pixel offset vector that minimizes the absolute difference between the translated reference view and the target view.
19. The depth estimation method for multi-view video images according to claim 13, characterized in that the depth change Δz is obtained by the following formula:

Δz = ( (z·b₃ᵀP + c₃ᵀΔt_r)² · ‖ΔP_r‖² ) / ( ΔP_rᵀ·[ (c₃ᵀΔt_r)·B_r·P - (b₃ᵀP)·C_r·Δt_r ] - (z·b₃ᵀP + c₃ᵀΔt_r)·(b₃ᵀP)·‖ΔP_r‖² )

where P is the pixel point in the target view whose depth is being estimated, z is the current depth value of pixel P, Δz is the depth change of pixel P, i.e. the search step, and ΔP_r is the pixel offset vector in the reference view r corresponding to the depth change Δz of pixel P in the target view, with ‖ΔP_r‖² = ΔP_rᵀ·ΔP_r; B_r = A_r·R_r⁻¹·R·A⁻¹ and C_r = A_r·R_r⁻¹ are 3x3 matrices, and Δt_r = t - t_r is a three-dimensional vector; R is the three-dimensional rotation matrix of the camera coordinate system of the target view relative to the world coordinate system; t is the translation vector of the camera coordinate system of the target view relative to the world coordinate system; A is the internal camera parameter matrix of the target view; R_r is the three-dimensional rotation matrix of the camera coordinate system of the reference view relative to the world coordinate system; t_r is the translation vector of the camera coordinate system of the reference view relative to the world coordinate system; A_r is the internal camera parameter matrix of the reference view; and b₃ᵀ and c₃ᵀ are the third row vectors of the matrices B_r and C_r, respectively.
20. The depth estimation method for multi-view video images according to claim 19, characterized in that the pixel offset vector ΔP_r in the reference view satisfies the epipolar constraint equation of the target view and the reference view:

ΔP_rᵀ·(C_r·Δt_r × B_r)·P = 0

where P is the pixel point in the target view and ΔP_r is the pixel offset vector in the reference view.
21. The depth estimation method for multi-view video images according to claim 20, characterized in that there exist two pixel offset vectors of mutually opposite directions satisfying said epipolar constraint equation, the two pixel offset vectors corresponding to the depth-increasing direction and the depth-decreasing direction respectively; the depth change corresponding to the offset vector of the depth-increasing direction is larger in magnitude than the depth change corresponding to the offset vector of the depth-decreasing direction.
PCT/CN2008/072141 2008-02-03 2008-08-26 Depth searching method and depth estimating method for multi-viewing angle video image WO2009097714A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200810300330A CN100592338C (en) 2008-02-03 2008-02-03 Multi-visual angle video image depth detecting method and depth estimating method
CN200810300330.7 2008-02-03

Publications (1)

Publication Number Publication Date
WO2009097714A1 true WO2009097714A1 (en) 2009-08-13

Family

ID=39898199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/072141 WO2009097714A1 (en) 2008-02-03 2008-08-26 Depth searching method and depth estimating method for multi-viewing angle video image

Country Status (2)

Country Link
CN (1) CN100592338C (en)
WO (1) WO2009097714A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100592338C (en) * 2008-02-03 2010-02-24 四川虹微技术有限公司 Multi-visual angle video image depth detecting method and depth estimating method
JP5713624B2 (en) 2009-11-12 2015-05-07 キヤノン株式会社 3D measurement method
CN101710423B (en) * 2009-12-07 2012-01-04 青岛海信网络科技股份有限公司 Matching search method for stereo image
KR101640404B1 (en) 2010-09-20 2016-07-18 엘지전자 주식회사 Mobile terminal and operation control method thereof
US9524556B2 (en) 2014-05-20 2016-12-20 Nokia Technologies Oy Method, apparatus and computer program product for depth estimation
TWI528783B (en) * 2014-07-21 2016-04-01 由田新技股份有限公司 Methods and systems for generating depth images and related computer products
CN109325992B (en) * 2018-10-19 2023-07-04 珠海金山数字网络科技有限公司 Image drawing method and device, computing device and storage medium
CN111476835B (en) * 2020-05-21 2021-08-10 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN113112551B (en) * 2021-04-21 2023-12-19 阿波罗智联(北京)科技有限公司 Camera parameter determining method and device, road side equipment and cloud control platform
CN113486928B (en) * 2021-06-16 2022-04-12 武汉大学 Multi-view image alignment method based on rational polynomial model differentiable tensor expression

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6046763A (en) * 1997-04-11 2000-04-04 Nec Research Institute, Inc. Maximum flow method for stereo correspondence
JP2004534336A (en) * 2001-07-06 2004-11-11 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ MOTION OR DEPTH ESTIMATION METHOD AND ESTIMATION UNIT, AND IMAGE PROCESSING APPARATUS HAVING SUCH MOTION ESTIMATION UNIT
CN1851752A (en) * 2006-03-30 2006-10-25 东南大学 Dual video camera calibrating method for three-dimensional reconfiguration system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1153362A (en) * 1995-03-29 1997-07-02 三洋电机株式会社 Methods for creating image for three-dimensional display, for calculating depth information, and for image processing using depth information
US20050163366A1 (en) * 2000-05-04 2005-07-28 Microsoft Corporation System and method for progressive stereo matching of digital images
CN101231754A (en) * 2008-02-03 2008-07-30 四川虹微技术有限公司 Multi-visual angle video image depth detecting method and depth estimating method

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9747524B2 (en) 2014-02-28 2017-08-29 Ricoh Company, Ltd. Disparity value deriving device, equipment control system, movable apparatus, and robot
CN110162098A (en) * 2019-07-03 2019-08-23 安徽理工大学 A kind of mining unmanned plane
CN111179327A (en) * 2019-12-30 2020-05-19 青岛联合创智科技有限公司 Depth map calculation method
CN111179327B (en) * 2019-12-30 2023-04-25 青岛联合创智科技有限公司 Depth map calculation method
CN113643414A (en) * 2020-05-11 2021-11-12 北京达佳互联信息技术有限公司 Three-dimensional image generation method and device, electronic equipment and storage medium
CN113643414B (en) * 2020-05-11 2024-02-06 北京达佳互联信息技术有限公司 Three-dimensional image generation method and device, electronic equipment and storage medium
CN113240785A (en) * 2021-04-13 2021-08-10 西安电子科技大学 Multi-camera combined rapid ray tracing method, system and application
CN113240785B (en) * 2021-04-13 2024-03-29 西安电子科技大学 Multi-camera combined rapid ray tracing method, system and application
WO2022267275A1 (en) * 2021-06-25 2022-12-29 北京市商汤科技开发有限公司 Depth detection method, apparatus and device, storage medium, computer program and product
CN113538318A (en) * 2021-08-24 2021-10-22 北京奇艺世纪科技有限公司 Image processing method, image processing device, terminal device and readable storage medium
CN113538318B (en) * 2021-08-24 2023-12-15 北京奇艺世纪科技有限公司 Image processing method, device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
CN101231754A (en) 2008-07-30
CN100592338C (en) 2010-02-24

Similar Documents

Publication Publication Date Title
WO2009097714A1 (en) Depth searching method and depth estimating method for multi-viewing angle video image
Li et al. Hole filling with multiple reference views in DIBR view synthesis
Daribo et al. A novel inpainting-based layered depth video for 3DTV
Pollefeys et al. A simple and efficient rectification method for general motion
Daribo et al. Depth-aided image inpainting for novel view synthesis
US8116557B2 (en) 3D image processing apparatus and method
US7944444B2 (en) 3D image processing apparatus and method
KR100755450B1 (en) 3d reconstruction apparatus and method using the planar homography
US20020106120A1 (en) Method of analyzing in real time the correspondence of image characteristics in corresponding video images
US20110050853A1 (en) Method and system for converting 2d image data to stereoscopic image data
JP2018519697A (en) A method for synthesizing a light field in which omnidirectional parallax is compressed using depth information
WO2018188277A1 (en) Sight correction method and device, intelligent conference terminal and storage medium
JP6173218B2 (en) Multi-view rendering apparatus and method using background pixel expansion and background-first patch matching
Zhang et al. Stereoscopic video synthesis from a monocular video
JP3561446B2 (en) Image generation method and apparatus
Knorr et al. An image-based rendering (ibr) approach for realistic stereo view synthesis of tv broadcast based on structure from motion
Knorr et al. Stereoscopic 3D from 2D video with super-resolution capability
Jantet et al. Joint projection filling method for occlusion handling in depth-image-based rendering
JP4605716B2 (en) Multi-view image compression encoding method, apparatus, and program
Wang et al. Block-based depth maps interpolation for efficient multiview content generation
Knorr et al. From 2D-to stereo-to multi-view video
KR100655465B1 (en) Method for real-time intermediate scene interpolation
US20130229408A1 (en) Apparatus and method for efficient viewer-centric depth adjustment based on virtual fronto-parallel planar projection in stereoscopic images
Gurrieri et al. Efficient panoramic sampling of real-world environments for image-based stereoscopic telepresence
Robert et al. Disparity-compensated view synthesis for s3D content correction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08784131

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08784131

Country of ref document: EP

Kind code of ref document: A1