WO2005052858A1 - Determination d'informations de profondeur pour image video - Google Patents

Determination d'informations de profondeur pour image video

Info

Publication number
WO2005052858A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
junction
determining
depth information
image
Prior art date
Application number
PCT/IB2004/052501
Other languages
English (en)
Inventor
Christiaan Varekamp
Fabian E. Ernst
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2005052858A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/543Depth or shape recovery from line drawings

Definitions

  • the invention relates to a method and apparatus for determining depth information for a video image and in particular to determination of depth information by video image segmentation.
  • 3D information may be used for enhancing object grasping and video compression for video signals.
  • three-dimensional video or television (3DTV) is promising as a means for enhancing the user experience of the presentation of visual content, and 3DTV could potentially be as significant as the introduction of color TV.
  • the most commercially interesting 3DTV systems are based on re-use of existing 2D video infrastructure, thereby minimizing the cost and compatibility problems associated with a gradual roll-out. For these systems, conventional 2D video is distributed and converted to 3D video at the location of the consumer.
  • the 2D-to-3D conversion process adds (depth) structure to 2D video and may also be used for video compression.
  • the conversion of 2D video into video comprising 3D (depth) information is, however, a major image processing challenge. Consequently, significant research has been undertaken in this area and a number of algorithms and approaches have been suggested for extracting 3D information from 2D images.
  • Known methods for deriving depth or occlusion relations from monoscopic video comprise the structure from motion approach and the dynamic occlusion approach.
  • in the structure from motion approach, points of an object are tracked as the object moves and are used to derive a 3D model of the object.
  • the 3D model is determined as that which would most closely result in the observed movement of the tracked points.
  • the dynamic occlusion approach utilizes the fact that as different objects move within the image, the occlusion (i.e. the overlap of one object over another in a 2D image) provides information indicative of the relative depth of the objects.
  • structure from motion requires the presence of camera motion and cannot deal with independently moving objects (non-static scene).
  • both approaches rely on the existence of moving objects and fail in situations where there is very little or no apparent motion in the video sequence.
  • a depth cue which may provide static information is the T-junction, which may correspond to an intersection between overlapping objects.
  • although T-junctions are a known depth cue for human vision, computational methods for detecting T-junctions in video and for using T-junctions for automatic depth extraction have had very limited success so far.
  • Previous research into the use of T-junctions has mainly focused on the T-junction detection task, and examples of schemes for detecting T-junctions are given in "Filtering, Segmentation and Depth" by M. Nitzberg, D. Mumford and T. Shiota, 1991, Lecture Notes in Computer Science 662, Springer-Verlag, Berlin, and in "Steerable-scalable kernels for edge detection and junction analysis" by P. Perona, 1992, 2nd European Conference on Computer Vision, pages 3-18, also published in Image and Vision Computing, vol. 10, pages 663-672.
  • the described method is rather complex and requires significant computational resources. Furthermore, since the disclosed method is contour based, it is not easily integrated with other region based depth cues such as depth from motion and depth from dynamic occlusion. Furthermore, although T-junctions may be suitable for determining depth information, there are generally only relatively few T-junctions to base the determination of depth information on, and therefore the determined depth information may frequently be limited or less reliable. Hence, an improved system for determining depth information would be advantageous and in particular a system allowing for increased reliability and/or an increased amount of depth information would be advantageous.
  • a method of determining depth information for a video image comprising the steps of: segmenting the video image to determine a plurality of image segments; determining a plurality of junction points associated with the plurality of image segments; and for at least a first junction point of the plurality of image segments deriving a depth characteristic by performing the steps of: determining first depth information data for the first junction point in response to a first object association assumption, determining second depth information data for the first junction point in response to a second object association assumption, and determining the depth characteristic for the first junction point in response to the first depth information data and the second depth information data.
  • the junction points may specifically be points corresponding to junctions between edges of the image segments and may in particular be segmentation T-junctions where three image segments meet.
  • the object association assumptions are in particular related to assumptions of an association between image segments and underlying objects in the video image.
  • the first object association assumption may be an assumption that three image segments associated with a junction point belong to three different objects whereas the second object association assumption may be an assumption that two or three of the image segments belong to the same object.
  • the invention may allow additional and/or improved depth information to be derived.
  • the depth information is not limited to considering only object T-junctions but may also include information derived from other junction points.
  • object T-junctions correspond to a relatively low proportion of the total number of junction points determined by segmentation.
  • a depth characteristic may be determined for a first junction point whether this is an object T-junction or not.
  • the invention may further allow an improved reliability of the depth information of the depth characteristic.
  • the probability of the image segments corresponding to different objects may be included.
  • the depth characteristic may comprise depth information related to both object association assumptions.
  • the depth characteristic may comprise depth information corresponding to the first junction point being an object T-junction as well as depth information corresponding to the first junction point being on a surface of the same object.
  • the depth characteristic in this case further comprises a probability indication of the junction point being an object T-junction and of the junction point being on the surface of an object.
  • the step of determining a depth characteristic may comprise further steps of determining depth information data in response to other object association assumptions and this depth information data may be used in determining the depth characteristic.
  • the first object association assumption is an object T-junction assumption.
  • the depth information is determined assuming the first junction point is an object T-junction where three objects of the image meet. An object T-junction provides advantageous occlusion and thus depth information.
  • the second object association assumption is an object edge assumption.
  • the depth information is determined assuming the first junction point is an object edge junction where two objects meet.
  • An object edge junction may provide depth information and may specifically indicate that some image segments are at the same relative depth level whereas a third image segment is at a different depth level (which may be above or below the other two image segments).
  • the second object association assumption is an internal object junction assumption.
  • the depth information is determined assuming the first junction point is an internal object junction where all image segments belong to a surface of the same object.
  • the internal object junction may provide depth information and may specifically indicate that some or all associated image segments are at the same relative depth level.
  • the first object association assumption may be an object edge assumption when the second object association assumption is an internal object junction assumption.
  • in an embodiment, the first object association assumption is an object T-junction assumption, the second object association assumption is an object edge assumption, and the step of deriving the depth characteristic comprises determining third depth information data for the first junction point in response to an internal object junction assumption, the step of determining the depth characteristic being further in response to the third depth information data.
  • depth information data is determined assuming that the first junction point is associated with one, two or three objects and the depth characteristic is determined in response to these three sets of data.
  • the depth characteristic is determined in response to the likelihood of the individual assumptions. This allows enhanced or additional depth information to be determined.
  • the depth information data comprises a relative depth of at least two image segments associated with the first junction point. This allows for a simple depth determination which may follow directly from the object association assumption. It may further provide useful depth information that may be used in determining relative depth information of objects in the video image.
  • the step of deriving the depth characteristic is repeated for a plurality of junction points associated with an image segment pair and the method further comprises the step of determining depth information for the image segment pair in response to the depth characteristics. For example, two image segments may be selected and a depth characteristic may be determined for each junction point that is associated with both of these image segments.
  • the step of deriving the depth characteristic comprises the step of determining a likelihood indication of the first object association assumption and the second object association assumption and the step of determining the depth characteristic comprises selecting between the first depth information data and the second depth information data in response to the likelihood indication.
  • the depth characteristic may correspond to depth information related to an object T-junction, an object edge junction or an internal object junction depending on which of these possibilities is the most likely.
  • the likelihood indication is determined in response to a fit of a first model associated with the first object association assumption and a fit of a second model associated with the second object association assumption. This provides for a simple and accurate determination of the likelihood indication.
  • a T-junction model, an object edge model and an internal object model may be fitted to the junction point and the depth information data corresponding to the model that can be fitted with the lowest error in accordance with a suitable error function is selected.
  • the depth characteristic comprises a reliability indication of depth information comprised in the depth characteristic.
  • the depth indications may be used dependent on the reliability of the depth indications. For example, if depth information related to an image segment pair is determined from a plurality of junction points, the reliability of the depth information of each junction may be used in the combination. For example, more reliable depth indications may be weighted higher than less reliable depth indications.
  • the reliability indication may be determined in any suitable way such as for example in response to how close a fit to a given model can be achieved.
  • the step of determining the depth characteristic comprises performing a weighted averaging of the first and second depth information data. This may allow increased flexibility of depth information algorithms and may allow improved depth information that more accurately reflects the uncertainty of the determination.
  • the step of deriving the depth characteristic is iterated for a plurality of junction points to create a plurality of depth characteristics and further comprises the step of determining a depth map of the video image in response to the plurality of depth characteristics.
  • a depth map of the entire video image may be obtained by analyzing depth characteristics for a plurality of junction points and preferably for a large number of junction points including object T-junctions, object edge junctions and internal object junctions.
  • the step of determining the depth map comprises: assigning initial assigned depth levels to the plurality of image segments; determining depth relations associated with each image segment pair of the plurality of image segments in response to the plurality of depth characteristics; determining an error function dependent on the depth relations and the assigned depth levels; adjusting the assigned depth levels such as to minimize the error function; and determining the depth map in response to the assigned depth levels.
  • This approach provides a simple to implement, reliable, high performance and efficient way of determining a depth map based on depth characteristics from a plurality of junction points.
  • the step of determining the error function and the step of adjusting the assigned depth levels are repeated. This provides for improved accuracy of the depth map.
  • the error function comprises a weighting of each image segment pair.
  • This provides for improved accuracy of the depth map.
  • the weighting may be in response to the reliability of the determined depth indications for the image segment pair.
  • an apparatus for determining depth information for a video image comprising: means for segmenting the video image to determine a plurality of image segments; means for determining a plurality of junction points associated with the plurality of image segments; and means for, for at least a first junction point of the plurality of image segments, deriving a depth characteristic further comprising: means for determining first depth information data for the first junction point in response to a first object association assumption, means for determining second depth information data for the first junction point in response to a second object association assumption, and means for determining the depth characteristic for the first junction point in response to the first depth information data and the second depth information data.
  • Fig. 1 illustrates an example of a T-junction in an image
  • Fig. 2 illustrates an apparatus for determining depth information suitable for an embodiment of the invention
  • Fig. 3 illustrates a method of determining depth information of at least one image suitable for an embodiment of the invention
  • Fig. 4 illustrates an example of a relation between the placement of edge points and the segmentation matrix in accordance with an embodiment of the invention
  • Fig. 5 illustrates an example of image segmentation in accordance with an embodiment of the invention
  • FIG. 6 illustrates an apparatus for determining depth information for a video image in accordance with an embodiment of the invention
  • Fig. 7 illustrates a flow chart for a method of determining depth information for a video image in accordance with an embodiment of the invention
  • Fig. 8 illustrates a flow chart for determining a depth map in accordance with an embodiment of the invention.
  • Fig. 1 illustrates an example of an object T-junction in an image.
  • the image comprises a first rectangle 101 and a second rectangle 103.
  • the first rectangle 101 overlaps the second rectangle 103 and accordingly edges of the objects form an intersection known as a T-junction 105.
  • a first edge 107 of the second rectangle 103 is cut short by a second edge 109 of the first rectangle.
  • the first edge 107 forms a stem 111 of the T-junction 105
  • the second edge 109 forms a top 113 of the T-junction.
  • the T-junction 105 is the point in the image plane where the object edges 107, 109 form a "T" by one edge 107 terminating on a second edge 109.
  • Humans are capable of identifying that some objects are nearer than others just by the presence of T-junctions, as in the example of Fig. 1.
  • FIG. 2 illustrates an apparatus 200 for determining depth information for at least one image in accordance with a preferred embodiment of the invention.
  • the apparatus 200 comprises a segmentation processor 201 which receives one or more images for which depth information is to be provided.
  • the segmentation processor 201 is operable to segment the at least one image into a plurality of image segments.
  • the segmentation processor 201 is coupled to a junction extraction processor 203.
  • the junction extraction processor 203 is operable to determine at least one junction associated with overlapping objects of the at least one image in response to the plurality of image segments. In the preferred embodiment, the junction extraction processor 203 determines a plurality of T-junctions corresponding to intersections of edges between the segments.
  • the junction extraction processor 203 is coupled to a depth information processor 205 which receives characteristics of the detected junctions from the junction extraction processor 203 and image segmentation information from the segmentation processor 201.
  • the depth information processor 205 is operable to determine depth information associated with objects of the at least one image in response to the at least one junction.
  • the depth information processor 205 generates a depth map for the image in response to a plurality of T-junctions determined by the junction extraction processor 203.
  • Fig. 3 illustrates a method of determining depth information of at least one image in accordance with an embodiment of the invention. The method is applicable to the apparatus of Fig. 2 and will be described with reference to this.
  • the segmentation processor 201 receives an image from a suitable source.
  • Step 301 is followed by step 303 wherein the image is segmented into a plurality of image segments.
  • the aim of image segmentation is to group pixels together into image segments which are unlikely to contain depth discontinuities. A basic assumption is that a depth discontinuity causes a sharp change of brightness or color in the image.
  • image segmentation thus comprises the process of a spatial grouping of pixels based on a common property.
  • the segmentation includes detecting disjoint regions of the image in response to a common characteristic and subsequently tracking this object from one image to the next.
  • the segmentation comprises grouping image elements having similar brightness levels in the same image segment. Contiguous groups of image elements having similar brightness levels tend to belong to the same underlying object.
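  • a very simple illustration of such brightness-based grouping is sketched below; it quantizes the brightness into a few bins and labels 4-connected regions within each bin. It is only a minimal stand-in for the segmentation step, not the particular segmentation algorithm of the described embodiment, and the bin count and connectivity structure are choices of this sketch.

```python
import numpy as np
from scipy import ndimage

def segment_by_brightness(gray, n_bins=8):
    """Crude segmentation: group 4-connected pixels with similar brightness.

    gray   : 2D float array with values in [0, 1]
    n_bins : number of brightness bins used for grouping
    Returns a 2D integer array of segment numbers (a segmentation matrix S).
    """
    bins = np.minimum((gray * n_bins).astype(int), n_bins - 1)
    segmentation = np.zeros(gray.shape, dtype=int)
    structure = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]])  # 4-connectivity on a square grid
    next_label = 1
    for b in range(n_bins):
        labels, count = ndimage.label(bins == b, structure=structure)
        segmentation[labels > 0] = labels[labels > 0] + next_label - 1
        next_label += count
    return segmentation
```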
  • the segmentation comprises grouping image elements in response to motion characteristics of objects of the at least one image.
  • Conventional motion estimation compression techniques, such as MPEG-2 based video compression, utilize motion estimation to identify and track moving objects between images, thereby allowing regions associated with the moving objects to be coded simply by a motion vector and differential information.
  • typical video compression techniques comprise processing of image segments based on motion estimation characteristics.
  • such segmentation is re-used by the segmentation processor 201 thus allowing for the segmentation process to re-use existing segmentations.
  • Step 303 thus results in the segmentation processor 201 generating a plurality of image segments.
  • the segmentation allows for a high probability that boundaries between objects in the image will correspond to boundaries between image segments.
  • edges of objects in the image may be analyzed by investigating the edges of the determined image segments.
  • Step 303 is followed by step 305 wherein the junction extraction processor 203 determines at least one junction associated with overlapping objects of the at least one image in response to the plurality of image segments.
  • the image is divided into 2 by 2 image element matrices and each matrix is compared to a predetermined junction detection criterion.
  • if the matrix meets the junction detection criterion, it is assumed that the matrix comprises a junction point, and if the junction detection criterion is not met, it is assumed that the matrix does not comprise a junction point.
  • only part of the image is divided into 2 by 2 pixel matrices and specifically a lower computational resource use may be achieved by only considering 2 by 2 groups of pixels along the boundaries between the image segments.
  • An approach for determining junction points in accordance with the preferred embodiment will be described in more detail in the following. The description will use a notation wherein both the image and segmentation are represented by matrices of size NxM .
  • the junction points are identified by analyzing all 2x2 sub-matrices of the NxM segmentation matrix S. Since the T-junctions between image segments are to be detected, the analysis focuses on 3-junctions, which are junctions at which exactly three different image segments meet. It should be noted that a 3-junction is not necessarily a T-junction, but may also indicate a fork or an arrow shape (which may for example occur in the image of a cube). Specifically, the parameters for the segmentation performed in step 303 are preferably set conservatively such that there is a high probability that all T-junctions between objects (object T-junctions) are determined.
  • the junction points determined in step 305 correspond to all junction points existing between image segments. Only some of these, and typically only a small proportion, are object T-junctions.
  • an element S_ij of the segmentation matrix S contains the segment number at pixel location (i, j).
  • the segment number itself is arbitrary and in the following we only use the property that the segment number changes at edges and the property that the segmentation is four-connected.
  • a sub-matrix contains a 3-junction if exactly one of the four differences between horizontally or vertically adjacent elements is zero.
  • for example, a sub-matrix in which region number 1 occurs in two diagonally opposite positions is not considered to be a 3-junction, because region number 1, which occurs twice, is not 4-connected. This violates the basic assumption that regions in the segmentation must be 4-connected on a square sampling grid.
  • in other words, a 2 by 2 sub-matrix is considered a 3-junction if the four elements correspond to exactly three image segments and the two samples from the same image segment are next to each other either vertically or horizontally (but not diagonally).
  • the actual junction point, in the Cartesian coordinates of the image, is placed at the center of the 2 by 2 sub-matrix.
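  • this detection rule can be expressed compactly in code. The sketch below is an illustrative reading of the rule rather than the exact implementation of the embodiment: it scans all 2 by 2 sub-matrices of the segmentation matrix S, accepts those containing exactly three segment numbers whose repeated number is horizontally or vertically (not diagonally) adjacent, and places the junction point at the center of the sub-matrix.

```python
import numpy as np

def find_3_junctions(S):
    """Find 3-junction points in an N x M segmentation matrix S.

    A 2x2 sub-matrix is accepted as a 3-junction if its four elements contain
    exactly three different segment numbers and the two elements from the same
    segment are vertical or horizontal (not diagonal) neighbours, so that all
    regions remain 4-connected.
    Returns a list of (x_jun, y_jun, segments) tuples, with the junction point
    placed at the centre of the sub-matrix.
    """
    junctions = []
    N, M = S.shape
    for i in range(N - 1):
        for j in range(M - 1):
            a, b = S[i, j], S[i, j + 1]
            c, d = S[i + 1, j], S[i + 1, j + 1]
            values = {a, b, c, d}
            if len(values) != 3:
                continue
            # Reject the diagonal case: the repeated segment number must not
            # occupy two diagonally opposite positions of the sub-matrix.
            if a == d or b == c:
                continue
            junctions.append((j + 0.5, i + 0.5, values))
    return junctions
```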
  • Step 305 is followed by step 307 wherein the depth information processor 205 determines depth information of one or more objects in the image.
  • the depth information is determined in response to the detected 3-junctions and preferably a large number of object T-junctions are used to develop a depth map for a plurality of objects in the image.
  • in step 307, the following process is performed assuming that the junction point is an object T-junction.
  • a selection of object T-junctions from the total number of junction points is made either before or after the processing of the depth information processor 205.
  • as illustrated in Fig. 1, a T-junction has a characteristic geometry where one edge, known as the stem 111, ends abruptly in the middle of a second edge, known as the top 113. Identification of the top and stem is used in deriving a possible depth order. To identify the top and the stem, it is in the preferred embodiment assumed that both are straight lines which pass through the junction point (x_jun, y_jun), but with an arbitrary orientation angle.
  • the top and the stem are thus modelled as first and second curves, which in the preferred embodiment are straight lines. In other embodiments more complex curves may be used.
  • edge points are extracted from the segmentation matrix.
  • Fig. 4 illustrates an example of a relation between the placement of edge points and the segmentation matrix 400.
  • the three image segments surrounding a junction point are identified by the set {1, 2, 3}.
  • Edge points are now placed between rows and columns in the segmentation matrix, but only where the value of the segmentation matrix changes from one row to the next or from one column to the next.
  • Fig. 4 shows edge points 401 between image segment 1 and image segment 2, three edge points 403 between image segment 2 and image segment 3, and four edge points 405 between image segment 1 and image segment 3.
  • edge points within a given radius of the junction are determined. This allows for reduced complexity and computational requirements yet results in good performance. Specifically, only the subset of edge points that fall inside a circle with radius R is used in the calculations. It has been found that using a radius R of 3-20 pixels provides desirable performance yet achieves low complexity and computational burden. Edge points between rows and between columns are calculated from the corresponding matrix indices (i, j).
  • the edge points are then split into three subsets: the edge points that lie on the edge between image segments 1 and 2, the edge points that lie on the edge between image segments 1 and 3, and the edge points that lie on the edge between image segments 2 and 3.
  • Model A: edges 1 and 2 form the top and edge 3 forms the stem
  • Model B: edges 1 and 3 form the top and edge 2 forms the stem
  • Model C: edges 2 and 3 form the top and edge 1 forms the stem
  • Each model has two parameters, the line orientation angles θ_top and θ_stem, which can vary between 0 and π. These parameters are in the preferred embodiment determined by minimizing the sum of squared perpendicular distances between edge points and the line. The sum of squared distances may be determined from the separate contributions of edges 1, 2 and 3. For instance, for edge 1, this sum as a function of the orientation angle is given by:
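  • the explicit expression does not survive in this extract; a plausible form of the contribution of edge 1, assuming the perpendicular distance of each edge point (x_k, y_k) to a straight line through the junction point (x_jun, y_jun) with orientation angle θ, would be:

```latex
E_1(\theta) \;=\; \sum_{k=1}^{N_1} \Big( (x_k - x_{\mathrm{jun}})\sin\theta \;-\; (y_k - y_{\mathrm{jun}})\cos\theta \Big)^2
```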
  • with m ∈ {A, B, C} denoting the model number, the best model m_best is selected as the one that minimizes the total sum of squared perpendicular distances between the edge points and the fitted top and stem lines of the T-junction.
  • a plurality of junction models have been predefined or predetermined. Specifically, the junction models comprising straight lines and corresponding to the possible alignments of these lines and the stem and top of a T-junction have been determined.
  • the step of determining depth information then comprises fitting each junction model to the data and selecting the model with the best fit.
  • depth information corresponding to the relative depth order of the adjacent image segments is readily available.
  • the image segment of an object T-junction which partly forms the top but not the stem is inherently in front of the image segments forming the stem.
  • Depth information between the two image segments forming the stem cannot directly be derived from the object T-junction.
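  • a compact sketch of this fitting and depth-ordering step is given below. It assumes straight-line models through the junction point and the perpendicular-distance criterion discussed above, but uses a brute-force search over a discrete set of orientation angles rather than the closed-form minimization; the input format and function name are illustrative choices, not the implementation of the embodiment.

```python
import numpy as np

def fit_t_junction(junction, edge_points):
    """Fit the three straight-line T-junction models to a candidate junction.

    junction    : (x_jun, y_jun)
    edge_points : dict mapping the segment pairs (1, 2), (1, 3), (2, 3) to
                  arrays of shape (K, 2) with the nearby edge points (within
                  radius R) lying between those two segments
    Returns (best_error, front_segment, stem_segments): the segment forming
    part of the top but not the stem is assumed to lie in front of the two
    segments whose mutual edge forms the stem.
    """
    x0, y0 = junction
    angles = np.linspace(0.0, np.pi, 180, endpoint=False)

    def best_line_error(points):
        # Smallest sum of squared perpendicular distances to a straight line
        # through the junction point, over the discrete set of orientations.
        dx, dy = points[:, 0] - x0, points[:, 1] - y0
        return min(np.sum((dx * np.sin(t) - dy * np.cos(t)) ** 2) for t in angles)

    best = None
    for front in (1, 2, 3):
        stem_pair = tuple(s for s in (1, 2, 3) if s != front)
        # Top: the two edges bounding the candidate front segment;
        # stem: the remaining edge between the other two segments.
        top_points = np.vstack([pts for pair, pts in edge_points.items()
                                if front in pair])
        stem_points = edge_points[stem_pair]
        error = best_line_error(top_points) + best_line_error(stem_points)
        if best is None or error < best[0]:
            best = (error, front, stem_pair)
    return best
```

  • the relative depth of the two stem segments is deliberately left undetermined by this sketch, in line with the observation above that an object T-junction does not order the stem segments with respect to each other.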
  • a certainty or reliability measure is further determined for each T-junction.
  • the certainty measure is indicative of the certainty or reliability of the generated depth information.
  • the certainty measure may thus increase the accuracy of processes considering a plurality of T-junctions and specifically may increase the accuracy of a depth map of the image.
  • each T-junction may be weighted in accordance with the certainty measure thereby providing for conflicts between different depth estimates to be resolved and/or taken into account.
  • the certainty measure is determined by combining a geometric certainty measure and a photometric certainty measure.
  • the geometric certainty measure is determined in response to an accuracy of the fit of the first and second straight lines and the photometric certainty measure is determined in response to a color variation between at least two image segments of the junction.
  • only one of the geometric or photometric uncertainty measures may be used.
  • for example, the error of the best model may be compared to the error of the worst model, and a suitable geometric certainty measure may be derived from this comparison.
  • a preferred photometric certainty measure suitable for object T-junctions determines the color variations around the T-junction point. If the color contrast is high, it is more likely that there is indeed a depth step compared to when the color contrast is low as similar colors typically are indicative of the same object and therefore not a depth step.
  • for example, the three sides of a blue cube adjacent a corner may have different shades of blue due to light reflections.
  • the corner may be detected as a 3-junction but it will result in a very low photometric certainty measure as the color contrast between the segments will be low.
  • however, if the blue cube occludes a yellow object, any object T-junctions between the two objects will include a color contrast between the blue and the yellow, thus resulting in a high photometric certainty measure.
  • a suitable photometric certainty measure p may be proportional to the minimum color difference vector of the three possible image segments of the junction: p ∝ min(|I₁ − I₂|, |I₁ − I₃|, |I₂ − I₃|), where I = (r, g, b), with r, g and b denoting the red, green and blue color channels, and |I| denoting the magnitude of the vector I.
  • the right hand side may be normalized by the magnitude of the largest possible difference vector. Normalization will not be required for applications where only the relative strength in an image is used.
  • the mean color vectors I₁, I₂ and I₃ are calculated using pixel locations that lie within a given distance (e.g. measured in pixels) from the bifurcation point, i.e. from the junction point.
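  • the photometric certainty measure described above can be sketched as follows; the radius value and the omission of normalization are choices of this illustration, as discussed in the text, and the function name is hypothetical.

```python
import numpy as np

def photometric_certainty(image, S, junction, segments, radius=5):
    """Photometric certainty of a candidate object T-junction.

    image    : H x W x 3 RGB array
    S        : H x W segmentation matrix
    junction : (x_jun, y_jun) junction point
    segments : the three segment numbers meeting at the junction
    Returns a value proportional to the minimum colour difference between the
    mean colours of the three segments within the given radius of the junction.
    """
    x0, y0 = junction
    H, W = S.shape
    yy, xx = np.mgrid[0:H, 0:W]
    near = (xx - x0) ** 2 + (yy - y0) ** 2 <= radius ** 2

    means = []
    for s in segments:
        mask = near & (S == s)
        if not mask.any():          # segment not present within the radius
            return 0.0
        means.append(image[mask].mean(axis=0))

    i1, i2, i3 = means
    return float(min(np.linalg.norm(i1 - i2),
                     np.linalg.norm(i1 - i3),
                     np.linalg.norm(i2 - i3)))
```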
  • the preferred embodiment comprises determining a combined certainty measure from the geometric and photometric certainty measures.
  • the certainty measures may be used to reject a candidate object T-junction.
  • the junction points may in accordance with the example of Fig. 3 be processed as described under the assumption that they are all object T-junctions. After processing, the reliability of the resulting depth information may then be used to select a subset of the junction points as corresponding to object T-junctions. Specifically, only junction points resulting in a reliability above a given threshold are considered object T-junctions, and thus the depth information derived using an object T-junction hypothesis is only considered valid for these points. In the example of Fig. 3, depth information is derived from object T-junctions.
  • FIG. 5 illustrates an example of image segmentation in accordance with an embodiment of the invention.
  • a circle object occludes a square object.
  • the segmentation has resulted in a background segment 501, a square object segment 502, and three segments 503, 504, 505 for the circle object. (Note that the segmentation process simply provides the image segments 501-505 without any information related to the objects of the image; the above reference to the objects corresponding to the image segments is simply for clarity and brevity of the description.)
  • junction detection will in this example determine three different junction points 507, 509, 511 for image segment 503.
  • one junction point 507 corresponds to an object T-junction
  • one junction point 509 corresponds to an object edge junction (lying on an edge between two, but not three, objects)
  • one junction point 511 corresponds to an internal object junction (all image segments are of the same object). Processing these junction points in accordance with the example of Fig. 3 will result in all three junction points initially being considered object T-junctions.
  • the reliability of the depth information will be high for junction point 507 but low for junction points 509 and 511. Accordingly, only the depth information for junction point 507 will be considered valid.
  • the object edge junction and the internal object junction comprise depth information that may be useful in determining depth information for the image.
  • the object edge junction 509, if correctly detected as an object edge junction, indicates that image segment 502 is either in front of or behind image segments 503 and 504, and that image segments 503 and 504 are at the same depth level.
  • the internal object junction 511, if correctly detected as an internal object junction, indicates that all three image segments 503, 504, 505 are at the same depth level. Hence, preferably this information is not discarded.
  • Fig. 6 illustrates an example of an apparatus for determining depth information for a video image in accordance with an embodiment of the invention.
  • the apparatus comprises a segmentation processor 601 which specifically may be identical to the segmentation processor 201 described with reference to Fig. 2.
  • the segmentation processor 601 generates a number of image segments in accordance with a suitable algorithm.
  • the segmentation processor 601 is coupled to a junction extraction processor 603 which specifically may be identical to the junction extraction processor 203 described with reference to Fig. 2.
  • the junction extraction processor 603 generates a number of junction points corresponding to points where three image segments coincide.
  • the junction extraction processor 603 is coupled to first, second and third depth information processors 605, 607, 609. Each of the depth information processors 605, 607, 609 is operable to determine depth information data for a given junction point in accordance with a specific assumption of the association between the image segments and the underlying objects.
  • the first depth information processor 605 determines depth information data assuming that the three image segments of the current junction point belong to three different objects. Thus, the first depth information processor 605 determines depth information data for a junction point assuming that this is an object T-junction. Similarly, the second depth information processor 607 determines depth information data assuming that the three image segments belong to two different objects. Thus, the second depth information processor 607 determines depth information data for a junction point assuming that this is an object edge junction. Similarly, the third depth information processor 609 determines depth information data assuming that the three image segments belong to a single object.
  • thus, the third depth information processor 609 determines depth information data for a junction point assuming that this is an internal object junction.
  • the depth information processors 605, 607, 609 determine depth information data comprising a relative depth of two or more of the image segments.
  • the first depth information processor 605 may determine that a given image segment is in front of the two other image segments.
  • the third depth information processor 609 may for example determine that all of the image segments are at the same depth as they are all part of the same object.
  • the depth information processors 605, 607, 609 furthermore determine a reliability of the depth information.
  • the depth information processors 605, 607, 609 may attempt to fit an object model corresponding to the object association assumption to the image data and accordingly determine a reliability of the object association assumption being correct.
  • the first depth information processor 605 may use the specific method of determining a reliability as described with reference to Fig. 2 and 3.
  • the depth information processors 605, 607, 609 are coupled to a depth characteristic processor 611 which receives the depth information data from the three depth information processors 605, 607, 609. In response to the depth information data, the depth characteristic processor 611 determines a depth characteristic for the junction point. In a simple embodiment, the depth characteristic processor 611 simply selects the depth information data from the depth information processor 605, 607, 609 which has the highest probability.
  • the depth characteristic processor 611 determines a relative depth of at least two of the image segments as well as a reliability of this information.
  • the depth information processors 605, 607, 609 and the depth characteristic processor 611 perform the described operation for all junction points detected in the image.
  • depth information is derived for all possible junction point types and the depth information most likely to be correct is selected. Consequently, the derived depth information is not limited to object T-junctions but information may also be derived from object edge junctions and internal object junctions. Thus, additional and improved depth information may be derived. In other embodiments, more complex algorithms for generating a depth characteristic from the depth information data may be used.
  • the depth information may be weighted according to the determined reliability before being combined into a single depth measure.
  • the depth characteristic processor 611 is coupled to an object depth processor 613.
  • the object depth processor 613 is operable to process the depth information from the individual junction points to derive depth information related to objects of the image.
  • the object depth processor 613 may be operable to determine a depth map for the objects of the image in response to the depth characteristic determined for the junction points of the image.
  • Fig. 7 illustrates a flow chart of a method of determining depth information for a video image in accordance with an embodiment of the invention. The method is suitable for the apparatus of Fig. 6 and will be described with reference to this.
  • the method initiates in step 701 wherein a video image is received from a suitable source.
  • Step 701 is followed by step 703 wherein the image is segmented into a plurality of image segments as previously described. Specifically, steps 701 and 703 may be performed by the segmentation processor 601 of Fig. 6. Step 703 is followed by step 705 wherein junction points are determined in response to the image segments as previously described. Specifically, step 705 may be performed by the junction extraction processor 603 of Fig. 6. Step 705 is followed by step 707 wherein the first depth information processor 605 determines first depth information data for a junction point in response to a first object association assumption.
  • the object association assumption is specifically that the junction point is an object T-junction. Specifically, the two segments forming the stem of the T-junction are considered to be below the image segment forming the top of the T-junction.
  • in step 709, the second depth information processor 607 determines second depth information data for a junction point in response to a second object association assumption.
  • the object association assumption is specifically that the junction point is an object edge junction.
  • the two image segments forming the stem will be at the same depth level and the third image segment will be at a different depth level which may be above or below the depth of the first two image segments.
  • a reliability or likelihood of the depth information is determined. For example, a model may be fitted and the reliability P_E may be determined in response to the closeness of this fit.
  • step 709 is followed by step 711 wherein the third depth information processor 609 determines third depth information data for a junction point in response to a third object association assumption.
  • the object association assumption is specifically that the junction point is an internal object junction. If the current junction point is an internal object junction, the three image segments involved will be at the same depth level.
  • a reliability P_H of the information is determined in any suitable way and specifically may be determined in response to how closely an internal object junction model fits the image data. It will thus be appreciated that in accordance with the described embodiment, a set of possible depth information data is determined in steps 707, 709 and 711, for example comprising a candidate depth step and an associated weight (reliability) for the current junction point under each of the three object association assumptions.
  • step 711 is followed by step 713 wherein the depth characteristic processor 611 determines a depth characteristic for the junction point in response to the first, second and third depth information data.
  • in the simplest case, step 713 simply corresponds to selecting the depth step(s) and weight(s) corresponding to the object association assumption that is most likely, i.e. that has the highest weight.
  • each model has a different number of model parameters. Assuming that the geometry follows directly from the edges surrounding the junction point, it follows that the internal object junction has 3 free parameters, the object edge junction has 6 free parameters and the object T-junction has 9 free parameters. This difference in the number of parameters needs to be accounted for.
  • each model is weighted where the weight is smallest for the model with the largest number of parameters (in this case the object T-junctions).
  • a more complex determination may be used than simple selection.
  • a weighted averaging of the individual depth steps may be performed.
  • Step 713 is followed by step 715 wherein it is determined if a depth characteristic has been determined for all junction points. If not, the method returns to step 707 where the next junction point is processed. If a depth characteristic has been determined for all junction points, the method continues in step 717 wherein depth information is determined for image segment pairs. Specifically, two image segments are selected and all the junction points corresponding to these two image segments are identified. The derived depth steps for all these junction points are then combined to generate a single depth step indicative of the depth step between the two image segments. Additionally, the derived weights for all the junction points are combined to generate a single weight indicative of the reliability of the depth step determined for the two image segments. Specifically, the depth steps may be determined by a weighted average. For example, referring to Fig. 5, a single depth step and weight may be determined for image segment pair 503 and 504 by taking into account both junction points 509, 511 relating to these image segments. Denoting image segment 503 by A and image segment 504 by B the following calculation may be made:
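  • the expression itself is not reproduced in this extract; one natural form of such a weighted combination over the junction points k relating segments A and B, with per-junction depth steps Δ_k and weights w_k (an assumption of this sketch rather than the exact formula of the embodiment), would be:

```latex
\Delta_{AB} \;=\; \frac{\sum_k w_k\,\Delta_k}{\sum_k w_k},
\qquad
W_{AB} \;=\; \sum_k w_k
```

  • the combined weight W_AB could equally be defined as an average of the individual weights; the extract does not fix this choice.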
  • after step 717, one depth step and corresponding weight exist for each segment pair of all the image segments forming junction points.
  • Step 717 is followed by step 719 wherein the depth information for the image segment pairs is used to determine a depth map for the image. Determining a depth map requires a number of functions to be performed. First, the depth information from the junction points must be extended to the complete image. Secondly, the depth ordering information at the junction points must be translated to quantitative depth jumps. Finally, the information from all the junction points should be combined. In the following description, constant depth is assumed for each image segment but it will be appreciated that the approach outlined can be generalized to a per-pixel depth map.
  • Each image segment pair has an associated depth inequality (one image segment lying in front of the other), i.e. a relation of the type d_i < d_j, where i and j are neighboring image segments.
  • this inequality may be expressed as the equation d_j = d_i + Δ_ij, where Δ_ij denotes the depth step between the two image segments.
  • Fig. 8 illustrates a flow chart for determining a depth map in accordance with an embodiment of the invention.
  • the method starts in step 801 wherein initial assigned depth levels are given to the plurality of image segments. Specifically, the same depth level d_init may be assigned to all image segments.
  • step 801 is followed by step 803 wherein depth relations associated with each image segment pair of the plurality of image segments are determined in response to the plurality of depth characteristics. In the described embodiment, this operation has already been performed in step 717 and thus step 803 may simply consist in retrieving these values (or steps 717 and 803 may be perceived as the same step).
  • Step 803 is followed by step 805 wherein an error function dependent on the depth relations and the assigned depth levels is determined.
  • the error function preferably comprises a weighting of each image segment pair. Specifically the error function to be minimized may be a least-squares error norm:
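  • the norm itself is missing from this extract; with the per-pair depth steps Δ_ij and weights w_ij introduced above, and one assigned depth level d_s per image segment, a weighted least-squares norm of the following form would be consistent with the description (a reconstruction, not a verbatim quotation):

```latex
E(d_1,\dots,d_S) \;=\; \sum_{(i,j)} w_{ij}\,\big(d_j \,-\, d_i \,-\, \Delta_{ij}\big)^2
```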
  • the depth step Δ_ij can be positive or negative, depending on the occlusion order.
  • the weight w_ij is used to associate a certain importance with the depth step.
  • Step 805 is followed by step 807 wherein the assigned depth levels are adjusted so as to minimize the error function.
  • step 807 is followed by step 809 wherein it is determined if the adjustments of step 807 are below a given threshold. If not, the method returns to step 803. Otherwise, it is considered that the algorithm has reached convergence and the method continues in step 811 wherein the depth map is determined in response to the assigned depth levels. Specifically, the depth map is generated by including the determined depth d_s for each image segment in the depth map.
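  • a minimal sketch of this iterative depth-map step is shown below. It assumes the weighted least-squares form given above, one constant depth per image segment, and a simple gradient-style update; the stopping rule, step size and function name are illustrative choices, not the specific procedure of the embodiment.

```python
import numpy as np

def assign_segment_depths(pair_steps, n_segments, d_init=0.0,
                          step_size=0.1, tolerance=1e-4, max_iter=1000):
    """Assign one depth level per image segment from pairwise depth steps.

    pair_steps : dict mapping segment index pairs (i, j) to (delta_ij, w_ij),
                 meaning the estimate d_j - d_i = delta_ij with weight w_ij
    Returns an array d of assigned depth levels (the per-segment depth map).
    Minimizes E = sum_ij w_ij * (d_j - d_i - delta_ij)**2 by gradient descent.
    """
    d = np.full(n_segments, d_init, dtype=float)
    for _ in range(max_iter):
        grad = np.zeros_like(d)
        for (i, j), (delta, w) in pair_steps.items():
            residual = d[j] - d[i] - delta
            grad[j] += 2.0 * w * residual
            grad[i] -= 2.0 * w * residual
        update = step_size * grad
        d -= update
        if np.max(np.abs(update)) < tolerance:   # adjustments below threshold
            break
    return d
```

  • note that only relative depths are constrained by the pairwise steps; the overall depth offset remains fixed by the initial value d_init.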
  • the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
  • the invention is implemented as computer software running on one or more data processors and/or digital signal processors.
  • the elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
  • although the present invention has been described in connection with the preferred embodiment, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. In the claims, the term "comprising" does not exclude the presence of other elements or steps.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a system for determining depth information for a video image. The video image is segmented (703) to define a plurality of image segments. A plurality of junction points associated with these image segments are determined (705). These junction points may lie between three different objects, between two different objects, or within a single object. For each junction point, independent depth information is determined (707, 709, 711) based on the assumption that the junction point corresponds to each of the three possible object associations. The reliability of each assumption is determined in response to the fit of a corresponding model, and a depth characteristic is derived (713) by selecting the most reliable information. The depth characteristics of multiple junction points are then used to define a depth map. Determining and exploiting depth information for all junction points, and not only for object T-junctions, provides additional and improved depth information.
PCT/IB2004/052501 2003-11-26 2004-11-22 Determination d'informations de profondeur pour image video WO2005052858A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03104374.8 2003-11-26
EP03104374 2003-11-26

Publications (1)

Publication Number Publication Date
WO2005052858A1 (fr) 2005-06-09

Family

ID=34626409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/052501 WO2005052858A1 (fr) 2003-11-26 2004-11-22 Determination d'informations de profondeur pour image video

Country Status (1)

Country Link
WO (1) WO2005052858A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2747028A1 (fr) 2012-12-18 2014-06-25 Universitat Pompeu Fabra Procédé de récupération d'une carte de profondeur relative à partir d'une image unique ou d'une séquence d'images fixes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUNG R C ET AL: "Use of monocular groupings and occlusion analysis in a hierarchical stereo system", PROCEEDINGS OF THE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. LAHAINA, MAUI, HAWAII, JUNE 3 - 6, 1991, LOS ALAMITOS, IEEE. COMP. SOC. PRESS, US, 3 June 1991 (1991-06-03), pages 50 - 56, XP010023186, ISBN: 0-8186-2148-6 *
ZERROUG M ET AL: "From an intensity image to 3-D segmented descriptions", PATTERN RECOGNITION, 1994. VOL. 1 - CONFERENCE A: COMPUTER VISION & IMAGE PROCESSING., PROCEEDINGS OF THE 12TH IAPR INTERNATIONAL CONFERENCE ON JERUSALEM, ISRAEL 9-13 OCT. 1994, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, vol. 1, 9 October 1994 (1994-10-09), pages 108 - 113, XP010215982, ISBN: 0-8186-6265-4 *

Similar Documents

Publication Publication Date Title
Yoon et al. Locally adaptive support-weight approach for visual correspondence search
KR101899866B1 (ko) 병변 경계의 오류 검출 장치 및 방법, 병변 경계의 오류 수정 장치 및 방법 및, 병변 경계의 오류 검사 장치
CN107437060B (zh) 对象识别设备、对象识别方法和程序
Kumar et al. Review on image segmentation techniques
US9165211B2 (en) Image processing apparatus and method
Lin et al. Locating the eye in human face images using fractal dimensions
Elder et al. The statistics of natural image contours
Le et al. Acquiring qualified samples for RANSAC using geometrical constraints
Chang et al. Disparity map enhancement in pixel based stereo matching method using distance transform
Liu et al. Dense stereo correspondence with contrast context histogram, segmentation-based two-pass aggregation and occlusion handling
Ruzon et al. Corner detection in textured color images
Kovacs et al. Orientation based building outline extraction in aerial images
Srikakulapu et al. Depth estimation from single image using defocus and texture cues
EP1277173A1 (fr) Soustraction d'images
US20210216829A1 (en) Object likelihood estimation device, method, and program
JP2005165969A (ja) 画像処理装置、及び方法
WO1999027493A9 (fr) Detection de correspondance d'images a l'aide de la similitude cumulative radiale
CN105931231A (zh) 一种基于全连接随机场联合能量最小化的立体匹配方法
WO2005052858A1 (fr) Determination d'informations de profondeur pour image video
Broussard et al. Using artificial neural networks and feature saliency to identify iris measurements that contain the most discriminatory information for iris segmentation
Giannarou et al. Edge detection using quantitative combination of multiple operators
US20060251337A1 (en) Image object processing
Kovacs et al. Edge detection in discretized range images
CN109636844B (zh) 一种基于3d双边对称的复杂桌面点云分割的方法
Bergevin et al. Detection and characterization of junctions in a 2D image

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase