WO2012172853A1 - Apparatus for generating three-dimensional image, method for generating three-dimensional image, program, and recording medium - Google Patents

Apparatus for generating three-dimensional image, method for generating three-dimensional image, program, and recording medium

Info

Publication number
WO2012172853A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
vanishing point
depth model
target image
unit
Prior art date
Application number
PCT/JP2012/059043
Other languages
French (fr)
Japanese (ja)
Inventor
健史 筑波
正宏 塩井
健明 末永
敦稔 〆野
Original Assignee
シャープ株式会社
Priority date
Filing date
Publication date
Application filed by シャープ株式会社
Publication of WO2012172853A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/275Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals

Definitions

  • the present invention relates to a stereoscopic image generation apparatus, a stereoscopic image generation method, a program, and a recording medium that add binocular stereoscopic information to a 2D image to generate a 3D image.
  • As a technique for realizing 2D/3D conversion, for example, the technique disclosed in Patent Document 1 is known.
  • Patent Document 1 discloses a stereoscopic image generation apparatus that is provided with three basic depth models indicating the depth values of three basic images, generates a depth model of the input image by changing the composition ratio of the three basic depth models according to the pattern of the input image, and generates the images to be presented to the left eye and the right eye from the generated depth model and the input image.
  • Japanese Patent No. 4214976 Japanese Patent Laid-Open No. 2005-1515344
  • The present invention was made to solve the above-described problems. It is an object of the present invention to provide a stereoscopic image generation apparatus, a stereoscopic image generation method, a program, and a recording medium that can generate a stereoscopic image with a more natural sense of depth by generating a depth model of the image based on the vanishing point when the vanishing point position can be estimated from geometric depth cues, and by generating the depth model of the image based on the saliency representing the attractiveness in the image based on human visual characteristics when the vanishing point position cannot be estimated from geometric depth cues.
  • A first technical means of the present invention is a stereoscopic image generation apparatus that generates a 3D image by adding binocular stereoscopic information to a 2D image, comprising vanishing point estimating means for estimating a vanishing point from a processing target image, depth model generating means for generating a different depth model depending on whether or not the vanishing point can be estimated by the vanishing point estimating means, and viewpoint image generating means for generating a right-eye presentation image and a left-eye presentation image based on the depth model generated by the depth model generating means, the processing target image, and assumed viewing condition information, wherein the depth model generating means generates a depth model based on the vanishing point when the vanishing point can be estimated by the vanishing point estimating means, and generates a depth model based on the saliency of each pixel in the processing target image when the vanishing point cannot be estimated by the vanishing point estimating means.
  • A second technical means, in the first technical means, comprises reduced image generation means for generating a reduced image of a predetermined image size from the processing target image, inputs the reduced image to the vanishing point estimation means and the depth model generation means, and comprises enlarged depth model generation means for generating, from the depth model of the reduced image generated by the depth model generation means, an enlarged depth model having the same image size as the processing target image.
  • A third technical means, in the first or second technical means, comprises spatio-temporal direction smoothing means that smooths the depth model of the processing target image generated by the depth model generation means in the spatial direction, then smooths the spatially smoothed depth model of the processing target image in the time direction using the spatio-temporally smoothed depth model of a comparison target image that is earlier than the processing target image, and thereby generates a depth model of the processing target image smoothed in the spatio-temporal direction.
  • In a fourth technical means, the assumed viewing condition information includes the pixel pitch of the display that displays the 3D image, the image size of the display, the distance from the viewer to the display, a parallax range representing the depth of the 3D image, and a baseline length that is the distance between the left and right virtual viewpoints.
  • In a fifth technical means, the saliency of each pixel in the processing target image is calculated to be higher as the color difference between the target pixel and its peripheral pixels is larger, as the color difference between the target pixel and the whole image is larger, or as the color difference between a local region including the target pixel and its surrounding region is larger.
  • A sixth technical means, in the fifth technical means, is characterized in that, when the vanishing point cannot be estimated by the vanishing point estimating means, the depth model generating means generates the depth model so that pixels with high saliency in the processing target image come to the near side.
  • A seventh technical means, in any one of the first to sixth technical means, is characterized in that the vanishing point estimating means includes intra-frame vanishing point estimating means that estimates the vanishing point of the processing target image from the straight line information in the processing target image, and inter-frame vanishing point estimating means that estimates the vanishing point of the processing target image based on the processing target image, a comparison target image that is earlier than the processing target image, and the vanishing point position in the comparison target image.
  • An eighth technical means, in the seventh technical means, comprises scene change detection means for detecting whether or not a scene change has occurred between the processing target image and the comparison target image; when a scene change is detected by the scene change detection means, the intra-frame vanishing point estimating means is selected, and when no scene change is detected by the scene change detection means, the inter-frame vanishing point estimating means is selected.
  • A ninth technical means, in the eighth technical means, comprises storage means for storing vanishing point information including the position of the vanishing point of the comparison target image; when the vanishing point information of the comparison target image is stored in the storage means, the inter-frame vanishing point estimating means is selected, and when the vanishing point information of the comparison target image is not stored in the storage means, the intra-frame vanishing point estimating means is selected.
  • the tenth technical means is any one of the seventh to ninth technical means, wherein the comparison target image is an image immediately before the processing target image.
  • An eleventh technical means is a stereoscopic image generation method by a stereoscopic image generation apparatus that adds binocular stereoscopic information to a 2D image to generate a 3D image, including a vanishing point estimating step in which the stereoscopic image generation apparatus estimates a vanishing point from a processing target image, and a depth model generating step in which a depth model is generated based on the vanishing point when the vanishing point can be estimated in the vanishing point estimating step, and a depth model is generated based on the saliency of each pixel in the processing target image when the vanishing point cannot be estimated in the vanishing point estimating step.
  • the twelfth technical means is a program for causing a computer to execute the stereoscopic image generating method in the eleventh technical means.
  • the thirteenth technical means is a computer-readable recording medium recording the program in the twelfth technical means.
  • According to the present invention, when the vanishing point position can be estimated from geometric depth cues, a stereoscopic image in which the sense of depth given by the geometric depth cues is emphasized can be generated by generating the depth model of the image based on the vanishing point. Further, according to the present invention, when the vanishing point position cannot be estimated from geometric depth cues, a stereoscopic image in which the sense of depth of the portions that attract human attention is emphasized can be generated by generating the depth model of the image from the saliency representing the attractiveness in the image based on human visual characteristics.
  • FIG. 1 is a block diagram illustrating a schematic configuration example of a stereoscopic image generating apparatus according to the present invention.
  • reference numeral 1 denotes a stereoscopic image generating apparatus.
  • the stereoscopic image generation apparatus 1 includes a scene change detection unit 10, a vanishing point estimation unit 20, a depth model generation unit 30, and a viewpoint image generation unit 40.
  • FIG. 2 is a flowchart for explaining an operation example in units of frames of the stereoscopic image generating apparatus 1 according to the present invention.
  • First, the stereoscopic image generation apparatus 1 in FIG. 1 outputs the input image at time t (hereinafter also referred to as the processing target image F (t)) to the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 (step S11 in FIG. 2).
  • The scene change detection unit 10 in FIG. 1 corresponds to the scene change detection means of the present invention. It calculates a predetermined image feature amount from the input processing target image F (t) and the image input immediately before it (the comparison target image F (t-1)), compares the similarity of the calculated image feature amounts to detect segment points (scene changes) between temporally consecutive images, and outputs scene change information S (t) indicating the presence or absence of a scene change in the processing target image F (t) to the vanishing point estimation unit 20 and the depth model generation unit 30 (step S12 in FIG. 2).
  • scene change detection based on a luminance histogram representing the appearance frequency of the luminance value of an image will be described with reference to FIGS.
  • Scene change detection based on the luminance histogram calculates luminance histograms H L (t) and H L (t-1) from the processing target image F (t) and the comparison target image F (t-1), and compares the similarity d (H L (t), H L (t-1)) of the calculated luminance histograms with a predetermined threshold value to determine whether or not there is a scene change in the processing target image F (t).
  • the scene change detection unit 10 includes a luminance histogram generation unit 101, a buffer 102, a histogram similarity calculation unit 103, and a scene change determination unit 104.
  • FIG. 5 is a flowchart for explaining an operation example of the scene change detection unit 10.
  • The luminance histogram generation unit 101 in FIG. 4 acquires luminance information from the input processing target image F (t), calculates from the acquired luminance information a luminance histogram H L (t) representing the appearance frequency of each luminance value, and outputs the calculated luminance histogram H L (t) to the histogram similarity calculation unit 103 (step S21 in FIG. 5).
  • the buffer 102 in FIG. 4 stores the luminance histogram H L (t) of the processing target image F (t) in order to detect a scene change in the image F (t + 1) immediately after the processing target image F (t).
  • The histogram similarity calculation unit 103 in FIG. 4 calculates the similarity d (H L (t), H L (t-1)) by Equation (1) from the input luminance histogram H L (t) of the processing target image F (t) and the luminance histogram H L (t-1) of the comparison target image F (t-1) read from the buffer 102, and outputs the calculation result to the scene change determination unit 104 (step S23 in FIG. 5).
  • In Equation (1), W represents the number of pixels per line of the image, H represents the number of lines of the image, v represents a luminance value, V represents the number of gradations of the luminance value, and H L (v, t) represents the appearance frequency of the luminance value v on the image F (t) at time t.
  • the range of values taken by the similarity d (H L (t), H L (t ⁇ 1)) is 0 to 2, and the closer the value is to 0, the more similar the shape of the histogram is. The closer the value is to 2, the more different the shape of the histogram is.
  • The scene change determination unit 104 in FIG. 4 performs a threshold determination using the similarity d (H L (t), H L (t-1)) of the input histograms and a predetermined threshold d th, sets the scene change information S (t) indicating the presence or absence of a scene change in the processing target image F (t) according to Equation (2), and outputs it to the outside (step S24 in FIG. 5). When the similarity d (H L (t), H L (t-1)) is smaller than the threshold d th, the scene change determination unit 104 determines that there is no scene change and sets "0" in the scene change information S (t); otherwise, it determines that there is a scene change and sets "1" in the scene change information S (t).
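  • As a concrete illustration of the processing of Equations (1) and (2), the following Python sketch computes normalized luminance histograms, takes their sum of absolute differences as the similarity d (which then lies between 0 and 2 as described above), and thresholds it. The exact form of Equation (1) and the threshold value d_th = 0.5 are assumptions made for illustration, not values taken from the patent text.

```python
import numpy as np

def luminance_histogram(gray, num_levels=256):
    # H_L(v, t): appearance frequency of each luminance value v,
    # normalized by the number of pixels W * H.
    hist, _ = np.histogram(gray, bins=num_levels, range=(0, num_levels))
    return hist.astype(np.float64) / gray.size

def detect_scene_change(gray_t, gray_t1, d_th=0.5):
    # Similarity d(H_L(t), H_L(t-1)): sum of absolute differences of the
    # normalized histograms; 0 means identical shapes, 2 means completely
    # different shapes (one possible reading of Equation (1)).
    d = np.abs(luminance_histogram(gray_t) - luminance_histogram(gray_t1)).sum()
    # Scene change information S(t): 1 if a scene change is detected, else 0.
    return 1 if d > d_th else 0
```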
  • As described above, the scene change detection unit 10 calculates a predetermined image feature amount from the processing target image F (t) and the comparison target image F (t-1), compares the similarity of the calculated image feature amounts, and thereby detects whether or not a scene change has occurred.
  • The vanishing point estimation unit 20 in FIG. 1 corresponds to the vanishing point estimation means of the present invention. Based on the input scene change information S (t) of the processing target image F (t) and the previous vanishing point information VP (t-1) stored internally, it selects one of two vanishing point estimating means (intra-frame vanishing point estimating means that estimates the vanishing point position from the straight lines in the image, and inter-frame vanishing point estimating means that estimates the vanishing point position in the current frame from the correspondence of feature points between images and the vanishing point position of the previous frame), estimates the vanishing point position of the input processing target image F (t) with the selected vanishing point estimating means, and outputs vanishing point information VP (t) describing the result to the depth model generation unit 30 (step S13 in FIG. 2).
  • Here, the "vanishing point" is the single point to which parallel lines in three-dimensional space converge when they are projected onto a plane.
  • the vanishing point estimation unit 20 in the present embodiment includes a switching unit 201, a switching unit 202, an intraframe vanishing point estimation unit 21, an interframe vanishing point estimation unit 22, a buffer 203, and a buffer 204.
  • the intra-frame vanishing point estimation unit 21 in FIG. 6 includes an edge detection unit 211, a straight line detection unit 212, and a vanishing point identification unit 213.
  • This intra-frame vanishing point estimation unit 21 corresponds to the intra-frame vanishing point estimation means of the present invention, and estimates the vanishing point of the processing target image F (t) from the straight line information in the processing target image F (t).
  • This inter-frame vanishing point estimation unit 22 corresponds to the inter-frame vanishing point estimation means of the present invention, and estimates the vanishing point of the processing target image F (t) based on the processing target image F (t), the comparison target image F (t-1) that precedes it, and the position of the vanishing point in the comparison target image F (t-1).
  • FIG. 7 is a flowchart for explaining an operation example of the vanishing point estimation unit 20.
  • the vanishing point estimation unit 20 in FIG. 6 estimates vanishing points based on the input scene change information S (t) and the previous vanishing point information VP (t ⁇ 1) read from the buffer 204.
  • When the intra-frame vanishing point estimating means is selected (for example, when a scene change is detected or when the previous frame has no vanishing point information), the switching unit 201 in FIG. 6 switches the image input destination, and the switching unit 202 in FIG. 6 switches the output source of the vanishing point information, to the intra-frame vanishing point estimation unit 21 (step S32 in FIG. 7); the intra-frame vanishing point estimation unit 21 then estimates the vanishing point position from the straight lines in the image and outputs the result (vanishing point information VP (t)) (step S33 in FIG. 7).
  • FIG. 8 is a flowchart for explaining an operation example of the intra-frame vanishing point estimation unit 21.
  • FIG. 9 is a diagram illustrating an image example corresponding to the flow of FIG.
  • The edge detection unit 211 in FIG. 6 calculates edge point information Edge (t), which is used for straight line detection, from the input processing target image F (t) (see FIG. 9A). Specifically, a differential operator is applied to each color component (for example, RGB (Red, Green, Blue)) to obtain the gradient vector of each color component i in the x direction and the y direction, and the edge detection unit 211 calculates the edge strength E (x, y, t) from these gradient vectors. The edge detection unit 211 then performs the calculation of Expression (4) for each pixel at coordinates (x, y) to extract, from the edge strength E (x, y, t), the points where the edge strength is a local maximum within a window, and detects such a pixel as an edge point (Edge (x, y, t) = 1). Here, W1 represents the size of the window in the x direction, and W2 represents the size of the window in the y direction.
  • The straight line detection unit 212 in FIG. 6 obtains straight line information L (t) from the input edge point information Edge (t) by applying the Hough transform (see FIG. 9B). In FIG. 10A, the feature points A, B, and C on a certain straight line L are edge points for which Edge (x, y, t) = 1 was obtained by the edge detection unit 211.
  • A straight line L in the image space shown in FIG. 10A is expressed using the polar-coordinate parameters (ρ, θ), where ρ represents the length of the perpendicular drawn from the origin of the image space to the straight line L, and θ represents the angle formed by that perpendicular with the x axis of the image space. The range of ρ is ρ ≥ 0, and the range of θ is 0 ≤ θ < 2π. A straight line L passing through the feature points A, B, and C in the image space of FIG. 10A is expressed by Expression (5) using the parameters (ρ0, θ0). When the group of straight lines passing through each of the feature points A, B, and C is mapped into the parameter space, it is represented as the curves a, b, and c in the parameter space of FIG. 10B; that is, the intersection (ρ0, θ0) of the curves a, b, and c is detected, in the parameter space, as the straight line L passing through the feature points A, B, and C.
  • Note that the range of NL is 0 ≤ NL ≤ NL max.
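  • The edge detection and Hough-transform line detection described above might be sketched as follows. The Canny detector and OpenCV's HoughLines are used here as stand-ins for the edge strength E (x, y, t) of Expression (4) and the (ρ, θ) voting of Expression (5); the threshold values are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_lines(frame_bgr, canny_lo=100, canny_hi=200, vote_threshold=150):
    # Edge point information Edge(t): approximated here with the Canny detector.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    # Straight line information L(t): each detected line is returned as
    # (rho, theta), i.e. the polar-coordinate representation of FIG. 10.
    lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=vote_threshold)
    return [] if lines is None else [tuple(line[0]) for line in lines]
```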
  • The vanishing point identifying unit 213 in FIG. 6 obtains the number of straight lines NL from the input straight line information L (t), compares NL with a predetermined threshold ThL (≥ 2), and determines from the comparison whether the vanishing point position can be estimated (step S333). When NL exceeds the threshold (Yes in step S333), the intersections of the detected straight lines are calculated (step S334) and the vanishing point position is estimated from their distribution (step S335). When NL is equal to or less than the threshold ThL (No in step S333), it is determined that there is no geometric depth cue sufficient to estimate the vanishing point, and the process proceeds to step S336.
  • The distribution of the intersections Pij of the detected straight lines is modeled by a mixture of Gaussian distributions as in Equation (9), that is, P(x) = Σ wi·N(x | μi, Σi), summed over the Kc classes. In Equation (9), P(x) represents the probability that the vector x (the coordinates of an intersection Pij) appears, Kc represents the number of classes (the number of Gaussian distributions), wi represents the weighting coefficient of the class i Gaussian distribution (the sum of the weighting coefficients is 1), μi represents the average vector of class i (the barycentric coordinates of class i), Σi represents the covariance matrix of class i, and D represents the number of dimensions of the vector x. N(x | μi, Σi) in Equation (9) represents the Gaussian distribution (normal distribution) of class i, expressed using the average vector μi and the covariance matrix Σi. The vanishing point identifying unit 213 then determines the average vectors μi (barycentric coordinates) of the distributions of the top N (≥ 1) classes having the largest weighting coefficients wi as the vanishing point positions, where N is the number of vanishing points.
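  • A hedged sketch of the intersection-clustering step: the pairwise intersections Pij of the detected lines are fitted with a Gaussian mixture model as in Equation (9), and the mean vectors of the most heavily weighted classes are taken as vanishing points. The use of scikit-learn's GaussianMixture (fitted by EM) and the fixed class count are illustrative assumptions; the patent also describes its own procedure for choosing the class number Kc.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def line_intersections(lines):
    # Intersections Pij of line pairs given in polar form (rho, theta),
    # where each line satisfies x*cos(theta) + y*sin(theta) = rho.
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (r1, t1), (r2, t2) = lines[i], lines[j]
            A = np.array([[np.cos(t1), np.sin(t1)],
                          [np.cos(t2), np.sin(t2)]])
            if abs(np.linalg.det(A)) < 1e-6:      # (nearly) parallel lines
                continue
            pts.append(np.linalg.solve(A, np.array([r1, r2])))
    return np.array(pts)

def estimate_vanishing_points(lines, num_classes=3, num_vp=1):
    pts = line_intersections(lines)
    if len(pts) < num_classes:
        return []                                  # not enough geometric cues
    gmm = GaussianMixture(n_components=num_classes).fit(pts)
    # Take the mean vectors (class centroids) of the top-N weighted classes
    # as the vanishing point positions, as described for Equation (9).
    order = np.argsort(gmm.weights_)[::-1][:num_vp]
    return [tuple(gmm.means_[k]) for k in order]
```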
  • the vanishing point identifying unit 213 sets vanishing point information VP (t) as shown in FIG. 9C based on the result of step S333 or step S335 (step S336).
  • the vanishing point information VP (t) is expressed as data in Table 1, for example.
  • The vanishing point information VP (t) includes "vp_time" indicating the time t (or the number (frame number) assigned to the frame of the image), "vp_num" indicating the number of detected vanishing points, and a list "vp_pos [n]" representing the positions of the n detected vanishing points.
  • Otherwise (when no scene change is detected and the previous frame has a vanishing point), switching to the inter-frame vanishing point estimation unit 22 is performed (step S34). The inter-frame vanishing point estimation unit 22 then obtains the correspondence of feature points between the input processing target image F (t) and the previous image F (t-1) stored in the buffer 203, estimates the vanishing point position in the processing target image F (t) from this correspondence and the previous vanishing point information VP (t-1), and outputs the result (vanishing point information VP (t)) (step S35). That is, the vanishing point estimation unit 20 includes storage means (the buffer 204 in FIG. 6) that stores the vanishing point information VP (t-1) including the position of the vanishing point of the comparison target image F (t-1); whether or not the previous frame has a vanishing point is determined by whether the vanishing point information VP (t-1) stored in this storage means satisfies "vp_num > 0".
  • FIG. 11 is a flowchart for explaining an operation example of the inter-frame vanishing point estimation unit 22.
  • FIG. 12 is a diagram illustrating an example of an image corresponding to the flow of FIG.
  • Feature points are detected from the processing target image F (t), and feature point information K (t) describing them is output to the corresponding point calculation unit 222 in FIG. 6 (step S351).
  • a feature point is a point extracted as a part or vertex of an object's edge based on a change in color or luminance between pixels.
  • Specifically, a second-order moment matrix A (Expression (10)) is formed using the gradient vectors Gi (x, y) (i = x, y) of the luminance in the x and y directions within a local region S centered on the pixel (x, y); the first eigenvalue λ1 and the second eigenvalue λ2 of the moment matrix A are obtained, and a pixel (x, y) that satisfies the condition shown in Expression (11) is detected as a feature point. In Expression (10), the coefficient w (u, v) represents a weighting coefficient for the pixel (x + u, y + v) that is separated from the pixel (x, y) by u in the x direction and v in the y direction; as w (u, v), a value obtained by normalizing a two-dimensional Gaussian distribution within the range of the local region S so as to satisfy condition (12) is used.
  • Next, the positions of each feature point at time t and time t-1 are obtained, and corresponding point information Q (t, t-1) describing these positions is output to the transformation matrix calculation unit 223 in FIG. 6 (step S352). The position (x Ks,t-1 , y Ks,t-1 ) of each feature point Ks of the processing target image F (t) on the previous image F (t-1) can be obtained, for example, by solving the optical flow constraint condition by the gradient method shown in Expression (13) for (x Ks,t-1 , y Ks,t-1 ) (for example, see Non-Patent Document 2).
  • S represents a local area of a predetermined size centered on the feature point Ks.
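  • A sketch of the feature point detection and corresponding point calculation using OpenCV equivalents: goodFeaturesToTrack applies an eigenvalue-based corner criterion in the spirit of Expressions (10) and (11), and calcOpticalFlowPyrLK solves an optical-flow constraint by a gradient (Lucas-Kanade) method in the spirit of Expression (13). The parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def corresponding_points(gray_t, gray_t1, max_points=200):
    # Feature point information K(t): corners where both eigenvalues of the
    # second-moment matrix A are large (cf. Expressions (10) and (11)).
    k_t = cv2.goodFeaturesToTrack(gray_t, maxCorners=max_points,
                                  qualityLevel=0.01, minDistance=8)
    if k_t is None:
        return np.empty((0, 2)), np.empty((0, 2))
    # Corresponding point information Q(t, t-1): position of each feature point
    # Ks on the previous image F(t-1), found by pyramidal Lucas-Kanade flow.
    k_t1, status, _err = cv2.calcOpticalFlowPyrLK(gray_t, gray_t1, k_t, None)
    ok = status.ravel() == 1
    return k_t.reshape(-1, 2)[ok], k_t1.reshape(-1, 2)[ok]
```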
  • The transformation matrix calculation unit 223 in FIG. 6 calculates, from the corresponding point information Q (t, t-1), a transformation matrix H that maps the previous image F (t-1) onto the processing target image F (t) (Expression (14)) (step S353). The transformation matrix H is called a projective transformation because it can express general transformations between images. When the transformation between the images is restricted to translation, Expression (14) is expressed as Expression (15); the coefficients tx and ty represent the amounts of movement in the x and y directions, respectively. When the transformation is restricted to an affine transformation, Expression (14) is expressed as Expression (16); the coefficients a, b, c, and d in Expression (16) represent enlargement/reduction and rotation, and the coefficients tx and ty are the same as those in Expression (15).
  • The vanishing point position calculation unit 224 in FIG. 6 assumes that the vanishing point position at time t is the position obtained by projecting the vanishing point position on the previous image F (t-1) onto the processing target image F (t) using the transformation matrix H calculated by the transformation matrix calculation unit 223, calculates the vanishing point position at time t on that assumption (step S354), and sets the vanishing point information VP (t) based on the result (step S355).
  • FIG. 12C shows an image example when the vanishing point on the image F (t ⁇ 1) is projected onto the image F (t) by the transformation matrix H.
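  • A minimal sketch of this projection step: a transformation matrix H (Expression (14)) is estimated from the corresponding points, and the previous vanishing point is mapped onto the current frame with it. cv2.findHomography with RANSAC is used here as one possible estimator of H; this is an assumption, not the estimator specified by the patent.

```python
import cv2
import numpy as np

def project_vanishing_point(pts_t, pts_t1, vp_t1):
    # Transformation matrix H mapping the previous image F(t-1) onto F(t).
    if len(pts_t) < 4:
        return None                                # not enough correspondences
    H, _mask = cv2.findHomography(np.float32(pts_t1), np.float32(pts_t), cv2.RANSAC)
    if H is None:
        return None
    # Projective transformation of the previous vanishing point position.
    src = np.float32([[vp_t1]])                    # shape (1, 1, 2)
    return tuple(cv2.perspectiveTransform(src, H)[0, 0])
```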
  • the buffer 203 in FIG. 6 deletes the previous image F (t ⁇ 1) and stores the input processing target image F (t). Also, the buffer 204 in FIG. 6 deletes the previous vanishing point information VP (t ⁇ 1), and the vanishing point information input from the intra-frame vanishing point estimation unit 21 or the inter-frame vanishing point estimation unit 22. VP (t) is stored, and the vanishing point estimation process for the processing target image F (t) is terminated (step S36).
  • As described above, according to the vanishing point estimation unit 20 of the present embodiment, as shown in FIG. 13, within the same scene (a time-series image group correlated in the spatial and temporal directions), the vanishing point is estimated by the intra-frame vanishing point estimating means (the intra-frame vanishing point estimation unit 21) from the first frame F (t0) until the frame F (t0 + k - 1) in which a vanishing point is first detected in the scene, and from then on by the inter-frame vanishing point estimating means (the inter-frame vanishing point estimation unit 22). Compared with the case where the vanishing point is estimated by the intra-frame vanishing point estimating means in every frame, the estimation is therefore more robust to camera work, and the vanishing point can be estimated stably with its fluctuation suppressed.
  • The depth model generation unit 30 in FIG. 1 corresponds to the depth model generation means of the present invention, and generates different depth models depending on whether or not the vanishing point could be estimated by the vanishing point estimation unit 20. That is, based on the vanishing point information VP (t), the depth model generation unit 30 selects one of two depth model creating means (first depth model creating means that creates a depth model from the vanishing point position, and second depth model creating means that creates a depth model from the saliency representing the attractiveness in the image based on human visual characteristics), sets the depth value of each pixel in the processing target image F (t) with the selected depth model creating means, and outputs the depth model D (t) representing the depth value of each pixel to the viewpoint image generation unit 40 (step S14 in FIG. 2).
  • the depth model generation unit 30 includes a switching unit 301, a switching unit 302, an area dividing unit 303, a distance calculating unit 304, a saliency calculating unit 305, and a depth value setting unit 306.
  • FIG. 15 is a flowchart for explaining an operation example of the depth model generation unit 30.
  • the depth model generation unit 30 in FIG. 14 selects a depth model creation means based on the input vanishing point information VP (t) (step S41). That is, when there is a vanishing point in the current frame (“vp_num> 0” of vanishing point information VP (t)) (Yes in step S41), the switching unit 301 in FIG. 14 transfers the image input destination to the region dividing unit 303. The switching unit 302 in FIG. 14 switches the output source of the data input to the depth value setting unit 306 to the distance calculation unit 304, and the first depth model creation means based on the vanishing point is selected (step S42).
  • Next, the distance calculation unit 304 in FIG. 14 calculates the distance Dist (x, y) between the coordinates of the vanishing point in the vanishing point information VP (t) and the coordinates of each pixel, and outputs distance information Dist (t) describing the result to the depth value setting unit 306 (step S43).
  • the distance Dist (x, y) between the coordinates of the vanishing point and the coordinates of each pixel is calculated based on any one of Expression (17), Expression (18), and Expression (19).
  • Here, Δx and Δy represent the distance in the x direction and the distance in the y direction between each pixel and the vanishing point, respectively.
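  • The three variants below (Euclidean, city-block, and chessboard distance) are one plausible reading of Equations (17) to (19); which formula corresponds to which equation is an assumption made for illustration.

```python
import numpy as np

def distance_map(height, width, vp_x, vp_y, mode="euclidean"):
    # Dist(x, y): distance between each pixel and the vanishing point VP.
    y, x = np.mgrid[0:height, 0:width]
    dx, dy = x - vp_x, y - vp_y                    # delta x and delta y
    if mode == "euclidean":                        # one reading of Eq. (17)
        return np.sqrt(dx ** 2 + dy ** 2)
    if mode == "cityblock":                        # one reading of Eq. (18)
        return np.abs(dx) + np.abs(dy)
    return np.maximum(np.abs(dx), np.abs(dy))      # one reading of Eq. (19)
```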
  • FIG. 16 shows an example of the distance information Dist (t) based on the equations (17), (18), and (19) when the vanishing point VP is present in the screen.
  • FIG. 16A shows an example in which the vanishing point VP is in the screen.
  • BD1a in FIG. 16B, BD1b in FIG. 16C, and BD1c in FIG. 16D represent the distance information Dist (t) based on Equation (17), Equation (18), and Equation (19), respectively.
  • FIG. 17 shows an example of the distance information Dist (t) based on the equations (17), (18), and (19) when the vanishing point VP exists outside the screen.
  • FIG. 17A shows an example in which the vanishing point VP is outside the screen.
  • The region dividing unit 303 in FIG. 14 divides the processing target image F (t) by region division (clustering) into a plurality of regions having similar feature amounts (feature values within a predetermined range). In the present embodiment, the region dividing unit 303 divides the image into a plurality of regions by clustering in a feature amount space. Clustering in the feature amount space means that each pixel of the image space is mapped into a feature amount space (for example, color, edge, or motion vector) and clustered in that space by a technique such as the K-means method, the Mean-Shift method, or the K nearest neighbor search method (approximate K nearest neighbor search method). The pixels belonging to each class are then assigned, in the original image space, a pixel value that is a representative value of the region (for example, the average value), and region information R (t) describing the result is output to the depth value setting unit 306 (step S44).
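  • A sketch of region division by clustering in a feature amount space, here using only the color of each pixel as the feature and the K-means method as the clustering technique; both choices, and the number of regions, are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_regions(image_rgb, num_regions=8):
    # Map each pixel of the image space into the feature amount space
    # (here simply its RGB color) and cluster with the K-means method.
    h, w, _ = image_rgb.shape
    features = image_rgb.reshape(-1, 3).astype(np.float64)
    labels = KMeans(n_clusters=num_regions, n_init=4).fit_predict(features)
    # Region information R(t): one class label per pixel; pixels of the same
    # class can then be assigned a representative value such as the class mean.
    return labels.reshape(h, w)
```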
  • The depth value setting unit 306 in FIG. 14 sets the depth value of each pixel based on the input distance information Dist (t) and region information R (t). Specifically, as shown in Expression (20), the average value of the distances Dist (x, y) of the pixels in each region indicated by the region information R (t) is scaled, and the value shifted by the reference depth value D base (x, y) is set as the depth value D (x, y) of each pixel (step S45).
  • D max is the upper limit value of the depth value
  • D min is the lower limit value of the depth value
  • Dist max is the maximum value of the distance information Dist (t)
  • Dist min is the minimum value of the distance information Dist (t).
  • the value D base (x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value as the farthest view).
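  • One plausible reading of Expression (20) is sketched below: the per-region average of Dist (x, y) is linearly rescaled into [D min, D max] and shifted by the reference depth D base. The exact scaling, and the orientation (pixels far from the vanishing point treated as being on the near side, matching the bright-is-near rendering described for the figures), are assumptions.

```python
import numpy as np

def depth_from_vanishing_point(dist, regions, d_base, d_min=0.0, d_max=255.0):
    # Scale the average distance of each region into [d_min, d_max] and shift
    # it by the reference depth D_base(x, y) (cf. Expression (20)).
    dist_min, dist_max = dist.min(), dist.max()
    scale = (d_max - d_min) / max(dist_max - dist_min, 1e-6)
    depth = np.empty_like(dist, dtype=np.float64)
    for label in np.unique(regions):
        mask = regions == label
        region_avg = dist[mask].mean()
        # Regions far from the vanishing point get larger (nearer) depth values.
        depth[mask] = d_min + scale * (region_avg - dist_min)
    return depth + d_base
```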
  • Image A represents an example of the processing target image F (t), image B is the region division result (region information R (t)) of the processing target image F (t) obtained by the region dividing unit 303, image C represents an example of the vanishing point VP of the processing target image F (t), image D represents an example of the distance information Dist (t) of the processing target image F (t) obtained by the distance calculation unit 304, and image E is an example of the depth model obtained by the depth value setting unit 306 based on the region information R (t) of image B and the distance information Dist (t) of image D. In the depth model, the bright part represents the near side and the dark part represents the far side.
  • the saliency calculating unit 305 in FIG. 14 calculates the saliency M (t) representing the attractiveness in the image based on the human visual characteristics from the input processing target image F (t) (step S47).
  • Portions that easily attract human attention are locations where the color difference between the pixel of interest and its surrounding pixels is large (local color difference), and locations where the color difference between the pixel of interest and the entire image, or between a local region including the pixel of interest and its surrounding region, is large (global color difference). The color difference is a quantitative representation of the perceptual difference in color. In the present embodiment, the color difference is evaluated in the CIELAB color space (also referred to as the CIE 1976 L*a*b* color space), which is a uniform color space.
  • the saliency M (x, y) of each pixel is calculated by Equation (21).
  • In Expression (21), ΔE local represents the local color difference, ΔE global represents the global color difference, and the coefficients multiplying them represent predetermined weighting coefficients; that is, Expression (21) expresses the saliency as a linear sum of the local color difference and the global color difference. The local color difference ΔE local is calculated by Equation (22), and the global color difference ΔE global is calculated by Equation (23).
  • L * represents a lightness index
  • a * represents red-green perceptual chromaticity
  • b * represents yellow-blue perceived chromaticity.
  • Note that the CIELUV color space (also referred to as the CIE 1976 L*u*v* color space) may be used instead as the color space for evaluating the color difference.
  • the coefficient w (u, v) in the equation (22) is the same as that in the equation (12), and thus the description thereof is omitted.
  • Next, the depth value setting unit 306 in FIG. 14 sets the depth value of each pixel by the calculation of Expression (24) based on the input saliency M (t), and outputs the depth model D (t) describing the result (step S48).
  • D max is the upper limit value of the depth value
  • D min is the lower limit value of the depth value
  • M max is the maximum value of the saliency M (t)
  • M min is the minimum value of the saliency M (t)
  • D base (x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value as the farthest view). That is, the saliency M (x, y) of each pixel is scaled by the equation (24), and the value shifted by the reference depth value D base (x, y) is the depth value D (x, y) of each pixel.
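  • A combined sketch of the saliency-based branch: the saliency of Expression (21) is approximated as a weighted sum of a local color difference (against a Gaussian-blurred neighborhood) and a global color difference (against the mean color of the whole image) in the CIELAB space, and is then rescaled into a depth model as in Expression (24). The conversion to CIELAB via OpenCV, the use of a Gaussian blur as the neighborhood average, and the weights alpha and beta are assumptions made for illustration.

```python
import cv2
import numpy as np

def saliency_map(image_bgr, alpha=0.5, beta=0.5):
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float64)
    # Local color difference: distance to a blurred (neighborhood-averaged) image.
    de_local = np.linalg.norm(lab - cv2.GaussianBlur(lab, (9, 9), 0), axis=2)
    # Global color difference: distance to the mean color of the whole image.
    de_global = np.linalg.norm(lab - lab.mean(axis=(0, 1)), axis=2)
    return alpha * de_local + beta * de_global     # linear sum, cf. Expression (21)

def depth_from_saliency(saliency, d_base, d_min=0.0, d_max=255.0):
    # Expression (24): scale the saliency into [d_min, d_max] and shift it by
    # D_base so that highly salient (attention-drawing) pixels come to the near side.
    m_min, m_max = saliency.min(), saliency.max()
    scaled = (saliency - m_min) * (d_max - d_min) / max(m_max - m_min, 1e-6) + d_min
    return scaled + d_base
```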
  • Image A represents an example of the processing target image F (t), image B represents an example of the saliency M (t) of the processing target image F (t) obtained by the saliency calculating unit 305, image C represents an example of the reference depth model (D base), and image D is an example of the depth model created by the depth value setting unit 306 by combining the reference depth model (D base) of image C with the saliency M (t) of image B. In the saliency map, a bright part (white) represents a part that easily attracts human attention (high attractiveness), and a dark part (black) represents a part that does not easily attract attention (low attractiveness).
  • In this way, the depth model creating means based on the saliency superimposes the scaled saliency on the reference depth surface (D base) and emphasizes the relative depth difference between the region of interest and its surrounding region, so that a pseudo sense of depth is perceived. That is, when the depth model is generated based on the saliency, the depth of portions with high saliency (high attractiveness) is set to the near side on the reference depth surface, and the depth of portions with low saliency (low attractiveness) is set to the far side, whereby the relative depth difference between the region of interest and its surrounding region is emphasized and a pseudo sense of depth can be perceived. Specifically, when the saliency of the attention region is higher than that of its surrounding region, its depth is set relatively nearer than the surrounding region; when the saliency of the attention region is equal to that of the surrounding region, the depths are set to be relatively the same; and when the saliency of the attention region is lower than that of the surrounding region, its depth is set relatively farther than the surrounding region.
  • As described above, according to the depth model generation unit 30, when the vanishing point position can be estimated from geometric depth cues, a depth model that emphasizes the sense of depth given by the geometric depth cues can be created by creating the depth model of the image based on the vanishing point. When the vanishing point position cannot be estimated, a depth model that emphasizes the sense of depth of the portions that attract human attention can be created by creating the depth model of the image from the saliency representing the attractiveness in the image based on human visual characteristics.
  • The viewpoint image generation unit 40 of FIG. 1 corresponds to the viewpoint image generation means of the present invention, and converts the depth value of each pixel represented by the depth model D (t) into a parallax based on the assumed viewing condition information set in advance, thereby generating the right-eye presentation image and the left-eye presentation image from the processing target image F (t) (step S15 in FIG. 2).
  • The "assumed viewing condition information" is information for generating the stereoscopic image (left-eye presentation image and right-eye presentation image) to be presented to the viewer, and includes the pixel pitch (pixel distance) of the display that displays the stereoscopic image, the image size of the display, the distance between the viewer and the display that displays the stereoscopic image (assumed viewing distance) f, the parallax range (range of the parallax vector) representing the depth of the stereoscopic image, and the baseline length t, which represents the distance between the virtual right viewpoint Cr of the viewpoint image Fr (t) and the virtual left viewpoint Cl of the viewpoint image Fl (t).
  • FIG. 20 shows an overhead view of an example of a camera (viewpoint) arrangement for generating a viewpoint image based on the assumed viewing condition information.
  • a camera on the virtual right viewpoint Cr and a camera on the virtual left viewpoint Cl are arranged in parallel with the camera on the reference viewpoint Cc in the x-axis direction, respectively.
  • Each camera observes a point of interest P in three-dimensional space. Let Xc be the position of the point of interest P projected onto the image plane Ic of the reference viewpoint Cc, Xl be its position projected onto the image plane Il of the virtual left viewpoint Cl, and Xr be its position projected onto the image plane Ir of the virtual right viewpoint Cr. The geometric relationships between the positions Xl and Xc, and between Xr and Xc, of the point of interest P projected onto each image plane, with each virtual viewpoint separated from the reference viewpoint by the distance t/2 in the x direction, are expressed by Equations (25) and (26), respectively.
  • FIG. 21 is a block diagram illustrating a configuration example of the viewpoint image generation unit 40.
  • FIG. 22 is a flowchart for explaining an operation example of the viewpoint image generation unit 40.
  • FIG. 23 is a diagram for explaining a viewpoint image generation example in the viewpoint image generation unit 40.
  • The viewpoint image generation unit 40 includes a disparity vector calculation unit 401, a texture shift unit 402, a gap filling unit (also referred to as an occlusion compensation unit) 403, and a floating window superimposing unit 404.
  • The disparity vector of each pixel is calculated from its depth value using a look-up table (LUT) such as that shown in FIG. 23B, which is created in advance based on the assumed viewing condition information.
  • g (D) in FIG. 23B represents a function for converting the depth value D into a disparity vector.
  • the spreading direction and the crossing direction of the disparity vector (shift amount) in FIG. 23B will be described with reference to FIG.
  • Let P be a certain point of interest, Pr be the point P projected onto the display surface when viewed from the right eye, and Pl be the point P projected onto the display surface when viewed from the left eye. A disparity vector in the spreading (uncrossed) direction corresponds to the case where the point of interest P is located behind the display surface, that is, the value of the parallax vector from Pr to Pl (or from Pl to Pr) on the display surface is positive. A disparity vector in the crossing direction corresponds to the case where the point of interest P is located in front of the display surface, that is, the value of the parallax vector from Pr to Pl (or from Pl to Pr) is negative. When the value of the parallax vector is zero, the point of interest P is located on the display surface.
  • texture shift is performed from a pixel whose disparity vector value has a value on the spreading direction side (for example, d2 on the LUT in FIG. 23B).
  • As shown in FIG. 23A, consider the case where the viewpoint image Fl (t) of the virtual left viewpoint (the left-eye presentation image) is generated from the reference image iF and the depth model iD. In the depth model iD, the depth value of the white portion is D1 and the depth value of the black portion is D2. First, each pixel of the layer L2 having the depth value D2 in FIG. 23A is shifted by d2 in the spreading direction; after that, each pixel of the layer L1 having the depth value D1 in FIG. 23A is shifted by its corresponding disparity d1, and a missing region appears. The missing region is an area in which no pixel value is set because there is no corresponding pixel on the reference image, as in the viewpoint image oF1 in FIG. 23A.
  • The gap filling unit 403 in FIG. 21 interpolates, in the input viewpoint image Fi (t) (i = l, r), the missing regions that are not located at the screen edges (for example, in the viewpoint image oF1 of FIG. 23A) using the surrounding pixels. As the method for interpolating the pixels of a missing region, linear interpolation, a median filter, or a known image restoration method is used.
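  • A much simplified sketch of the texture shift and gap filling for one viewpoint: pixels are copied horizontally by their disparity, visiting them from the spreading-direction (far) side first so that nearer pixels overwrite farther ones, and remaining holes are then filled from the nearest filled pixel to the left. The sign convention follows the description above (positive disparity = spreading direction); the hole-filling rule is a crude stand-in for the linear interpolation, median filter, or image restoration methods mentioned, and the whole routine is an assumption-laden illustration rather than the patent's implementation.

```python
import numpy as np

def render_view(reference, disparity):
    # Texture shift: copy each reference pixel to x + d(x, y), far-to-near.
    h, w = disparity.shape
    view = np.zeros_like(reference)
    filled = np.zeros((h, w), dtype=bool)
    order = np.argsort(disparity, axis=None)[::-1]  # spreading (far) side first
    ys, xs = np.unravel_index(order, (h, w))
    for y, x in zip(ys, xs):
        nx = int(round(x + disparity[y, x]))
        if 0 <= nx < w:
            view[y, nx] = reference[y, x]
            filled[y, nx] = True
    # Gap filling: propagate the nearest filled pixel from the left into holes.
    for y in range(h):
        for x in range(1, w):
            if not filled[y, x] and filled[y, x - 1]:
                view[y, x] = view[y, x - 1]
                filled[y, x] = True
    return view
```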
  • The floating window superimposing unit 404 obtains the maximum width W1 of the missing regions located at the screen edges, and superimposes a floating window of width W2 at the screen edge, where W2 is a value obtained by scaling W1 by a predetermined constant α. The viewpoint image after the floating window is inserted is, for example, the viewpoint image oF2 in FIG. 23A; in the viewpoint image oF2, floating windows fw1 and fw2 are inserted at the left end and the right end of the screen, respectively.
  • The reason for inserting the floating window is to suppress binocular rivalry (a visual field struggle in which the left and right retinal images are perceived alternately), which occurs when the position or shape of a certain object differs greatly between the images presented to the left eye and the right eye (for example, at the left and right ends of the screen of the generated viewpoint images) and the two eyes cannot fuse it into a single object.
  • As described above, according to the stereoscopic image generation apparatus 1, when the vanishing point position can be estimated from geometric depth cues, a stereoscopic image in which the sense of depth given by the geometric depth cues is emphasized can be generated by generating the depth model of the image based on the vanishing point. When the vanishing point position cannot be estimated from geometric depth cues, a stereoscopic image in which the sense of depth of the portions that attract human attention is emphasized can be generated by generating the depth model of the image from the saliency representing the attractiveness in the image based on human visual characteristics.
  • In the above description, as an example of the depth model creating means based on the saliency, the depth model generation unit 30 scales the saliency M (x, y) of each pixel according to Equation (24) and shifts it by the reference depth value D base (x, y); however, the configuration is not limited to this.
  • the configuration of the depth model generation unit 30 may be changed so that the switching unit 301 is removed as illustrated in FIG. 25 and the image F (t) is input to the region division unit 303 and the saliency calculation unit 305.
  • the depth model generation unit 30a includes a switching unit 302, an area dividing unit 303, a distance calculating unit 304, a saliency calculating unit 305, and a depth value setting unit 306.
  • the saliency calculating unit 305 in FIG. 25 calculates the saliency M (t) from the processing target image F (t) by the same processing as step S47 in FIG. 15, and the result is sent to the depth value setting unit 306. This is output (step S47 ′ in FIG. 26).
  • the area dividing unit 303 in FIG. 25 divides the processing target image F (t) into areas by processing similar to that in step S44 in FIG. 15, and sets area information R (t) describing the result to the depth value setting. It outputs to the part 306 (step S48 'of FIG. 26).
  • Then, as shown in Expression (29), for each region indicated by the region information R (t), the average value of the saliency M (x, y) of the pixels in the region is scaled, and the value shifted by the reference depth value D base (x, y) is set as the depth value D (x, y) of each pixel in the region (step S49' in FIG. 26).
  • D max is the upper limit value of the depth value
  • D min is the lower limit value of the depth value
  • M max is the maximum value of the saliency M (t)
  • M min is the minimum value of the saliency M (t).
  • D base (x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value as the farthest view).
  • Image A represents an example of the processing target image F (t), image B is an example of the region division result (region information R (t)) of the processing target image F (t) obtained by the region dividing unit 303, image C represents the saliency M (t) of the processing target image F (t) obtained by the saliency calculating unit 305, image D represents an example of the reference depth model (D base), and image E is an example of the depth model obtained by the depth value setting unit 306 based on the region information R (t) of image B, the saliency M (t) of image C, and the reference depth model (D base) of image D. In the saliency map, a bright part (white) represents a part that easily attracts human attention (high attractiveness) and a dark part (black) represents a part that does not easily attract attention (low attractiveness); in the depth model, the bright part represents the near side and the dark part represents the far side.
  • As the reference depth model (D base) in image D of FIG. 27, a plane having a uniform depth value is given as an example, but the present invention is not limited to this. For example, the plane equation shown in the following Equation (30) may be determined in advance, and the reference depth value D base (x, y) may be set from the coordinates (x, y) of each pixel. An example of the reference depth model expressed by Equation (30) is shown in image A.
  • As described above, according to the depth model generation unit 30a, by setting a uniform depth value for each region based on the saliency representing the attractiveness in the image based on human visual characteristics and on the image segmentation result (region information), it is possible to generate a depth model that suppresses errors in the depth order (front-back relationships).
  • the scene change detection unit 10 has been described using the luminance histogram as the image feature amount used for scene change detection, but the present invention is not limited to this.
  • a color histogram representing the appearance frequency of each color component, an average error of inter-frame differences, and a motion vector distribution may be used as the image feature amount.
  • the edge detection unit 211 has described edge detection in which a point (Local Maxima) where the edge intensity is locally maximum in the image space is extracted as an edge point, but the present invention is not limited to this.
  • a known edge detection method such as Canny Edge detection may be used.
  • As the differential operator (edge detector), a known method such as a Sobel filter, a Prewitt filter, a LoG filter (Laplacian of Gaussian), or a DoG filter (Difference of Gaussian) may be used.
  • In the vanishing point identifying unit 213, the case where a Gaussian distribution is used as the distribution model of the mixture model has been described, but the present invention is not limited to this.
  • an exponential distribution family (Laplace distribution, beta distribution, Bernoulli distribution, etc.) may be used for the distribution model.
  • The vanishing point identifying unit 213 may set the number of classes Kc used in the mixture model to a predetermined value, or may determine the value as in the following example.
  • the vanishing point identifying unit 213 sets a predetermined class number Kc ′ as the class number Kc, and performs clustering by the K-means method.
  • the vanishing point identifying unit 213 merges the class Ci and the class Cj to form a new class Ck ′.
  • the vanishing point identifying unit 213 determines the number of classes Kc ( ⁇ Kc ′) by repeating this process until the number of classes converges to a constant value.
  • The method used by the vanishing point identifying unit 213 to estimate the distribution of the intersections is not limited to a parametric estimation method such as a mixture model; a non-parametric estimation method such as the Mean-Shift method, the K-means method, or the K nearest neighbor search method (approximate K nearest neighbor search method) may also be used.
  • Clustering in the image space is a method of performing region division based on the similarity between the pixels or pixel groups (regions) constituting each region in the original image space, without mapping to the feature amount space. The region dividing unit 303 may perform clustering in the image space by (a) a pixel linking method, (b) a region growing method (also referred to as the Region Growing method), or (c) a region split-and-merge method (also referred to as the Split & Merge method).
  • The case where the saliency calculating unit 305 calculates the saliency based on both the local color difference and the global color difference has been described, but the present invention is not limited to this; the saliency may be calculated based on only one of the indicators, that is, the local color difference (the first term ΔE local in Expression (21)) or the global color difference (the second term ΔE global in Expression (21)).
  • the color difference may be calculated using only the lightness index L * without using the red-green perceptual chromaticity a * and the yellow-blue perceptual chromaticity b *.
  • Alternatively, the local color difference ΔE local and the global color difference ΔE global may be obtained by the following Equation (31), using the lightness difference ΔL*, the chroma difference ΔC*, and the hue difference ΔH* based on the CIE color-difference formula, respectively.
  • the coefficient w (u, v) in the equation (31) is the same as that in the equation (12).
  • the coefficients l, c, h in the expressions (31) and (32) are predetermined weighting coefficients.
  • the method of obtaining the saliency is not limited to the color difference, and the saliency may be calculated based on a plurality of image feature amounts such as a color difference, an edge gradient, and a motion vector (for example, see Non-Patent Document 5).
  • the input / output image size of the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 of the stereoscopic image generation device 1 is assumed to be the same as the input image F (t).
  • the present invention is not limited to this.
  • For example, a process may be added in which the images input to the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 are reduced in advance to a predetermined image size, and the depth model output from the depth model generation unit 30 is then enlarged back to the image size of the input image.
  • As shown in FIG. 29, a first modification (stereoscopic image generation device 2) of the stereoscopic image generation apparatus 1 includes a reduction processing unit 50, a scene change detection unit 10, a vanishing point estimation unit 20, a depth model generation unit 30, an enlargement processing unit 60, and a viewpoint image generation unit 40.
  • the reduction processing unit 50 corresponds to reduced image generation means of the present invention, and generates a reduced image having a predetermined image size from the input image F (t). Then, the generated reduced image is input to the vanishing point estimation unit 20 and the depth model generation unit 30.
  • the enlargement processing unit 60 corresponds to the enlarged depth model generation unit of the present invention, and generates an enlarged depth model having the same image size as the input image F (t) from the depth model of the reduced image generated by the depth model generation unit 30. .
  • Each operation of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 in FIG. 29 (step S63, step S64, step S65, and step S67 in FIG. 30) is the same as the corresponding operation described above (step S12, step S13, step S14, and step S15 in FIG. 2), and therefore its description is omitted.
  • the stereoscopic image generation device 2 in FIG. 29 outputs the input image at time t to the reduction processing unit 50 and the viewpoint image generation unit 40 (step S61 in FIG. 30).
  • Next, the reduction processing unit 50 reduces the input processing target image F (t) to a predetermined image size, and outputs the reduced image Fd (t) to the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30.
  • image reduction is performed using, for example, a nearest neighbor method, a bilinear method, or a bicubic method.
  • the depth model is enlarged using, for example, the nearest neighbor method, the bilinear method, or the bicubic method.
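  • A minimal sketch of this reduce-then-enlarge flow using OpenCV resizing. The function compute_depth is a placeholder for the depth model pipeline described above, and the choice of bilinear interpolation for the image and nearest-neighbor interpolation for the depth model is an illustrative assumption.

```python
import cv2

def generate_depth_reduced(frame_bgr, compute_depth, scale=0.25):
    # Reduction processing unit 50: shrink the input image F(t) before the
    # scene change / vanishing point / depth model processing.
    small = cv2.resize(frame_bgr, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_LINEAR)
    depth_small = compute_depth(small)             # depth model of the reduced image
    # Enlargement processing unit 60: bring the depth model back to the
    # image size of the original input image.
    h, w = frame_bgr.shape[:2]
    return cv2.resize(depth_small, (w, h), interpolation=cv2.INTER_NEAREST)
```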
  • scene change detection processing, vanishing point estimation processing, and depth model generation processing are performed using a reduced image having an image size smaller than the input image, and therefore, compared with the stereoscopic image generating apparatus 1 of FIG. 1.
  • the memory size and the amount of calculation can be reduced.
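A minimal sketch of this reduce-estimate-enlarge arrangement is shown below. Nearest-neighbour resampling in plain NumPy stands in for the nearest neighbor, bilinear, or bicubic methods mentioned above, and `estimate_depth_model` is a hypothetical callable standing in for the scene change detection, vanishing point estimation, and depth model generation chain; the working size is an arbitrary placeholder.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resampling; bilinear or bicubic could be used instead.
    in_h, in_w = img.shape[:2]
    ys = (np.arange(out_h) * in_h // out_h).clip(0, in_h - 1)
    xs = (np.arange(out_w) * in_w // out_w).clip(0, in_w - 1)
    return img[ys[:, None], xs[None, :]]

def depth_model_via_reduction(frame, estimate_depth_model, work_size=(180, 320)):
    # Reduce the frame, run the estimation on the small image Fd(t),
    # then enlarge the resulting depth model back to the input size.
    h, w = frame.shape[:2]
    reduced = resize_nearest(frame, *work_size)       # Fd(t)
    depth_small = estimate_depth_model(reduced)       # depth model of Fd(t)
    return resize_nearest(depth_small, h, w)          # enlarged depth model
```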
As described above, the stereoscopic image generation device 1 generates the depth model of an image based on the vanishing point when the vanishing point position can be estimated from a geometric depth cue, and generates the depth model based on the saliency, which represents the attractiveness in the image derived from human visual characteristics, when the vanishing point position cannot be estimated. For this reason, in the frames immediately before and after the depth model generation means is switched, the depth model differs in the time direction, so the change in parallax (depth) can be large. In the second modification, a spatio-temporal direction smoothing unit that smooths the depth model in the spatio-temporal direction is therefore provided between the depth model generation unit 30 and the viewpoint image generation unit 40 in order to reduce this change in parallax in the temporal direction. The resulting stereoscopic image generation device 3 includes, as illustrated in FIG. 31, the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, a spatio-temporal direction smoothing unit 70, and the viewpoint image generation unit 40. The spatio-temporal direction smoothing unit 70 corresponds to the spatio-temporal direction smoothing means of the present invention and, as shown in FIG. 32, includes a spatial direction smoothing unit 701, a time direction smoothing unit 702, and a buffer 703. The spatio-temporal direction smoothing unit 70 smooths, in the spatial direction, the depth model D(t) of the processing target image F(t) generated by the depth model generation unit 30, then smooths the spatially smoothed depth model Ds(t) of the image F(t) in the time direction, and thereby generates the depth model Dt(t) of the image F(t) smoothed in the spatio-temporal direction.
Each operation of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 in FIG. 31 (steps S72, S73, S74, and S76 in FIG. 33) is the same as the corresponding operation of the stereoscopic image generation device 1 (steps S12, S13, S14, and S15 in FIG. 2), and its description is therefore omitted. The spatio-temporal direction smoothing unit 70 in FIG. 31 performs smoothing in the spatio-temporal direction on the depth model D(t) of the input processing target image F(t) and outputs the result, the smoothed depth model Dt(t) (step S75 in FIG. 33). Specifically, the spatial direction smoothing unit 701 in FIG. 32 smooths the depth model D(t) in the spatial direction by applying a one-dimensional smoothing filter in the horizontal direction and then the vertical direction (or in the vertical direction and then the horizontal direction), and outputs the result, the depth model Ds(t), to the time direction smoothing unit 702 (step S81 in FIG. 34). As the one-dimensional smoothing filter, for example, a one-dimensional Gaussian filter is used. The time direction smoothing unit 702 then smooths Ds(t) in the time direction using the smoothed depth model of the previous frame held in the buffer 703 to obtain Dt(t); the coefficient α in equation (33) used for this temporal smoothing is a predetermined value between 0 and 1. As described above, according to the stereoscopic image generation device 3, smoothing the depth model in the spatio-temporal direction reduces the change in parallax (depth) in the frames before and after the depth model generation means is switched and in the frames before and after a scene change occurs.
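The following sketch illustrates one possible form of this smoothing. The spatial step applies a separable one-dimensional Gaussian horizontally and then vertically; the temporal step is written as a recursive blend Dt(t) = α·Ds(t) + (1 − α)·Dt(t−1), which is an assumed reading of equation (33) (only the range of α is stated above), with the previous result kept in place of the buffer 703.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    radius = int(3 * sigma) if radius is None else radius
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth_spatial(depth, sigma=3.0):
    # Separable smoothing: 1-D Gaussian along rows, then along columns.
    k = gaussian_kernel1d(sigma)
    horiz = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, depth)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, horiz)

class SpatioTemporalSmoother:
    # Sketch of unit 70: spatial smoothing followed by a recursive temporal blend.
    def __init__(self, alpha=0.3, sigma=3.0):
        self.alpha = alpha        # assumed role of the coefficient in eq. (33)
        self.sigma = sigma
        self.prev = None          # Dt(t-1), i.e. the content of the buffer

    def __call__(self, depth):
        ds = smooth_spatial(np.asarray(depth, dtype=float), self.sigma)   # Ds(t)
        dt = ds if self.prev is None else self.alpha * ds + (1.0 - self.alpha) * self.prev
        self.prev = dt
        return dt                 # Dt(t)
```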
The present invention may also take the form of a stereoscopic image generation method performed by the stereoscopic image generation device 1, a program for causing a computer to execute this stereoscopic image generation method, or a recording medium on which the program is recorded. A part of the stereoscopic image generation device 1 in the above-described embodiment may be realized by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed. The “computer system” here is a computer system built into the stereoscopic image generation device 1 and includes an OS and hardware such as peripheral devices.
The “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk incorporated in a computer system. The “computer-readable recording medium” may also include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. The program may be one for realizing a part of the functions described above, or one that realizes the functions described above in combination with a program already recorded in the computer system.
A part or all of the stereoscopic image generation device 1 in the above-described embodiment may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the stereoscopic image generation device 1 may be made into an individual processor, or a part or all of them may be integrated into a single processor. The method of circuit integration is not limited to LSI; a dedicated circuit or a general-purpose processor may be used. Further, if an integrated circuit technology that replaces LSI emerges with progress in semiconductor technology, an integrated circuit based on that technology may be used.
1, 2, 3: stereoscopic image generation device; 10: scene change detection unit; 20: vanishing point estimation unit; 21: intra-frame vanishing point estimation unit; 22: inter-frame vanishing point estimation unit; 30: depth model generation unit; 40: viewpoint image generation unit; 50: reduction processing unit; 60: enlargement processing unit; 70: spatio-temporal direction smoothing unit; 101: luminance histogram generation unit; 102, 203, 204, 703: buffer; 103: histogram similarity calculation unit; 104: scene change determination unit; 201, 202, 301, 302: switching unit; 211: edge detection unit; 212: straight line detection unit; 213: vanishing point identification unit; 221: feature point detection unit; 222: corresponding point calculation unit; 223: transformation matrix calculation unit; 224: vanishing point position calculation unit; 303: area division unit; 304: distance calculation unit; 305: saliency calculation unit; 306: depth value setting unit; 401: disparity vector calculation unit; 402: texture shift unit; 403: gap filling unit; 404: floating window superimposing unit; 701: spatial direction smoothing unit; 702: time direction smoothing unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Analysis (AREA)

Abstract

When the position of a vanishing point can be estimated, a depth model of an image is generated on the basis of the vanishing point, and when the position of the vanishing point cannot be estimated, the depth model of the image is generated on the basis of the degree of prominence, whereby a three-dimensional image can be generated with a more natural feeling of depth. An apparatus for generating a three-dimensional image (1) is provided with: a vanishing point estimation unit (20) for estimating a vanishing point from an image to be processed; a depth model generator (30) for generating a different depth model on the basis of whether it was possible or impossible for the vanishing point estimation unit (20) to estimate the vanishing point; and a viewpoint image generator (40) for generating the image presented to the right eye and the image presented to the left eye on the basis of the depth model generated by the depth model generator (30), the image to be processed, and assumed viewing condition information. If it was possible for the vanishing point estimation unit (20) to estimate the vanishing point, the depth model generator (30) generates the depth model on the basis of the vanishing point, and if it was impossible for the vanishing point estimation unit (20) to estimate the vanishing point, the depth model is generated on the basis of the degree of prominence of each pixel within the image to be processed.

Description

Stereoscopic image generating apparatus, stereoscopic image generating method, program, and recording medium
 本発明は、2D画像に対して両眼立体情報を付加し、3D画像を生成する立体画像生成装置、立体画像生成方法、プログラム、及び記録媒体に関する。 The present invention relates to a stereoscopic image generation apparatus, a stereoscopic image generation method, a program, and a recording medium that add binocular stereoscopic information to a 2D image to generate a 3D image.
In recent years, with the spread of 3DTV (3D Television) and the start of 3D digital broadcasting, an environment for viewing 3D video at home is becoming available. However, alongside this improvement in the 3D video playback environment, a shortage of 3D video content has been pointed out. As an approach to resolving this content shortage, 2D/3D conversion (2D to 3D conversion), in which binocular stereoscopic information is artificially added to a 2D image to generate a 3D image, has attracted attention.
 2D/3D変換を実現する手法として、例えば、特許文献1に示す手法が知られている。この特許文献1には、基本となる3種類の画像の奥行値を示す基本奥行モデルを備え、入力画像のパターンによって、3種類の基本奥行モデルの合成比を変えて、入力画像の奥行モデルを生成し、生成した奥行モデルと入力画像とから、左眼/右眼へ提示する画像を生成する立体画像生成装置が開示されている。 As a technique for realizing 2D / 3D conversion, for example, a technique disclosed in Patent Document 1 is known. This patent document 1 is provided with a basic depth model indicating depth values of three basic images, and the input image depth model is changed by changing the composition ratio of the three basic depth models according to the pattern of the input image. A stereoscopic image generation apparatus that generates an image to be presented to the left eye / right eye from the generated depth model and an input image is disclosed.
特許第4214976号明細書(特開2005-151534号公報)Japanese Patent No. 4214976 (Japanese Patent Laid-Open No. 2005-151534)
However, with the stereoscopic image generation device disclosed in Patent Document 1, it is difficult to express the depth model of an image that does not match the assumed basic depth models. For example, when the vanishing point lies outside the screen, the depth cannot be expressed even if the composition ratio of the three basic depth models is changed. For this reason, a stereoscopic image with a natural sense of depth could not be generated.
 本発明は、上述のような問題点を解決するためになされたものであって、幾何的な奥行手掛かりにより消失点位置を推定できる場合は、消失点に基づいて画像の奥行モデルを生成し、幾何的な奥行手掛かりにより消失点位置を推定できない場合は、人の視覚特性に基づいた画像内の誘目性を表す顕著度に基づいて画像の奥行モデルを生成することにより、より自然な奥行感のある立体画像を生成可能とする立体画像生成装置、立体画像生成方法、プログラム、及び記録媒体を提供することを目的とする。 The present invention was made to solve the above-described problems, and when the vanishing point position can be estimated by a geometric depth cue, a depth model of the image is generated based on the vanishing point, If the vanishing point position cannot be estimated due to a geometric depth cue, a more natural depth sensation can be obtained by generating a depth model of the image based on the degree of saliency representing the attractiveness in the image based on human visual characteristics. It is an object of the present invention to provide a stereoscopic image generating apparatus, a stereoscopic image generating method, a program, and a recording medium that can generate a certain stereoscopic image.
 上記課題を解決するために、本発明の第1の技術手段は、2D画像に両眼立体情報を付加し、3D画像を生成する立体画像生成装置であって、処理対象画像から消失点を推定する消失点推定手段と、該消失点推定手段により消失点が推定できたか否かに基づいて異なる奥行モデルを生成する奥行モデル生成手段と、該奥行モデル生成手段により生成した奥行モデルと前記処理対象画像と想定視聴条件情報とに基づいて、右眼提示画像と左眼提示画像を生成する視点画像生成手段とを備え、前記奥行モデル生成手段は、前記消失点推定手段により消失点が推定できた場合、前記消失点に基づいて奥行モデルを生成し、また、前記消失点推定手段により消失点が推定できなかった場合、前記処理対象画像内の各画素の顕著度に基づいて奥行モデルを生成することを特徴としたものである。 In order to solve the above-described problem, a first technical means of the present invention is a stereoscopic image generation apparatus that generates binocular stereoscopic information by adding binocular stereoscopic information to a 2D image, and estimates a vanishing point from the processing target image. Vanishing point estimating means, depth model generating means for generating a different depth model based on whether the vanishing point can be estimated by the vanishing point estimating means, the depth model generated by the depth model generating means, and the processing target Based on the image and the assumed viewing condition information, a viewpoint image generation unit that generates a right eye presentation image and a left eye presentation image is provided, and the depth model generation unit can estimate the vanishing point by the vanishing point estimation unit. A depth model is generated based on the vanishing point, and if the vanishing point cannot be estimated by the vanishing point estimating means, a depth model is generated based on the saliency of each pixel in the processing target image. It is obtained by and generating a.
The second technical means is the first technical means further comprising reduced image generation means for generating a reduced image of a predetermined image size from the processing target image, wherein the reduced image is input to the vanishing point estimation means and the depth model generation means, and enlarged depth model generation means is provided for generating, from the depth model of the reduced image generated by the depth model generation means, an enlarged depth model having the same image size as the processing target image.

The third technical means is the first technical means further comprising spatio-temporal direction smoothing means that smooths, in the spatial direction, the depth model of the processing target image generated by the depth model generation means, smooths the depth model of the processing target image in the time direction based on the spatially smoothed depth model of the processing target image and the spatio-temporally smoothed depth model of a comparison target image earlier than the processing target image, and thereby generates a spatio-temporally smoothed depth model of the processing target image.

The fourth technical means is any one of the first to third technical means, wherein the assumed viewing condition information includes the pixel pitch of the display on which the 3D image is displayed, the image size of the display, the distance from the viewer to the display, a parallax range representing the depth amount of the 3D image, and a baseline length that is the distance between the left and right virtual viewpoints.

The fifth technical means is any one of the first to fourth technical means, wherein the saliency of each pixel in the processing target image is calculated to be higher at a location where the color difference between the pixel of interest and its surrounding pixels is large, where the color difference between the pixel of interest and the whole image is large, or where the color difference between a local region including the pixel of interest and its surrounding region is large.

The sixth technical means is the fifth technical means, wherein, when the vanishing point cannot be estimated by the vanishing point estimation means, the depth model generation means generates the depth model so that locations of high saliency in the processing target image are placed on the near side.

The seventh technical means is any one of the first to sixth technical means, wherein the vanishing point estimation means comprises intra-frame vanishing point estimation means for estimating the vanishing point of the processing target image from straight line information in the processing target image, and inter-frame vanishing point estimation means for estimating the vanishing point of the processing target image based on the processing target image, a comparison target image earlier than the processing target image, and the position of the vanishing point in the comparison target image.

The eighth technical means is the seventh technical means further comprising scene change detection means for detecting whether a scene change has occurred between the processing target image and the comparison target image, wherein the intra-frame vanishing point estimation means is selected when a scene change is detected by the scene change detection means, and the inter-frame vanishing point estimation means is selected when no scene change is detected by the scene change detection means.

The ninth technical means is the eighth technical means further comprising storage means for storing vanishing point information including the position of the vanishing point of the comparison target image, wherein the inter-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is stored in the storage means, and the intra-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is not stored in the storage means.

The tenth technical means is any one of the seventh to ninth technical means, wherein the comparison target image is the image immediately preceding the processing target image.

The eleventh technical means is a stereoscopic image generation method performed by a stereoscopic image generation apparatus that adds binocular stereoscopic information to a 2D image to generate a 3D image, the method comprising: a vanishing point estimation step of estimating a vanishing point from a processing target image; a depth model generation step of generating a different depth model depending on whether the vanishing point could be estimated in the vanishing point estimation step; and a viewpoint image generation step of generating a right-eye presentation image and a left-eye presentation image based on the depth model generated in the depth model generation step, the processing target image, and assumed viewing condition information, wherein the depth model generation step generates the depth model based on the vanishing point when the vanishing point could be estimated in the vanishing point estimation step, and generates the depth model based on the saliency of each pixel in the processing target image when the vanishing point could not be estimated in the vanishing point estimation step.

The twelfth technical means is a program for causing a computer to execute the stereoscopic image generation method of the eleventh technical means.

The thirteenth technical means is a computer-readable recording medium on which the program of the twelfth technical means is recorded.
According to the present invention, when the vanishing point position can be estimated from a geometric depth cue, a depth model of the image is generated based on the vanishing point, so that a stereoscopic image in which the sense of depth given by the geometric depth cue is emphasized can be generated.

Further, according to the present invention, when the vanishing point position cannot be estimated from a geometric depth cue, a depth model of the image is generated from the saliency representing the attractiveness in the image based on human visual characteristics, so that a stereoscopic image in which the sense of depth of the portions that attract a person's attention is emphasized can be generated.
A block diagram showing a configuration example of a stereoscopic image generation device according to an embodiment of the present invention.
A flowchart for explaining an operation example, in units of frames, of the stereoscopic image generation device according to the embodiment of the present invention.
A schematic diagram of scene change detection based on a luminance histogram.
A block diagram showing a configuration example of the scene change detection unit according to the embodiment of the present invention.
A flowchart for explaining an operation example of the scene change detection unit according to the embodiment of the present invention.
A block diagram showing a configuration example of the vanishing point estimation unit according to the embodiment of the present invention.
A flowchart for explaining an operation example of the vanishing point estimation unit according to the embodiment of the present invention.
A flowchart for explaining an operation example of the intra-frame vanishing point estimation unit according to the embodiment of the present invention.
A diagram showing an example of images corresponding to the flow of FIG. 8.
A schematic diagram for explaining straight line detection by the Hough transform.
A flowchart for explaining an operation example of the inter-frame vanishing point estimation unit according to the embodiment of the present invention.
A diagram showing an example of images corresponding to the flow of FIG. 11.
A diagram showing an example of the ranges within the same scene to which the intra-frame vanishing point estimation means and the inter-frame vanishing point estimation means are applied.
A block diagram showing a configuration example of the depth model generation unit according to the embodiment of the present invention.
A flowchart for explaining an operation example of the depth model generation unit according to the embodiment of the present invention.
A diagram showing an example of the basic depth model used when the vanishing point is inside the screen, according to the embodiment of the present invention.
A diagram showing an example of the basic depth model used when the vanishing point is outside the screen, according to the embodiment of the present invention.
A diagram showing an example of the process of obtaining a depth model based on the vanishing point, according to the embodiment of the present invention.
A diagram showing an example of the process of obtaining a depth model based on the saliency, according to the embodiment of the present invention.
An overhead view of the camera (viewpoint) arrangement for generating viewpoint images.
A block diagram showing a configuration example of the viewpoint image generation unit according to the embodiment of the present invention.
A flowchart for explaining an operation example of the viewpoint image generation unit according to the embodiment of the present invention.
A diagram showing an example of the process of generating a viewpoint image, according to the embodiment of the present invention.
A diagram showing disparity vectors in the crossed and uncrossed (divergent) directions.
A block diagram showing a modification of the depth model generation unit according to the embodiment of the present invention.
A flowchart for explaining an operation example of the saliency-based depth model creation means in the modification of the depth model generation unit according to the embodiment of the present invention.
A diagram showing an example of the process of obtaining a depth model (modification) based on the saliency, according to the embodiment of the present invention.
A diagram showing, for the saliency-based depth model (modification) according to the embodiment of the present invention, an example of a modified reference depth model and an example of the corresponding depth model.
A block diagram showing a configuration example of the stereoscopic image generation device (first modification) according to the embodiment of the present invention.
A flowchart for explaining an operation example, in units of frames, of the stereoscopic image generation device (first modification) according to the embodiment of the present invention.
A block diagram showing a configuration example of the stereoscopic image generation device (second modification) according to the embodiment of the present invention.
A flowchart for explaining an operation example, in units of frames, of the stereoscopic image generation device (second modification) according to the embodiment of the present invention.
A block diagram showing a configuration example of the spatio-temporal direction smoothing unit according to the embodiment of the present invention.
A flowchart for explaining an operation example of the spatio-temporal direction smoothing unit according to the embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the drawings, portions having the same function are denoted by the same reference numerals, and repeated description of them is omitted.

FIG. 1 is a block diagram showing a schematic configuration example of a stereoscopic image generation apparatus according to the present invention. In the figure, reference numeral 1 denotes the stereoscopic image generation apparatus. The stereoscopic image generation apparatus 1 includes a scene change detection unit 10, a vanishing point estimation unit 20, a depth model generation unit 30, and a viewpoint image generation unit 40. FIG. 2 is a flowchart for explaining an operation example, in units of frames, of the stereoscopic image generation apparatus 1 according to the present invention.

In FIG. 2, first, the stereoscopic image generation apparatus 1 in FIG. 1 outputs the input image at time t (hereinafter also referred to as the processing target image F(t)) to the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 (step S11 in FIG. 2).
(About the scene change detection unit 10)
The scene change detection unit 10 in FIG. 1 corresponds to the scene change detection means of the present invention. It calculates a predetermined image feature amount from the input processing target image F(t) and from the image input immediately before it (hereinafter also referred to as the comparison target image F(t-1)), compares the similarity of the calculated image feature amounts to detect segmentation points (scene changes) in the time-series sequence of images, and outputs scene change information S(t), which indicates the presence or absence of a scene change in the processing target image F(t), to the vanishing point estimation unit 20 and the depth model generation unit 30 (step S12 in FIG. 2). Here, as an example of the predetermined image feature amount, scene change detection based on a luminance histogram representing the appearance frequency of the luminance values of an image is described with reference to FIGS. 3 to 5.

As shown in FIG. 3, scene change detection based on the luminance histogram calculates the luminance histograms H_L(t) and H_L(t-1) from the processing target image F(t) and the comparison target image F(t-1), respectively, and compares the similarity d(H_L(t), H_L(t-1)) of the calculated luminance histograms with a predetermined threshold value to determine whether there is a scene change in the processing target image F(t).

As shown in FIG. 4, the scene change detection unit 10 is composed of a luminance histogram generation unit 101, a buffer 102, a histogram similarity calculation unit 103, and a scene change determination unit 104. FIG. 5 is a flowchart for explaining an operation example of the scene change detection unit 10. In FIG. 5, the luminance histogram generation unit 101 in FIG. 4 acquires luminance information from the input processing target image F(t), calculates from it the luminance histogram H_L(t) representing the appearance frequency of each luminance value, and outputs the result (the luminance histogram H_L(t)) to the buffer 102 and the histogram similarity calculation unit 103 (step S21 in FIG. 5).

The buffer 102 in FIG. 4 stores the luminance histogram H_L(t) of the processing target image F(t) for use in scene change detection for the next image F(t+1) (step S22 in FIG. 5). The histogram similarity calculation unit 103 in FIG. 4 calculates the similarity d(H_L(t), H_L(t-1)) from the luminance histogram H_L(t) of the input processing target image F(t) and the luminance histogram H_L(t-1) of the comparison target image F(t-1) read from the buffer 102 according to equation (1), and outputs the result to the scene change determination unit 104 (step S23 in FIG. 5).
d(H_L(t), H_L(t-1)) = \frac{1}{W \times H} \sum_{v=0}^{V-1} \left| H_L(v \mid t) - H_L(v \mid t-1) \right| \qquad (1)
Here, in equation (1), W represents the number of pixels per line of the image, H represents the number of lines of the image, v represents a luminance value, V represents the number of gradations of the luminance values, and H_L(v|t) represents the appearance frequency of the luminance value v in the image F(t) at time t. The similarity d(H_L(t), H_L(t-1)) in equation (1) takes values in the range 0 to 2; the closer the value is to 0, the more similar the shapes of the histograms are, and the closer the value is to 2, the more different they are.

The scene change determination unit 104 in FIG. 4 compares the input histogram similarity d(H_L(t), H_L(t-1)) with a predetermined threshold d_th, sets the scene change information S(t), which indicates the presence or absence of a scene change in the processing target image F(t), according to equation (2), and outputs it to the outside (step S24 in FIG. 5).
S(t) = \begin{cases} 0 & \text{if } d(H_L(t), H_L(t-1)) < d_{th} \\ 1 & \text{otherwise} \end{cases} \qquad (2)
That is, the scene change determination unit 104 determines that there is no scene change and sets the scene change information S(t) to "0" when the similarity d(H_L(t), H_L(t-1)) is smaller than the threshold d_th; otherwise, it determines that there is a scene change and sets the scene change information S(t) to "1".

As described above, the scene change detection unit 10 can detect segmentation points (scene changes) in a time-series sequence of images by calculating a predetermined image feature amount from the processing target image F(t) and the comparison target image F(t-1) and comparing the similarity of the calculated image feature amounts.
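A compact Python sketch of this luminance-histogram comparison, using the reconstruction of equations (1) and (2) above; the threshold value is a placeholder and the frames are assumed to be 8-bit grayscale arrays.

```python
import numpy as np

def luminance_histogram(gray, levels=256):
    # H_L(t): appearance frequency of each luminance value (unit 101).
    hist, _ = np.histogram(gray, bins=levels, range=(0, levels))
    return hist.astype(float)

def histogram_similarity(h_cur, h_prev, width, height):
    # Equation (1): normalised sum of absolute bin differences, in [0, 2].
    return np.abs(h_cur - h_prev).sum() / (width * height)

def detect_scene_change(gray_cur, gray_prev, d_th=0.5):
    # Equation (2): S(t) = 1 when the similarity value reaches the threshold.
    height, width = gray_cur.shape
    d = histogram_similarity(luminance_histogram(gray_cur),
                             luminance_histogram(gray_prev), width, height)
    return 1 if d >= d_th else 0
```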
(About the vanishing point estimation unit 20)
Returning to FIG. 1, the vanishing point estimation unit 20 corresponds to the vanishing point estimation means of the present invention. Based on the input scene change information S(t) of the processing target image F(t) and the previous vanishing point information VP(t-1) stored inside the vanishing point estimation unit 20, it selects a vanishing point estimation means (intra-frame vanishing point estimation means, which estimates the vanishing point position from straight lines in the image, or inter-frame vanishing point estimation means, which estimates the vanishing point position in the current frame from the correspondence of feature points between images and the vanishing point position in the previous frame), estimates the position of the vanishing point in the input processing target image F(t) with the selected means, and outputs vanishing point information VP(t) describing the result to the depth model generation unit 30 (step S13 in FIG. 2). Here, a "vanishing point" is the point at which two lines that are parallel in three-dimensional space converge when they are projected onto a plane.
Next, the vanishing point estimation unit 20 in this embodiment is described in detail. As shown in FIG. 6, the vanishing point estimation unit 20 is composed of a switching unit 201, a switching unit 202, an intra-frame vanishing point estimation unit 21, an inter-frame vanishing point estimation unit 22, a buffer 203, and a buffer 204. The intra-frame vanishing point estimation unit 21 in FIG. 6 is composed of an edge detection unit 211, a straight line detection unit 212, and a vanishing point identification unit 213; it corresponds to the intra-frame vanishing point estimation means of the present invention and estimates the vanishing point of the processing target image F(t) from straight line information in the processing target image F(t). The inter-frame vanishing point estimation unit 22 in FIG. 6 is composed of a feature point detection unit 221, a corresponding point calculation unit 222, a transformation matrix calculation unit 223, and a vanishing point position calculation unit 224; it corresponds to the inter-frame vanishing point estimation means of the present invention and estimates the vanishing point of the processing target image F(t) based on the processing target image F(t), the comparison target image F(t-1) earlier than the processing target image F(t), and the position of the vanishing point in the comparison target image F(t-1). FIG. 7 is a flowchart for explaining an operation example of the vanishing point estimation unit 20.

In FIG. 7, the vanishing point estimation unit 20 in FIG. 6 selects a vanishing point estimation means based on the input scene change information S(t) and the previous vanishing point information VP(t-1) read from the buffer 204 (step S31 in FIG. 7). Specifically, when there is a scene change ("S(t) = 1"), or when the vanishing point information VP(t-1) indicates that there is no vanishing point in the previous frame, that is, when VP(t-1) has "vp_num = 0" (Yes in step S31 of FIG. 7), the switching unit 201 in FIG. 6 switches the input destination of the image, and the switching unit 202 in FIG. 6 switches the output source of the vanishing point information, to the intra-frame vanishing point estimation unit 21 (step S32 in FIG. 7). The intra-frame vanishing point estimation unit 21 then estimates the vanishing point position from the straight lines in the image and outputs the result (vanishing point information VP(t)) (step S33 in FIG. 7).
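The switching of step S31 amounts to the following control flow. In this sketch `intra_frame_estimator` and `inter_frame_estimator` are hypothetical callables standing in for units 21 and 22, and the vanishing point information is represented as a dictionary with a `vp_num` entry, as in Table 1 later in the text.

```python
def select_and_estimate(frame, prev_frame, scene_change, prev_vp,
                        intra_frame_estimator, inter_frame_estimator):
    # Step S31: choose the estimation means from S(t) and VP(t-1).
    if scene_change == 1 or prev_vp is None or prev_vp["vp_num"] == 0:
        # No usable history: estimate from straight lines inside the frame (unit 21).
        return intra_frame_estimator(frame)
    # Same scene and a previous vanishing point exists: track it between frames (unit 22).
    return inter_frame_estimator(frame, prev_frame, prev_vp)
```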
(About the intra-frame vanishing point estimation unit 21)
Here, the intra-frame vanishing point estimation unit 21 is described in detail. FIG. 8 is a flowchart for explaining an operation example of the intra-frame vanishing point estimation unit 21, and FIG. 9 shows an example of images corresponding to the flow of FIG. 8. In step S331 of FIG. 8, the edge detection unit 211 in FIG. 6 calculates, from the input processing target image F(t) (see FIG. 9(A)), the edge point information Edge(t) used for straight line detection. Specifically, it first applies a differential operator to each color component (for example, RGB (Red, Green, Blue)) and calculates the gradient vector G_i(x, y | t) = (ΔG_ix(t), ΔG_iy(t)) (i = 1, 2, 3) of each color component i in the x and y directions, where i = 1, 2, 3 correspond to the R, G, and B components, respectively.

Subsequently, the edge detection unit 211 calculates the edge strength E(x, y | t) by performing the calculation of equation (3) for each pixel at coordinates (x, y).
[Equation (3): the edge strength E(x, y | t) computed from the color-component gradient vectors G_i(x, y | t)]
Subsequently, the edge detection unit 211 performs the calculation of equation (4) for each pixel at coordinates (x, y) to extract, as edge points, the coordinates at which the edge strength E(x, y | t) is a local maximum, and outputs the edge point information Edge(t) describing the result to the straight line detection unit 212.
\mathrm{Edge}(x, y \mid t) = \begin{cases} 1 & \text{if } E(x, y \mid t) \text{ is the local maximum within the } W1 \times W2 \text{ window centered at } (x, y) \\ 0 & \text{otherwise} \end{cases} \qquad (4)
That is, the edge detection unit 211 sets the edge point Edge(x, y | t) to "1" when the edge strength E(x, y | t) is a local maximum within the window of size W1 × W2 centered at the coordinates (x, y), and sets Edge(x, y | t) to "0" otherwise. Here, W1 represents the window size in the x direction and W2 represents the window size in the y direction.
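The sketch below follows this edge-point extraction. Since the exact combination in equation (3) is not reproduced here, the root of the squared gradients summed over the three color components is used as one plausible edge strength, and the window sizes are placeholders.

```python
import numpy as np

def color_gradients(img):
    # Central-difference gradients of each colour component in x and y.
    f = img.astype(float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:-1] = (f[:, 2:] - f[:, :-2]) / 2.0
    gy[1:-1, :] = (f[2:, :] - f[:-2, :]) / 2.0
    return gx, gy

def edge_points(img, w1=5, w2=5):
    # Edge(x, y | t): 1 where the edge strength is a local maximum in a W1 x W2 window.
    gx, gy = color_gradients(img)
    strength = np.sqrt((gx ** 2 + gy ** 2).sum(axis=2))   # one plausible E(x, y | t)
    pad_y, pad_x = w2 // 2, w1 // 2
    padded = np.pad(strength, ((pad_y, pad_y), (pad_x, pad_x)), mode="constant")
    edge = np.ones(strength.shape, dtype=np.uint8)
    for dy in range(w2):
        for dx in range(w1):
            shifted = padded[dy:dy + strength.shape[0], dx:dx + strength.shape[1]]
            edge &= (strength >= shifted).astype(np.uint8)
    edge[strength <= 0] = 0          # flat regions are not edge points
    return edge
```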
Proceeding to step S332 in FIG. 8, the straight line detection unit 212 in FIG. 6 applies the Hough transform to the input edge point information Edge(t) to obtain straight line information L(t) (see FIG. 9(B)). Straight line detection by the Hough transform is explained here with reference to FIG. 10. In FIG. 10(A), the feature points A, B, and C on a certain straight line L are edge points for which "Edge(x, y | t) = 1" was obtained by the edge detection unit 211. In the Hough transform, the straight line L in the image space shown in FIG. 10(A) is first expressed using the polar-coordinate representation (ρ, θ), where ρ is the length of the perpendicular drawn from the origin of the image space to the straight line L, and θ is the angle that this perpendicular makes with the x axis of the image space. The range of ρ is ρ ≥ 0, and the range of θ is 0 ≤ θ < 2π.
\rho_0 = x \cos\theta_0 + y \sin\theta_0 \qquad (5)
When the groups of straight lines passing through the feature points A, B, and C are mapped into the parameter space, they are expressed as the curves a, b, and c in the parameter space of FIG. 10(B). That is, in the parameter space, the intersection (ρ0, θ0) of the curves a, b, and c is detected as the straight line L passing through the feature points A, B, and C.

As described above, the Hough transform is applied to the edge points for which "Edge(x, y | t) = 1". NL polar-coordinate pairs (ρs, θs) (s = 1, ..., NL), at which a number of curves equal to or greater than a predetermined threshold intersect in the parameter space, are extracted as straight lines Ls in descending order of the number of intersecting curves, and the data describing these polar coordinates (ρs, θs) is taken as the straight line information L(t). The range of NL is 0 ≤ NL ≤ NL_max, where NL_max is a predetermined constant representing the upper limit of the number of straight lines to be extracted. When the above extraction condition is not satisfied and no straight line is detected, NL = 0.
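A compact accumulator-based Hough transform corresponding to this step is sketched below. For simplicity it uses the common parameterisation θ ∈ [0, π) with a signed ρ rather than the ρ ≥ 0, 0 ≤ θ < 2π convention above, and the bin sizes, the vote threshold, and NL_max are placeholders.

```python
import numpy as np

def hough_lines(edge, n_theta=180, rho_step=1.0, vote_th=100, max_lines=8):
    # Vote in (rho, theta) space and return up to NL strongest lines as (rho, theta).
    h, w = edge.shape
    max_rho = np.hypot(h, w)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    n_rho = int(np.ceil(2.0 * max_rho / rho_step)) + 1
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)

    ys, xs = np.nonzero(edge)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in zip(xs, ys):
        rhos = x * cos_t + y * sin_t                     # rho = x cos(theta) + y sin(theta)
        idx = np.round((rhos + max_rho) / rho_step).astype(int)
        acc[idx, np.arange(n_theta)] += 1

    candidates = []
    for r_idx, t_idx in zip(*np.nonzero(acc >= vote_th)):
        candidates.append((int(acc[r_idx, t_idx]),
                           r_idx * rho_step - max_rho,   # rho_s
                           float(thetas[t_idx])))        # theta_s
    candidates.sort(reverse=True)                        # most intersecting curves first
    return [(rho, theta) for _, rho, theta in candidates[:max_lines]]
```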
Returning to FIG. 8, the vanishing point identification unit 213 in FIG. 6 obtains the number of straight lines NL from the input straight line information L(t), compares NL with a predetermined threshold, and decides whether to estimate the vanishing point position by the processing of steps S334 and S335 (step S333). When the number of straight lines NL is smaller than (or not greater than) the threshold ThL (≥ 2) (No in step S333), it determines that there are not enough geometric depth cues to estimate the vanishing point, and the processing proceeds to step S336. When the number of straight lines NL is equal to or greater than (or greater than) the threshold ThL (≥ 2) (Yes in step S333), the vanishing point identification unit 213 determines that there are sufficient geometric depth cues to estimate the vanishing point; it selects, from the input straight line information L(t), straight lines Li (i = 1, ..., NL) and Lj (j = 1, ..., NL) whose angles θ satisfy the conditions of equations (6) and (7), calculates their intersections P_ij (i ≠ j) by the matrix operation of equation (8), and thereby obtains the intersection information (step S334). When obtaining the intersections, a pair of straight lines Li and Lj that has already been selected is not computed again. The condition of equation (6) means that the two selected straight lines are not parallel, and the condition of equation (7) means that neither line is close to the horizontal direction (|θ - π| ≈ π/2) or to the vertical direction (|θ - π| ≈ 0 or |θ - π| ≈ π).
[Equation (6): condition that the two selected straight lines are not parallel]
[Equation (7): condition excluding straight lines close to the horizontal or vertical direction]
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\theta_i & \sin\theta_i \\ \cos\theta_j & \sin\theta_j \end{pmatrix}^{-1} \begin{pmatrix} \rho_i \\ \rho_j \end{pmatrix} \qquad (8)
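The intersection of each selected pair of lines follows from writing equation (5) for both lines and solving the resulting 2 × 2 system, as reconstructed in equation (8) above. In the sketch below the angular tolerances that stand in for conditions (6) and (7) are placeholders, since their actual thresholds are not given.

```python
import numpy as np

def line_intersections(lines, parallel_eps=np.deg2rad(5.0), axis_eps=np.deg2rad(10.0)):
    # lines: list of (rho, theta) pairs with theta in [0, pi).
    def near_axis(theta):
        t = theta % np.pi
        # normal near 0/pi -> nearly vertical line; normal near pi/2 -> nearly horizontal line
        return min(t, np.pi - t) < axis_eps or abs(t - np.pi / 2.0) < axis_eps

    points = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):               # each pair is used only once
            (rho_i, th_i), (rho_j, th_j) = lines[i], lines[j]
            if near_axis(th_i) or near_axis(th_j):       # condition (7)
                continue
            diff = abs(th_i - th_j) % np.pi
            if min(diff, np.pi - diff) < parallel_eps:   # condition (6): skip near-parallel pairs
                continue
            a = np.array([[np.cos(th_i), np.sin(th_i)],
                          [np.cos(th_j), np.sin(th_j)]])
            rho = np.array([rho_i, rho_j])
            points.append(np.linalg.solve(a, rho))       # eq. (8): intersection P_ij = (x, y)
    return np.array(points)
```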
Subsequently, the vanishing point identification unit 213 assumes that the distribution of the obtained straight-line intersections P_ij can be represented by the Gaussian mixture model (GMM) of Kc Gaussian distributions shown in equation (9), obtains the parameters (w_i, μ_i, Σ_i) (i = 1, ..., Kc) of the distribution model by the EM (Expectation-Maximization) algorithm, and determines the position of the vanishing point (step S335).
P(\mathbf{x}) = \sum_{i=1}^{K_c} w_i \, N(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i), \qquad N(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i) = \frac{1}{(2\pi)^{D/2} \lvert \Sigma_i \rvert^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\mathsf T} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right) \qquad (9)
In equation (9), P(x) represents the probability that the vector x (the coordinates of an intersection P_ij) appears. Kc represents the number of classes (the number of Gaussian distributions), w_i represents the weighting coefficient of the Gaussian distribution of class i, and the sum of the weighting coefficients is 1. μ_i represents the mean vector of class i (the barycentric coordinates of class i), Σ_i represents the covariance matrix of class i, and D represents the number of dimensions of the vector x. N(x | μ_i, Σ_i) in equation (9) is the Gaussian (normal) distribution of class i, expressed using the mean vector μ_i and the covariance matrix Σ_i. The vanishing point identification unit 213 takes, as the vanishing point positions, the mean vectors μ_i (barycentric coordinates) of the top N (≥ 1) classes with the largest weighting coefficients w_i. In the following, for simplicity, the number of vanishing points is assumed to be N = 1, but the invention is not limited to this.
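The following sketch uses scikit-learn's EM-based GaussianMixture (not part of this disclosure) to fit the mixture of equation (9) to the intersection points and to return the means of the components with the largest mixing weights; Kc and N are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def vanishing_points_from_intersections(points, kc=3, top_n=1):
    # points: array of intersection coordinates P_ij with shape (M, 2).
    if len(points) < kc:
        return []                                    # too few intersections to fit Kc classes
    gmm = GaussianMixture(n_components=kc, covariance_type="full",
                          random_state=0).fit(points)
    order = np.argsort(gmm.weights_)[::-1]           # classes sorted by weight w_i
    return [gmm.means_[k] for k in order[:top_n]]    # mean vectors mu_i of the dominant classes
```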
Subsequently, the vanishing point identification unit 213 sets the vanishing point information VP(t) based on the result of step S333 or step S335, as shown in FIG. 9(C) (step S336). The vanishing point information VP(t) is expressed, for example, as the data shown in Table 1.
Table 1: vanishing point information VP(t)
  vp_time   - time t (or the frame number assigned to the image)
  vp_num    - number of detected vanishing points
  vp_pos[n] - positions of the n detected vanishing points
In Table 1, the vanishing point information VP(t) consists of "vp_time", which represents the time t (or the number (frame number) assigned to the frame of the image), "vp_num", which represents the number of detected vanishing points, and a list "vp_pos[n]" representing the positions of the n detected vanishing points.

Returning to step S31 in FIG. 7, when there is no scene change ("S(t) = 0") and the vanishing point information VP(t-1) indicates that the previous frame has a vanishing point ("vp_num > 0" in VP(t-1)) (No in step S31), the switching unit 201 in FIG. 6 switches the input destination of the image, and the switching unit 202 in FIG. 6 switches the output source of the vanishing point information, to the inter-frame vanishing point estimation unit 22 (step S34). The inter-frame vanishing point estimation unit 22 then obtains the correspondence of feature points between the input processing target image F(t) and the previous image F(t-1) stored in the buffer 203, estimates the vanishing point position in the processing target image F(t) from this correspondence and the previous vanishing point information VP(t-1), and outputs the result (vanishing point information VP(t)) (step S35). That is, the vanishing point estimation unit 20 includes storage means (the buffer 204 in FIG. 6) that stores the vanishing point information VP(t-1) including the position of the vanishing point of the comparison target image F(t-1); whether the previous frame has a vanishing point is determined by whether the vanishing point information VP(t-1) of the previous frame is stored in this storage means and whether this vanishing point information VP(t-1) satisfies "vp_num > 0".
(About the inter-frame vanishing point estimation unit 22)
Here, the inter-frame vanishing point estimation unit 22 is described in detail. FIG. 11 is a flowchart for explaining an operation example of the inter-frame vanishing point estimation unit 22, and FIG. 12 shows an example of images corresponding to the flow of FIG. 11. In step S351 of FIG. 11, the feature point detection unit 221 in FIG. 6 detects, as shown in FIG. 12(A), NK feature points Ks (s = 1, ..., NK) used to obtain the correspondence between the input processing target image F(t) and the previous image F(t-1), and outputs the feature point information K(t) describing the coordinates (x_Ks,t, y_Ks,t) of the feature points Ks to the corresponding point calculation unit 222 in FIG. 6 (step S351).

A feature point is a point that is extracted as a part of an edge or a vertex of a subject based on changes in color or luminance between pixels. For example, the first eigenvalue λ1 and the second eigenvalue λ2 of the second moment matrix A (equation (10)), expressed using the luminance gradient vectors G_i(x, y) (i = x, y) in the x and y directions within a local region S centered on the pixel (x, y), are obtained, and a pixel (x, y) that satisfies the condition shown in equation (11) is detected as a feature point.
A = \sum_{(u, v) \in S} w(u, v) \begin{pmatrix} G_x(x+u, y+v)^2 & G_x(x+u, y+v)\, G_y(x+u, y+v) \\ G_x(x+u, y+v)\, G_y(x+u, y+v) & G_y(x+u, y+v)^2 \end{pmatrix} \qquad (10)
Figure JPOXMLDOC01-appb-M000012 [Expression (11); formula image not reproduced in this extraction]
That is, a pixel is taken as a feature point when the smaller of the first eigenvalue λ1 and the second eigenvalue λ2 of the second moment matrix A is greater than (or not less than) a predetermined threshold λth (see, for example, Non-Patent Document 1). In Expression (10), the coefficient w(u, v) represents a weighting coefficient for the pixel (x+u, y+v) separated from the pixel (x, y) by u in the x direction and by v in the y direction; for example, values of a two-dimensional Gaussian distribution within the local region S, normalized so as to satisfy the condition of Expression (12), are used.
Figure JPOXMLDOC01-appb-M000013 [Expression (12); formula image not reproduced in this extraction]
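As a concrete illustration of the min-eigenvalue criterion described above, the following sketch is offered; it is not part of the patent, and the function name and parameter values are assumptions. OpenCV's goodFeaturesToTrack applies the same smaller-eigenvalue test over a local window, so it stands in here for Expressions (10)-(11).

    # Minimal sketch (not from the patent): Shi-Tomasi style feature detection,
    # i.e. keep pixels whose smaller eigenvalue of the 2x2 second moment matrix
    # exceeds a relative threshold, as described around Expressions (10)-(11).
    import cv2
    import numpy as np

    def detect_feature_points(gray, max_points=200, quality=0.01, min_dist=8):
        # useHarrisDetector=False selects the minimum-eigenvalue criterion;
        # 'quality' plays the role of the threshold lambda_th (relative form).
        pts = cv2.goodFeaturesToTrack(gray, maxCorners=max_points,
                                      qualityLevel=quality, minDistance=min_dist,
                                      useHarrisDetector=False)
        return np.empty((0, 2), np.float32) if pts is None else pts.reshape(-1, 2)

    # Usage: K_t = detect_feature_points(cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY))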
Proceeding to step S352 in FIG. 11, as shown in FIG. 12(B), the corresponding point calculation unit 222 in FIG. 6 calculates by optical flow the position (xKs,t-1, yKs,t-1) on the immediately preceding image F(t-1) of each feature point Ks (s = 1, ..., NK) of the processing target image F(t), based on the input processing target image F(t), the immediately preceding image F(t-1) read from the buffer 203, and the feature point information K(t) of the processing target image F(t) acquired in step S351, and outputs corresponding point information Q(t, t-1) describing the positions of the feature points Ks at time t and time t-1 to the transformation matrix calculation unit 223 in FIG. 6 (step S352).
The position (xKs,t-1, yKs,t-1) of each feature point Ks of the processing target image F(t) on the immediately preceding image F(t-1) can be obtained, for example, by solving the optical flow constraint condition of the gradient method shown in Expression (13) for (xKs,t-1, yKs,t-1) (see, for example, Non-Patent Document 2).
Figure JPOXMLDOC01-appb-M000014 [Expression (13); formula image not reproduced in this extraction]
Here, in Expression (13), Gi(x, y | t) (i = x, y, t) represents the gradient vectors in the x direction, the y direction, and the t direction (time direction) with respect to the luminance of the image F(t), and S represents a local region of a predetermined size centered on the feature point Ks.
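The gradient-method constraint of Expression (13) is typically solved over a local window in the Lucas-Kanade manner. The sketch below is an illustration with assumed names and window sizes, not the patent's implementation; it tracks the feature points of F(t) back into F(t-1) with OpenCV's pyramidal Lucas-Kanade routine to build the corresponding point information Q(t, t-1).

    # Minimal sketch (assumed names): find, for each feature point of F(t), its
    # position on the previous frame F(t-1) by pyramidal Lucas-Kanade optical flow.
    import cv2
    import numpy as np

    def corresponding_points(gray_t, gray_prev, pts_t):
        pts = pts_t.reshape(-1, 1, 2).astype(np.float32)
        pts_prev, status, _err = cv2.calcOpticalFlowPyrLK(
            gray_t, gray_prev, pts, None, winSize=(21, 21), maxLevel=3)
        ok = status.reshape(-1) == 1
        # Q(t, t-1): the positions of each successfully tracked feature point
        # at time t and at time t-1
        return pts_t[ok], pts_prev.reshape(-1, 2)[ok]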
Proceeding to step S353 in FIG. 11, the transformation matrix calculation unit 223 in FIG. 6 calculates, from the corresponding point information Q(t, t-1) acquired in step S352, a transformation matrix H that projects the feature points Ks (s = 1, ..., NK) from their positions on the immediately preceding image F(t-1) to their positions on the processing target image F(t), and outputs information describing the transformation matrix H to the vanishing point position calculation unit 224 in FIG. 6 (step S353). The correspondence between the two images (F(t), F(t-1)) can be expressed by Expression (14) using the transformation matrix H (see, for example, Non-Patent Document 3). In Expression (14), the symbol "~" represents an equivalence relation, meaning equality up to a constant factor.
Figure JPOXMLDOC01-appb-M000015 [Expression (14); formula image not reproduced in this extraction]
The transformation matrix H is called a projective transformation because it can express a general transformation. If it is assumed that the correspondence between the images can be expressed as a translation, Expression (14) is expressed as Expression (15).
Figure JPOXMLDOC01-appb-M000016 [Expression (15); formula image not reproduced in this extraction]
The coefficients tx and ty in Expression (15) represent the amounts of movement in the x direction and the y direction, respectively. If it is assumed that the correspondence between the images can be expressed as an affine transformation including translation, rotation, and scaling, Expression (14) is expressed as Expression (16).
Figure JPOXMLDOC01-appb-M000017 [Expression (16); formula image not reproduced in this extraction]
The coefficients a, b, c, and d in Expression (16) represent scaling and rotation, and the coefficients tx and ty are the same as in Expression (15). The coefficients hij (i, j = 1, 2, 3) of the transformation matrix H in Expressions (14), (15), and (16) are calculated by solving, by the least squares method, the simultaneous equations derived from the constraint conditions of each transformation model and the corresponding point information Q(t, t-1). If the number of corresponding points is insufficient and the transformation matrix H cannot be calculated, a predetermined transformation matrix H0 is used.
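As an illustration of fitting the three transformation models above from Q(t, t-1), the sketch below returns a 3x3 matrix for the translation, affine, or projective case. It is an assumption of this edit: OpenCV's robust fitting is used in place of the plain least-squares solution described in the text, and the fallback matrix H0 is taken to be the identity.

    # Minimal sketch (not the patent's code): estimate the transformation matrix H
    # from corresponding points, with a predetermined fallback H0 when there are
    # too few correspondences.
    import cv2
    import numpy as np

    H0 = np.eye(3)  # assumed fallback transformation

    def estimate_transform(pts_prev, pts_t, model="affine"):
        if model == "translation" and len(pts_t) >= 1:
            tx, ty = np.mean(pts_t - pts_prev, axis=0)      # least-squares translation (Eq. (15) form)
            return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
        if model == "affine" and len(pts_t) >= 3:
            A, _inliers = cv2.estimateAffine2D(pts_prev, pts_t)   # 2x3 affine (Eq. (16) form)
            return np.vstack([A, [0.0, 0.0, 1.0]]) if A is not None else H0
        if model == "projective" and len(pts_t) >= 4:
            H, _mask = cv2.findHomography(pts_prev, pts_t, cv2.RANSAC)  # Eq. (14) form
            return H if H is not None else H0
        return H0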
Proceeding to step S354 in FIG. 11, the vanishing point position calculation unit 224 in FIG. 6 calculates the vanishing point position at time t on the assumption that "the vanishing point position at time t is the position obtained by projecting the vanishing point position on the immediately preceding image F(t-1) onto the processing target image F(t) using the transformation matrix H calculated by the transformation matrix calculation unit 223 in FIG. 6" (step S354), and sets the vanishing point information VP(t) based on the result (step S355). FIG. 12(C) shows an example image in which the vanishing point on the image F(t-1) is projected onto the image F(t) by the transformation matrix H.
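A minimal sketch of this projection step (an assumption, not the patent's code): the previous vanishing point is written in homogeneous coordinates, multiplied by H, and normalized by the third component, following the relation of Expression (14).

    # Minimal sketch (assumed names): project the previous frame's vanishing point
    # onto F(t) with the estimated 3x3 matrix H.
    import numpy as np

    def project_vanishing_point(vp_prev, H):
        x, y = vp_prev
        p = H @ np.array([x, y, 1.0])
        return (p[0] / p[2], p[1] / p[2])   # VP(t): normalize the homogeneous coordinate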
Returning again to step S36 in FIG. 7, the buffer 203 in FIG. 6 deletes the immediately preceding image F(t-1) and stores the input processing target image F(t). The buffer 204 in FIG. 6 deletes the immediately preceding vanishing point information VP(t-1) and stores the vanishing point information VP(t) input from the intra-frame vanishing point estimation unit 21 or the inter-frame vanishing point estimation unit 22, and the vanishing point estimation process for the processing target image F(t) ends (step S36).
As described above, according to the vanishing point estimation unit 20 of the present embodiment, as shown in FIG. 13, within the same scene (a group of time-series images correlated in the spatial and temporal directions), the vanishing point is estimated by the intra-frame vanishing point estimation means (intra-frame vanishing point estimation unit 21) from the first frame F(t0) up to the frame F(t0+k-1) in which a vanishing point is first detected within the scene, and from the frame F(t0+k) following the frame F(t0+k-1) in which the vanishing point is first detected up to the last frame F(t0+N) of the scene, the vanishing point is estimated by the inter-frame vanishing point estimation means (inter-frame vanishing point estimation unit 22). Compared with estimating the vanishing point frame by frame using only the intra-frame vanishing point estimation means, this is robust to camera work and enables stable vanishing point estimation in which fluctuation of the vanishing point is suppressed.
(About the depth model generation unit 30)
Returning again to FIG. 2, the depth model generation unit 30 in FIG. 1 corresponds to the depth model generation means of the present invention and generates different depth models depending on whether or not a vanishing point could be estimated by the vanishing point estimation unit 20. That is, based on the vanishing point information VP(t), the depth model generation unit 30 selects a depth model creation means (a first depth model creation means that creates a depth model from the vanishing point position, or a second depth model creation means that creates a depth model from the saliency representing the attractiveness in the image based on human visual characteristics), sets the depth value of each pixel in the processing target image F(t) by the selected depth model creation means, and outputs a depth model D(t) representing the depth value of each pixel to the viewpoint image generation unit 40 (step S14 in FIG. 2).
Next, the depth model generation unit 30 in the present embodiment will be described in detail. As shown in FIG. 14, the depth model generation unit 30 includes a switching unit 301, a switching unit 302, a region division unit 303, a distance calculation unit 304, a saliency calculation unit 305, and a depth value setting unit 306. FIG. 15 is a flowchart for explaining an operation example of the depth model generation unit 30.
In FIG. 15, the depth model generation unit 30 in FIG. 14 selects a depth model creation means based on the input vanishing point information VP(t) (step S41). That is, when the current frame has a vanishing point ("vp_num > 0" in the vanishing point information VP(t)) (Yes in step S41), the switching unit 301 in FIG. 14 switches the input destination of the image to the region division unit 303, the switching unit 302 in FIG. 14 switches the output source of the data input to the depth value setting unit 306 to the distance calculation unit 304, and the first depth model creation means based on the vanishing point is selected (step S42).
Proceeding to step S43, the distance calculation unit 304 in FIG. 14 calculates the distance Dist(x, y) between the coordinates of the vanishing point in the vanishing point information VP(t) and the coordinates of each pixel, and outputs distance information Dist(t) describing the result to the depth value setting unit 306 (step S43). Specifically, the distance Dist(x, y) between the coordinates of the vanishing point and the coordinates of each pixel is calculated based on one of Expressions (17), (18), and (19). In Expressions (17), (18), and (19), Δx and Δy represent the distance in the x direction and the distance in the y direction between each pixel and the vanishing point, respectively.
Figure JPOXMLDOC01-appb-M000018 [Expression (17); formula image not reproduced in this extraction]
Figure JPOXMLDOC01-appb-M000019 [Expression (18); formula image not reproduced in this extraction]
Figure JPOXMLDOC01-appb-M000020 [Expression (19); formula image not reproduced in this extraction]
FIG. 16 shows examples of the distance information Dist(t) based on Expressions (17), (18), and (19) when the vanishing point VP is inside the screen. FIG. 16(A) shows an example in which the vanishing point VP is inside the screen; BD1a in FIG. 16(B), BD1b in FIG. 16(C), and BD1c in FIG. 16(D) represent the distance information Dist(t) based on Expressions (17), (18), and (19), respectively. FIG. 17 shows examples of the distance information Dist(t) based on Expressions (17), (18), and (19) when the vanishing point VP is outside the screen. FIG. 17(A) shows an example in which the vanishing point VP is outside the screen; BD2a in FIG. 17(B), BD2b in FIG. 17(C), and BD2c in FIG. 17(D) represent the distance information Dist(t) based on Expressions (17), (18), and (19), respectively. In FIGS. 16 and 17, the white portions are the nearest, and portions become farther as they become darker.
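The exact metrics of Expressions (17)-(19) are not reproduced in this extraction; the sketch below assumes three typical choices (Euclidean, city-block, and chessboard distances) to show how a distance map Dist(t) from every pixel to the vanishing point can be built. Names and defaults are assumptions.

    # Minimal sketch (assumed metrics): distance from every pixel to the vanishing
    # point VP, which may lie inside or outside the screen.
    import numpy as np

    def distance_map(height, width, vp, metric="euclidean"):
        ys, xs = np.mgrid[0:height, 0:width]
        dx, dy = np.abs(xs - vp[0]), np.abs(ys - vp[1])
        if metric == "euclidean":
            return np.sqrt(dx ** 2 + dy ** 2)
        if metric == "cityblock":
            return dx + dy
        return np.maximum(dx, dy)   # chessboard distance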
Proceeding to step S44, the region division unit 303 in FIG. 14 divides the processing target image F(t) by region division (clustering) into a plurality of sets of pixels (regions; classes) whose feature quantities are similar (whose feature quantity values fall within a predetermined range). For example, the region division unit 303 divides the image into a plurality of regions by clustering in a feature space. Clustering in a feature space maps each pixel of the image space into a feature space (for example, color, edge, or motion vector) and performs clustering in that feature space by a technique such as the K-means method, the Mean-Shift method, or the K-nearest-neighbor search method (approximate K-nearest-neighbor search method). After the clustering in the feature space is completed, the pixel values in the original image space of the pixels in each class are replaced by a representative pixel value of the region (for example, the average value), a label identifying the region is assigned to all pixels in each region, and region information R(t) describing the result is output to the depth value setting unit 306 (step S44).
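As an illustration of the feature-space clustering described above, the following sketch (assumed class count and color-only features, not the patent's implementation) runs K-means on the pixel colors and returns both the per-pixel labels (the region information R(t)) and the image repainted with each region's representative color.

    # Minimal sketch (assumptions): K-means clustering of pixel colors as a simple
    # feature-space region division.
    import cv2
    import numpy as np

    def segment_regions(bgr, num_classes=8):
        samples = bgr.reshape(-1, 3).astype(np.float32)      # map pixels to a color feature space
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        _compact, labels, centers = cv2.kmeans(samples, num_classes, None, criteria,
                                               3, cv2.KMEANS_PP_CENTERS)
        labels = labels.reshape(bgr.shape[:2])               # R(t): one label per pixel
        mean_image = centers[labels].astype(np.uint8)        # representative color per region
        return labels, mean_image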
Proceeding to step S45, the depth value setting unit 306 in FIG. 14 sets the depth value of each pixel based on the input distance information Dist(t) and region information R(t). Specifically, as shown in Expression (20), the average value of the distances Dist(x, y) of the pixels in each region indicated by the region information R(t) is scaled, and a value shifted by the reference depth value Dbase(x, y) is set as the depth value D(x, y) of each pixel (step S45).
Figure JPOXMLDOC01-appb-M000021 [Expression (20); formula image not reproduced in this extraction]
In Expression (20), Dmax is the upper limit of the depth value, Dmin is the lower limit of the depth value, Distmax is the maximum value of the distance information Dist(t), Distmin is the minimum value of the distance information Dist(t), and Dbase(x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value of the farthest view). FIG. 18 shows an example of a depth model based on the vanishing point. In FIG. 18, image A represents an example of the processing target image F(t), image B represents an example of the region division result (region division information R(t)) of the processing target image F(t) obtained by the region division unit 303, image C represents an example of the vanishing point VP of the processing target image F(t), image D represents an example of the distance information Dist(t) of the processing target image F(t) obtained by the distance calculation unit 304, and image E is an example of the depth model obtained by the depth value setting unit 306 based on the region division information R(t) of image B and the distance information Dist(t) of image D. In image E of FIG. 18, bright portions represent the near side and dark portions represent the far side.
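A minimal sketch of the per-region depth assignment of Expression (20) follows; the exact scaling and sign convention of the expression are not reproduced in this extraction, so the choices below (mean distance of each region scaled into [Dmin, Dmax] and shifted by Dbase, with a larger depth value treated as nearer) are assumptions matching the bright-is-near convention of FIG. 18.

    # Minimal sketch (assumptions): per-region depth from the distance map and the
    # region labels, in the spirit of Expression (20).
    import numpy as np

    def depth_from_distance(dist_map, labels, d_min=0.0, d_max=255.0, d_base=0.0):
        depth = np.zeros_like(dist_map, dtype=np.float32)
        lo, hi = dist_map.min(), dist_map.max()
        scale = (d_max - d_min) / max(hi - lo, 1e-6)
        for region in np.unique(labels):
            mask = labels == region
            mean_dist = dist_map[mask].mean()
            # pixels far from the vanishing point are nearer; a larger depth value
            # is treated here as nearer (assumed convention)
            depth[mask] = d_min + scale * (mean_dist - lo) + d_base
        return depth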
Returning again to step S41 in FIG. 15, when the current frame has no vanishing point ("vp_num = 0" in the vanishing point information VP(t)) (No in step S41), the switching unit 301 in FIG. 14 switches the input destination of the image to the saliency calculation unit 305, the switching unit 302 in FIG. 14 switches the output source of the data input to the depth value setting unit 306 to the saliency calculation unit 305, and the depth model creation means based on saliency is selected (step S46).
The saliency calculation unit 305 in FIG. 14 calculates, from the input processing target image F(t), the saliency M(t) representing the attractiveness in the image based on human visual characteristics (step S47). Examples of portions that easily attract human attention are locations where the color difference between a pixel of interest and its surrounding pixels is large (local color difference), locations where the color difference between a pixel of interest and the image as a whole is large, and locations where the color difference between a local region including the pixel of interest and its surrounding region is large (global color difference). A color difference is a quantitative representation of a perceptual difference in color, and the CIELAB color space (also called the CIE 1976 L*a*b* space), which is a uniform color space, is used as the color space for evaluating color differences. Based on human visual characteristics, the saliency M(x, y) of each pixel is calculated by Expression (21).
Figure JPOXMLDOC01-appb-M000022 [Expression (21); formula image not reproduced in this extraction]
Figure JPOXMLDOC01-appb-M000023 [Expression (22); formula image not reproduced in this extraction]
Figure JPOXMLDOC01-appb-M000024 [Expression (23); formula image not reproduced in this extraction]
Here, in Expression (21), ΔElocal represents the local color difference, ΔEglobal represents the global color difference, and the coefficients α and β represent predetermined weighting coefficients. That is, Expression (21) expresses the saliency as a linear sum of the local color difference and the global color difference. The local color difference ΔElocal is calculated by Expression (22), and the global color difference ΔEglobal by Expression (23). In Expressions (22) and (23), L* represents the lightness index, a* the red-green perceived chromaticity, and b* the yellow-blue perceived chromaticity. The CIELUV color space (also called the CIE 1976 L*u*v* color space) may be used as the color space for evaluating color differences. Since the coefficient w(u, v) in Expression (22) is the same as in Expression (12), its description is omitted.
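The sketch below illustrates the linear-sum saliency of Expression (21) under stated assumptions: the window size, the weights α and β, and the use of OpenCV's 8-bit LAB conversion are choices of this example, not of the patent. The local term compares each pixel with a blurred neighborhood, and the global term compares it with the mean color of the whole image.

    # Minimal sketch (assumptions): saliency as a weighted sum of a local and a
    # global CIELAB color difference, in the spirit of Expressions (21)-(23).
    import cv2
    import numpy as np

    def saliency_map(bgr, alpha=0.5, beta=0.5, local_window=9):
        lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
        neighbourhood = cv2.GaussianBlur(lab, (local_window, local_window), 0)
        global_mean = lab.reshape(-1, 3).mean(axis=0)
        delta_local = np.linalg.norm(lab - neighbourhood, axis=2)   # local color difference
        delta_global = np.linalg.norm(lab - global_mean, axis=2)    # global color difference
        return alpha * delta_local + beta * delta_global            # M(x, y)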
Proceeding to step S48, the depth value setting unit 306 in FIG. 14 sets the depth value of each pixel by the calculation of Expression (24) based on the input saliency M(t), and outputs a depth model D(t) describing the result (step S48).
In Expression (24), Dmax is the upper limit of the depth value, Dmin is the lower limit of the depth value, Mmax is the maximum value of the saliency M(t), Mmin is the minimum value of the saliency M(t), and Dbase(x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value of the farthest view). That is, the saliency M(x, y) of each pixel is scaled by Expression (24), and a value shifted by the reference depth value Dbase(x, y) is set as the depth value D(x, y) of each pixel. FIG. 19 shows an example of a depth model based on saliency. In FIG. 19, image A represents an example of the processing target image F(t), image B represents an example of the saliency M(t) of the processing target image F(t) obtained by the saliency calculation unit 305, image C represents an example of the reference depth model (Dbase), and image D is an example of the depth model created by the depth value setting unit 306 by combining the saliency M(t) of image B with the depth model (Dbase) of image C.
In image B of FIG. 19, bright portions (white) represent portions that easily attract human attention (high attractiveness), and dark portions (black) represent portions that do not (low attractiveness). In image D of FIG. 19, bright portions represent the near side and dark portions represent the far side. As shown in image D of FIG. 19, the depth model creation means based on saliency superimposes the scaled saliency on the reference depth surface (Dbase) and, by emphasizing the relative depth difference between a region of interest and its surrounding region, causes a pseudo sense of depth to be perceived.
When a depth model is generated based on saliency, on the reference depth surface, the depth of portions with high saliency (high attractiveness) is set to the near side and the depth of portions with low saliency (low attractiveness) is set to the far side. This emphasizes the relative depth difference between a region of interest and its surrounding region and causes a pseudo sense of depth to be perceived. In other words, when the saliency of a region of interest is higher than that of its surrounding region, its depth is set to be relatively nearer than the surrounding region; when the saliency of the region of interest is equal to that of the surrounding region, the depths are set to be relatively the same; and when the saliency of the region of interest is lower than that of the surrounding region, its depth is set to be relatively farther than the surrounding region.
As described above, according to the depth model generation unit 30, when the vanishing point position can be estimated from a geometric depth cue, a depth model of the image is created based on the vanishing point, so that a depth model that emphasizes the sense of depth given by the geometric depth cue can be created. When the vanishing point position cannot be estimated from a geometric depth cue, a depth model of the image is created from the saliency representing the attractiveness in the image based on human visual characteristics, so that a depth model that emphasizes the sense of depth of the portions attracting human attention can be created.
(About the viewpoint image generation unit 40)
Returning again to FIG. 2, the viewpoint image generation unit 40 in FIG. 1 corresponds to the viewpoint image generation means of the present invention. Based on assumed viewing condition information set in advance, it calculates, from the depth value of each pixel represented by the depth model D(t), a disparity vector (shift amount) representing the amount of displacement between each pixel on the reference image F(t) (the input image; the processing target image) and the corresponding pixel on each viewpoint image Fi(t) (i = l, r; Fr: right-eye presentation image, Fl: left-eye presentation image), and generates each viewpoint image Fi(t) (i = l, r) based on the pixels on the reference image F(t) and the corresponding calculated disparity vectors (step S15).
Here, the "assumed viewing condition information" is information for generating the stereoscopic images (left-eye presentation image and right-eye presentation image) to be presented to the viewer, and represents the pixel pitch (inter-pixel distance) μ of the display that displays the stereoscopic images, the image size of the display, the distance from the viewer to the display that displays the stereoscopic images (assumed viewing distance) f, the parallax range (range of disparity vectors) representing the depth amount of the stereoscopic images, and the baseline length t (the distance between the virtual right viewpoint Cr of the viewpoint image Fr(t) and the virtual left viewpoint Cl of the viewpoint image Fl(t)).
FIG. 20 shows an overhead view of an example of a camera (viewpoint) arrangement for generating viewpoint images based on this assumed viewing condition information. The example of FIG. 20 assumes stereoscopic image capture by the parallel method: the camera at the virtual right viewpoint Cr and the camera at the virtual left viewpoint Cl are arranged parallel to the camera at the reference viewpoint Cc in the x-axis direction, and each camera observes a point of interest P in three-dimensional space. The position of the point of interest P projected onto the image plane Ic of the reference viewpoint Cc is denoted Xc, its position projected onto the image plane Il of the virtual left viewpoint Cl is denoted Xl, and its position projected onto the image plane Ir of the virtual right viewpoint Cr is denoted Xr. In FIG. 20, using the distance f (focal length, or viewing distance) from each viewpoint to the corresponding image plane, the distance Z in the z direction from the viewpoint to the point of interest P, and the distance t/2 in the x direction between the reference viewpoint Cc and each virtual viewpoint (Cr, Cl), the geometric relationships between the positions Xl and Xc, and between Xr and Xc, of the point of interest P projected onto the respective image planes are expressed by Expressions (25) and (26), respectively.
Figure JPOXMLDOC01-appb-M000026 [Expression (25); formula image not reproduced in this extraction]
Figure JPOXMLDOC01-appb-M000027 [Expression (26); formula image not reproduced in this extraction]
From the above, the disparity vector (shift amount) di (i = l, r) representing the amount of displacement between a pixel on the reference image F(t) and the corresponding pixel on the viewpoint image Fi(t) (i = l, r) is derived by Expressions (27) and (28), which are obtained by transforming Expressions (25) and (26).
Figure JPOXMLDOC01-appb-M000028 [Expression (27); formula image not reproduced in this extraction]
Figure JPOXMLDOC01-appb-M000029 [Expression (28); formula image not reproduced in this extraction]
The variable μ in Expressions (27) and (28) represents the pixel pitch. That is, given the reference image F(t), the depth model D(t), which is a relative depth value, and a function (Z = z(D(t))) that converts the depth model D(t) into an absolute depth value Z, each viewpoint image Fi(t) (i = l, r) can be generated based on Expressions (27) and (28).
The viewpoint image generation unit 40 will be described below based on the above concept. FIG. 21 is a block diagram showing a configuration example of the viewpoint image generation unit 40, FIG. 22 is a flowchart for explaining an operation example of the viewpoint image generation unit 40, and FIG. 23 is a diagram for explaining an example of viewpoint image generation in the viewpoint image generation unit 40. First, as shown in FIG. 21, the viewpoint image generation unit 40 includes a disparity vector calculation unit 401, a texture shift unit 402, a gap filling unit (also called an occlusion compensation unit) 403, and a floating window superimposition unit 404.
In FIG. 22, the disparity vector calculation unit 401 in FIG. 21 calculates, based on the input assumed viewing condition information, the depth model D(t), and the function (Z = z(D(t))) that converts the depth model D(t) into absolute depth values, the disparity vector di (i = l, r) from each pixel on the reference image F(t) to the corresponding pixel on each viewpoint image Fi(t) using Expressions (27) and (28), and outputs the result to the texture shift unit 402 (step S51 in FIG. 22). Instead of calculating the disparity vector of each pixel based on Expressions (27) and (28), the disparity vectors may be calculated using a lookup table that derives a disparity vector from a depth value (relative depth value) set in advance based on the assumed viewing condition information, as shown in the LUT of FIG. 23(B). In FIG. 23(B), g(D) represents a function that converts a depth value D into a disparity vector. Here, the divergent direction and the crossed direction of the disparity vector (shift amount) in FIG. 23(B) will be described with reference to FIG. 24. In FIG. 24, let P be a point of interest, Pr be the point of interest P projected onto the display surface as seen from the right eye, and Pl be the point of interest P projected onto the display surface as seen from the left eye. A disparity vector is in the divergent direction when, as shown in FIG. 24(A), the point of interest P is located behind the display surface and the value of the disparity vector from Pr to Pl (or from Pl to Pr) on the display surface is positive. Similarly, a disparity vector is in the crossed direction when, as shown in FIG. 24(B), the point of interest P is located in front of the display surface and the value of the disparity vector from Pr to Pl (or from Pl to Pr) on the display surface is negative. When the value of the disparity vector is zero, the point of interest P is located on the display surface.
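As an illustration of step S51 under assumed viewing-condition values, the sketch below converts depth into per-pixel left/right disparities using the standard parallel-camera relation suggested by the geometry of Expressions (25)-(28), d = ±(f·t)/(2·Z·μ); the exact form and sign convention of the patent's expressions are not reproduced in this extraction, so treat this as a stand-in. A lookup table g(D), as in FIG. 23(B), can replace the closed-form computation.

    # Minimal sketch (assumed values and sign convention): per-pixel disparity from
    # the depth model, given a depth-to-Z conversion function z_of_d.
    import numpy as np

    def disparity_maps(depth, z_of_d, focal_f=1.2, baseline_t=0.065, pixel_pitch_mu=0.0003):
        Z = z_of_d(depth)                          # relative depth -> absolute depth Z = z(D(t))
        shift = (focal_f * baseline_t) / (2.0 * Z * pixel_pitch_mu)
        return +shift, -shift                      # assumed d_l, d_r (left/right views)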
Next, the texture shift unit 402 in FIG. 21 sets each pixel (x, y) of the reference image F(t) as the pixel value of the corresponding pixel (u, v) of each viewpoint image Fi(t) (i = l, r) based on the corresponding disparity vector di (i = l, r), and outputs the generated viewpoint images to the gap filling unit 403 (step S52 in FIG. 22). When setting the pixel values, the texture shift is performed starting from the pixels whose disparity vector has a value on the divergent direction side (for example, d2 on the LUT of FIG. 23(B)).
For example, consider the case in FIG. 23(A) where the viewpoint image Fl(t) of the virtual left viewpoint (left-eye presentation image) is generated from the reference image iF and the depth model iD. In the depth model iD, the depth value of the white portion is D1 and the depth value of the black portion is D2. When the texture shift is performed based on the LUT of FIG. 23(B), first, each pixel of the layer L2 having the depth value D2 in FIG. 23(A) is shifted by d2 in the divergent direction. Then, when each pixel of the layer L1 having the depth value D1 in FIG. 23(A) is shifted by d1 in the crossed direction, a viewpoint image oF1 is obtained that has a missing region Gs1 not located at the left/right edge of the screen and a missing region Gl1 located at the left/right edge of the screen. Here, a missing region (occlusion region) is a region of the viewpoint image oF1 in FIG. 23(A) in which no pixel value is set because there is no corresponding pixel on the reference image.
Next, the gap filling unit 403 in FIG. 21 interpolates, in the input viewpoint image Fi(t) (i = l, r), the pixels of the missing regions not located at the screen edges (for example, Gs1 of the viewpoint image oF1 in FIG. 23(A)) from the group of pixels located around the missing region, and outputs the interpolated viewpoint image Fi(t) (i = l, r) to the floating window superimposition unit 404 (step S53 in FIG. 22). As the interpolation method for the pixels in the missing regions, for example, linear interpolation, a median filter, or a known image inpainting method (see, for example, Non-Patent Document 6) is used.
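A minimal sketch of steps S52-S53 follows (horizontal shifts only; OpenCV inpainting is used here in place of the interpolation choices named above, and all names are assumptions): pixels are written from the most divergent (farthest) disparity to the most crossed (nearest) so that near layers overwrite far ones, and the remaining holes are treated as the missing regions to be filled.

    # Minimal sketch (assumptions): texture shift of the reference image by the
    # disparity map, followed by hole filling of the occlusion regions.
    import cv2
    import numpy as np

    def render_view(reference_bgr, disparity):
        h, w = disparity.shape
        view = np.zeros_like(reference_bgr)
        filled = np.zeros((h, w), np.uint8)
        # visit pixels from the most divergent (far) to the most crossed (near)
        # disparity, so that nearer pixels written later overwrite farther ones
        order = np.argsort(-disparity, axis=None)
        ys, xs = np.unravel_index(order, (h, w))
        us = np.clip(xs + np.round(disparity[ys, xs]).astype(int), 0, w - 1)
        view[ys, us] = reference_bgr[ys, xs]   # repeated targets: the last (nearest) write wins
        filled[ys, us] = 255
        holes = cv2.bitwise_not(filled)        # missing (occlusion) regions
        return cv2.inpaint(view, holes, 3, cv2.INPAINT_TELEA)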
Next, the floating window superimposition unit 404 obtains the maximum value W1 of the width of the missing regions located at the screen edges (for example, Gl1 of the viewpoint image oF1 in FIG. 23(A)) in both of the input viewpoint images Fi(t) (i = l, r). It then inserts a floating window (black band) of width W2 (= αW1) at the right edge and the left edge of each viewpoint image and outputs the result (step S54 in FIG. 22). W2 is a value obtained by scaling W1 by a predetermined constant α. The viewpoint image after the floating window is inserted is, for example, the viewpoint image oF2 in FIG. 23(A); in the viewpoint image oF2 of FIG. 23(A), floating windows fw1 and fw2 are inserted at the left edge and the right edge of the screen, respectively. The reason for inserting the floating window is to suppress binocular rivalry, in which the left and right retinal images are perceived alternately because a target cannot be viewed binocularly as a single object when its position, shape, or the like differs greatly between the images presented to the left eye and the right eye (for example, in the missing regions located at the left/right edges of the screen in the generated viewpoint images).
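A minimal sketch of step S54 (assumed names): black bands of width W2 = αW1 are painted onto the left and right edges of both viewpoint images, where W1 is the widest edge gap found in either view.

    # Minimal sketch (assumptions): insert floating windows (black bands) at both
    # screen edges of the left and right viewpoint images.
    import numpy as np

    def add_floating_window(view_l, view_r, w1_max, alpha=1.0):
        w2 = int(round(alpha * w1_max))
        for view in (view_l, view_r):
            view[:, :w2] = 0                        # left-edge black band (fw1)
            view[:, view.shape[1] - w2:] = 0        # right-edge black band (fw2)
        return view_l, view_r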
As described above, according to the present embodiment, when the vanishing point position can be estimated from a geometric depth cue, a depth model of the image is generated based on the vanishing point, so that a stereoscopic image that emphasizes the sense of depth given by the geometric depth cue can be generated. When the vanishing point position cannot be estimated from a geometric depth cue, a depth model of the image is generated from the saliency representing the attractiveness in the image based on human visual characteristics, so that a stereoscopic image that emphasizes the sense of depth of the portions attracting human attention can be generated.
(Modification of the depth model generation unit 30 (depth model generation unit 30a))
In the above embodiment, as an example of the depth model creation means based on saliency, the depth model generation unit 30 scales the saliency M(x, y) of each pixel by Expression (24) and sets a value shifted by the reference depth value Dbase(x, y) as the depth value D(x, y) of each pixel; however, the present invention is not limited to this. For example, the configuration of the depth model generation unit 30 may be changed, as shown in FIG. 25, by removing the switching unit 301 so that the image F(t) is input to the region division unit 303 and the saliency calculation unit 305. That is, the depth model generation unit 30a includes a switching unit 302, a region division unit 303, a distance calculation unit 304, a saliency calculation unit 305, and a depth value setting unit 306.
An operation example of the depth model creation means based on saliency in this case will be described with reference to FIG. 25. Since the operation of the depth model creation means based on the vanishing point is the same as the processing of steps S42 to S45 in FIG. 15, its description is omitted here.
First, the saliency calculation unit 305 in FIG. 25 calculates the saliency M(t) from the processing target image F(t) by the same processing as step S47 in FIG. 15 and outputs the result to the depth value setting unit 306 (step S47' in FIG. 26). Next, the region division unit 303 in FIG. 25 divides the processing target image F(t) into regions by the same processing as step S44 in FIG. 15 and outputs region information R(t) describing the result to the depth value setting unit 306 (step S48' in FIG. 26).
Thereafter, based on the input saliency M(t) and region information R(t), the depth value setting unit 306 in FIG. 25 scales the average value of the saliency M(x, y) of the pixels in each region indicated by the region information R(t), as shown in Expression (29), and sets a value shifted by the reference depth value Dbase(x, y) as the depth value D(x, y) of each pixel (step S49' in FIG. 26).
Figure JPOXMLDOC01-appb-M000030 [Expression (29); formula image not reproduced in this extraction]
In Expression (29), Dmax is the upper limit of the depth value, Dmin is the lower limit of the depth value, Mmax is the maximum value of the saliency M(t), Mmin is the minimum value of the saliency M(t), and Dbase(x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value of the farthest view). FIG. 27 shows an example of a depth model based on saliency in this modification. In FIG. 27, image A represents an example of the processing target image F(t), image B represents an example of the region division result (region information R(t)) of the processing target image F(t) obtained by the region division unit 303, image C represents the saliency M(t) of the processing target image F(t) obtained by the saliency calculation unit 305, image D represents an example of the reference depth model (Dbase), and image E is an example of the depth model obtained by the depth value setting unit 306 based on the region information R(t) of image B, the saliency M(t) of image C, and the reference depth model (Dbase) of image D. In image C of FIG. 27, bright portions (white) represent portions that easily attract human attention (high attractiveness), and dark portions (black) represent portions that do not (low attractiveness). In image E of FIG. 27, bright portions represent the near side and dark portions represent the far side. Regarding the reference depth model (Dbase), image D of FIG. 27 gives as an example a plane having a uniform depth value, but the model is not limited to this. For example, the plane equation shown in Expression (30) below may be determined in advance, and the reference depth value Dbase(x, y) may be set according to the coordinates (x, y) of each pixel. An example of a reference depth model expressed by Expression (30) is shown in image A of FIG. 28; image A of FIG. 28 is a reference depth model in which the coefficients a, b, and c of Expression (30) are set so that the depth becomes nearer toward the lower edge and farther toward the upper edge. Image B of FIG. 28 shows the resulting depth model D(t) created when image A of FIG. 28 is input in place of image D of FIG. 27.
Figure JPOXMLDOC01-appb-M000031 [Expression (30); formula image not reproduced in this extraction]
As described above, when the vanishing point position cannot be estimated from a geometric depth cue, the depth model generation unit 30a sets a uniform depth value for each region based on the saliency representing the attractiveness in the image based on human visual characteristics and on the region division result (region information) of the image, and can thereby generate a depth model in which errors in the depth ordering are suppressed.
(Modification of the scene change detection unit 10)
In the above embodiment, the case where the scene change detection unit 10 uses a luminance histogram as the image feature quantity used for scene change detection has been described, but the present invention is not limited to this. For example, instead of a luminance histogram, a color histogram representing the frequency of appearance of each color component, the average error of inter-frame differences, or the distribution of motion vectors may be used as the image feature quantity.
(Modification of the edge detection unit 211)
In the above embodiment, edge detection in which points at which the edge strength is a local maximum in the image space (local maxima) are extracted as edge points has been described for the edge detection unit 211, but the present invention is not limited to this. For example, a known edge detection method such as Canny edge detection may be used. As the differential operator (edge detector), a known method such as a Sobel filter, a Prewitt filter, a LoG filter (Laplacian of Gaussian), or a DoG filter (Difference of Gaussians) may be used.
(Modification of the vanishing point identification unit 213)
In the above embodiment, the case where the vanishing point identification unit 213 uses a Gaussian distribution as the distribution model used in the mixture model has been described, but the present invention is not limited to this. For example, an exponential family of distributions (Laplace distribution, beta distribution, Bernoulli distribution, and the like) may be used as the distribution model. The vanishing point identification unit 213 may also treat the number of classes Kc used in the mixture model as a predetermined value and determine it as in the following example. The vanishing point identification unit 213 sets a predetermined number of classes Kc' as the number of classes Kc and performs clustering by the K-means method. Thereafter, when there are classes Ci and Cj whose inter-class distance is equal to or less than (or less than) a predetermined threshold, the vanishing point identification unit 213 merges the classes Ci and Cj into a new class Ck'. The vanishing point identification unit 213 determines the number of classes Kc (≤ Kc') by repeating this processing until the number of classes converges to a constant value. The method used by the vanishing point identification unit 213 to estimate the distribution model of the intersections is not limited to a parametric estimation method such as a mixture model, and may be a non-parametric estimation method such as the Mean-Shift method, the K-means method, or the K-nearest-neighbor search method (approximate K-nearest-neighbor search method).
(Modification of the region division unit 303)
In the above embodiment, the case where the region division unit 303 performs clustering in a feature space has been described, but the present invention is not limited to this, and clustering in the image space may be performed. Clustering in the image space is a technique that performs region division in the original image space, without mapping into a feature space, based on the similarity between pixels or between groups of pixels (regions) constituting regions. For example, the region division unit 303 may perform clustering in the image space by (a) the pixel linking method, (b) the region growing method, or (c) the split-and-merge method.
(Modification of the saliency calculation unit 305)
In the above embodiment, the case where the saliency calculation unit 305 calculates the saliency based on the local color difference and the global color difference has been described, but the present invention is not limited to this, and the saliency may be calculated based on only one of the two indices, the local color difference (the first term ΔElocal in Expression (21)) or the global color difference (the second term ΔEglobal in Expression (21)). The color difference may also be calculated using only the lightness index L*, without using the red-green perceived chromaticity a* and the yellow-blue perceived chromaticity b*; in this case, in terms of human visual characteristics, locations with a large brightness contrast (contrast difference) are treated as highly attractive. The local color difference ΔElocal and the global color difference ΔEglobal may also be obtained by Expressions (31) and (32), respectively, using the lightness difference ΔL*, the chroma difference ΔC*, and the hue difference ΔH* based on the CIE method (see, for example, Non-Patent Document 4).
Figure JPOXMLDOC01-appb-M000032 [Expression (31); formula image not reproduced in this extraction]
Figure JPOXMLDOC01-appb-M000033 [Expression (32); formula image not reproduced in this extraction]
The coefficient w(u, v) in Expression (31) is the same as in Expression (12). The coefficients l, c, and h in Expressions (31) and (32) are predetermined weighting coefficients. The method of obtaining the saliency is not limited to color differences, and the saliency may be calculated based on a plurality of image feature quantities such as color differences, edge gradients, and motion vectors (see, for example, Non-Patent Document 5).
(First modification of the stereoscopic image generation device 1)
In the above embodiment, the description assumed that the image size of the inputs and outputs of the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 of the stereoscopic image generation device 1 is the same as that of the input image F(t), but the present invention is not limited to this. For example, in order to reduce the amount of computation and the memory size, processing may be added in which the image input to the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 is reduced in advance to a predetermined image size and the depth model output from the depth model generation unit 30 is enlarged to the input image size. That is, as shown in FIG. 29, the first modification of the stereoscopic image generation device 1 (stereoscopic image generation device 2) includes a reduction processing unit 50, the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, an enlargement processing unit 60, and the viewpoint image generation unit 40. The reduction processing unit 50 corresponds to the reduced image generation means of the present invention and generates a reduced image of a predetermined image size from the input image F(t). The generated reduced image is input to the vanishing point estimation unit 20 and the depth model generation unit 30. The enlargement processing unit 60 corresponds to the enlarged depth model generation means of the present invention and generates, from the depth model of the reduced image generated by the depth model generation unit 30, an enlarged depth model of the same image size as the input image F(t).
An operation example of the stereoscopic image generation device 2 will be described with reference to FIG. 30. The operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 in FIG. 29 (steps S63, S64, S65, and S67 in FIG. 30) are the same as the corresponding operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 of the stereoscopic image generation device 1 shown in FIG. 1 (steps S12, S13, S14, and S15 in FIG. 2), and their description is therefore omitted.
In FIG. 30, the stereoscopic image generation device 2 of FIG. 29 first outputs the input image at time t to the reduction processing unit 50 and the viewpoint image generation unit 40 (step S61 in FIG. 30).
The reduction processing unit 50 in FIG. 29 reduces the input processing target image F(t) to a predetermined image size and outputs the reduced image Fd(t) to the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 (step S62 in FIG. 30). The image reduction is performed using, for example, the nearest neighbor method, the bilinear method, or the bicubic method.
The enlargement processing unit 60 in FIG. 29 enlarges the input depth model D(t) to the image size of the input image F(t) and outputs the enlarged depth model Du(t) to the viewpoint image generation unit 40 (step S66 in FIG. 30). The depth model is enlarged using, for example, the nearest neighbor method, the bilinear method, or the bicubic method.
According to the stereoscopic image generation device 2, the scene change detection process, the vanishing point estimation process, and the depth model generation process are performed on a reduced image that is smaller than the input image, so the memory size and the amount of computation can be reduced compared with the stereoscopic image generation device 1 of FIG. 1.
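As a rough illustration of this reduce-then-enlarge flow, a Python sketch using OpenCV follows. The callable generate_depth_model is a placeholder for the scene change detection, vanishing point estimation, and depth model generation steps (S63 to S65); the function name depth_with_downscaled_analysis and the scale value are assumptions made for illustration only.

import cv2
import numpy as np

def depth_with_downscaled_analysis(frame, generate_depth_model, scale=0.25):
    """Run depth-model generation on a reduced frame, then upscale the result."""
    h, w = frame.shape[:2]

    # Step S62: reduce F(t) to a predetermined size (bilinear here; the
    # nearest neighbor or bicubic methods mentioned above would also do).
    small = cv2.resize(frame, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_LINEAR)

    # Steps S63-S65 happen inside the placeholder callable.
    depth_small = generate_depth_model(small)

    # Step S66: enlarge the depth model back to the input image size.
    depth_full = cv2.resize(depth_small, (w, h),
                            interpolation=cv2.INTER_CUBIC)
    return depth_full

if __name__ == "__main__":
    def dummy_depth(img):
        # Toy stand-in: treat brighter pixels as nearer.
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0

    frame = np.random.randint(0, 256, (720, 1280, 3), np.uint8)
    print(depth_with_downscaled_analysis(frame, dummy_depth).shape)  # (720, 1280)

Because the analysis runs on the reduced image Fd(t), only the final resize touches full-resolution data, which is where the memory and computation savings come from.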
(Second Modification of Stereoscopic Image Generation Device 1)
 In the above embodiment, when the vanishing point position can be estimated from a geometric depth cue, the stereoscopic image generation device 1 generates the depth model of the image based on the vanishing point, and when the vanishing point position cannot be estimated from a geometric depth cue, it generates the depth model based on the saliency representing the attractiveness within the image according to human visual characteristics. Consequently, in the frames before and after the depth model generation method switches, the depth model differs in the time direction, so the change in parallax (depth) is expected to be large. Likewise, in the frames before and after a scene change, the depth model also differs in the time direction, so the change in parallax (depth) is expected to be large. Therefore, in the second modification of the stereoscopic image generation device 1 (stereoscopic image generation device 3 in FIG. 31), a spatio-temporal direction smoothing unit 70 that smooths the depth model in the spatio-temporal direction is provided between the depth model generation unit 30 and the viewpoint image generation unit 40 in order to reduce the change in parallax in the time direction. That is, as shown in FIG. 31, the stereoscopic image generation device 3 includes the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, the spatio-temporal direction smoothing unit 70, and the viewpoint image generation unit 40. The spatio-temporal direction smoothing unit 70 corresponds to the spatio-temporal direction smoothing means of the present invention and, as shown in FIG. 32, includes a spatial direction smoothing unit 701, a time direction smoothing unit 702, and a buffer 703. The spatio-temporal direction smoothing unit 70 smooths in the spatial direction the depth model D(t) of the processing target image F(t) generated by the depth model generation unit 30, and then, based on the spatially smoothed depth model Ds(t) of the image F(t) and the spatio-temporally smoothed depth model Dt(t-1) of the comparison target image F(t-1) preceding the image F(t), smooths the depth model Ds(t) of the image F(t) in the time direction to generate the spatio-temporally smoothed depth model Dt(t) of the image F(t).
An operation example of the stereoscopic image generation device 3 will be described with reference to FIGS. 33 and 34. The operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 in FIG. 31 (steps S72, S73, S74, and S76 in FIG. 33) are the same as the corresponding operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 of the stereoscopic image generation device 1 shown in FIG. 1 (steps S12, S13, S14, and S15 in FIG. 2), and their description is therefore omitted.
(About the Spatio-Temporal Direction Smoothing Unit 70)
 The spatio-temporal direction smoothing unit 70 in FIG. 31 applies smoothing in the spatio-temporal direction to the depth model D(t) of the input processing target image F(t) and outputs the resulting smoothed depth model Dt(t) (step S75 in FIG. 33).
Specifically, the spatial direction smoothing unit 701 in FIG. 32 smooths the depth model D(t) in the spatial direction with a one-dimensional smoothing filter, applied first horizontally and then vertically (or first vertically and then horizontally), and outputs the result, the depth model Ds(t), to the time direction smoothing unit 702 (step S81 in FIG. 34). As the one-dimensional smoothing filter, for example, a one-dimensional Gaussian filter is used.
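A minimal sketch of such a separable spatial pass is shown below; the kernel width (sigma) is an arbitrary assumption, and SciPy's one-dimensional Gaussian filter stands in for whatever filter the implementation actually uses.

from scipy.ndimage import gaussian_filter1d

def smooth_spatial(depth, sigma=3.0):
    """Separable spatial smoothing of a depth model D(t) into Ds(t)."""
    ds = gaussian_filter1d(depth, sigma=sigma, axis=1)  # horizontal pass
    ds = gaussian_filter1d(ds, sigma=sigma, axis=0)     # vertical pass
    return ds

Because the two one-dimensional passes act along different axes of a linear filter, applying them in either order gives the same result, which is why the description above allows horizontal-then-vertical or vertical-then-horizontal processing.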
The time direction smoothing unit 702 in FIG. 32 generates the smoothed depth model Dt(t) by the following equation (33), based on the input spatially smoothed depth model Ds(t) and the smoothed depth model Dt(t-1) of the previous frame stored in the buffer 703, and outputs the result to the buffer 703 and to the outside (step S82 in FIG. 34). The coefficient α in equation (33) is a predetermined value between 0 and 1.
Figure JPOXMLDOC01-appb-M000034 (equation (33): the smoothed depth model Dt(t) computed from Ds(t) and Dt(t-1) using the coefficient α)
The buffer 703 in FIG. 32 deletes the smoothed depth model Dt(t-1) of the previous frame and stores the input smoothed depth model Dt(t) of the current frame (step S83 in FIG. 34).
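A compact sketch of this temporal stage is given below. Since equation (33) is only available as an image here, the recursive α-weighted blend is an assumption about its exact form, and the class name TemporalSmoother is invented for illustration; the one-element state plays the role of the buffer 703.

class TemporalSmoother:
    """Time-direction smoothing with a one-frame buffer (units 702 and 703)."""

    def __init__(self, alpha=0.5):
        assert 0.0 <= alpha <= 1.0
        self.alpha = alpha
        self.prev = None              # holds Dt(t-1), as buffer 703 does

    def step(self, ds):
        """Blend the spatially smoothed Ds(t) with the stored Dt(t-1)."""
        if self.prev is None:         # first frame: no history to blend with
            dt = ds.copy()
        else:
            # Assumed form of equation (33): recursive blend with weight alpha.
            dt = self.alpha * self.prev + (1.0 - self.alpha) * ds
        self.prev = dt                # step S83: replace Dt(t-1) with Dt(t)
        return dt

Feeding each frame's Ds(t) through step in display order yields the spatio-temporally smoothed sequence Dt(t) that the viewpoint image generation unit 40 consumes.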
According to the stereoscopic image generation device 3, smoothing the depth model in the spatio-temporal direction reduces the change in parallax (depth) in the frames before and after the depth model generation method switches and in the frames before and after a scene change occurs.
The embodiments of the stereoscopic image generation apparatus according to the present invention have been described above, but the present invention may also take the form of a stereoscopic image generation method performed by the stereoscopic image generation device 1, or the form of a program for causing a computer to execute this stereoscopic image generation method.
That is, a part of the stereoscopic image generation device 1 in the above-described embodiments may be realized by a computer. In that case, the control function may be realized by recording a program for realizing it on a computer-readable recording medium, loading the program recorded on the recording medium into a computer system, and executing it. The term "computer system" here refers to a computer system built into the stereoscopic image generation device 1 and includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" may also include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside the computer system serving as the server or client in that case. The program may be one for realizing a part of the functions described above, or one that realizes the functions described above in combination with a program already recorded in the computer system.
A part or all of the stereoscopic image generation device 1 in the above-described embodiments may also be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the stereoscopic image generation device 1 may be made into an individual processor, or some or all of the blocks may be integrated into a single processor. The method of circuit integration is not limited to LSI and may be realized by a dedicated circuit or a general-purpose processor. Furthermore, if integrated circuit technology replacing LSI emerges through progress in semiconductor technology, an integrated circuit based on that technology may be used.
An embodiment of the present invention has been described above in detail with reference to the drawings, but the specific configuration is not limited to the one described above, and various design changes and the like can be made without departing from the gist of the present invention.
1, 2, 3: stereoscopic image generation device; 10: scene change detection unit; 20: vanishing point estimation unit; 21: intra-frame vanishing point estimation unit; 22: inter-frame vanishing point estimation unit; 30: depth model generation unit; 40: viewpoint image generation unit; 50: reduction processing unit; 60: enlargement processing unit; 70: spatio-temporal direction smoothing unit; 101: luminance histogram generation unit; 102, 203, 204, 703: buffer; 103: histogram similarity calculation unit; 104: scene change determination unit; 201, 202, 301, 302: switching unit; 211: edge detection unit; 212: straight line detection unit; 213: vanishing point identification unit; 221: feature point detection unit; 222: corresponding point calculation unit; 223: transformation matrix calculation unit; 224: vanishing point position calculation unit; 303: region division unit; 304: distance calculation unit; 305: saliency calculation unit; 306: depth value setting unit; 401: disparity vector calculation unit; 402: texture shift unit; 403: gap filling unit; 404: floating window superimposing unit; 701: spatial direction smoothing unit; 702: time direction smoothing unit.

Claims (13)

  1.  A stereoscopic image generation apparatus that adds binocular stereoscopic information to a 2D image to generate a 3D image, comprising:
     vanishing point estimation means for estimating a vanishing point from a processing target image;
     depth model generation means for generating a different depth model depending on whether or not the vanishing point could be estimated by the vanishing point estimation means; and
     viewpoint image generation means for generating a right-eye presentation image and a left-eye presentation image based on the depth model generated by the depth model generation means, the processing target image, and assumed viewing condition information,
     wherein the depth model generation means generates a depth model based on the vanishing point when the vanishing point could be estimated by the vanishing point estimation means, and generates a depth model based on the saliency of each pixel in the processing target image when the vanishing point could not be estimated by the vanishing point estimation means.
  2.  The stereoscopic image generation apparatus according to claim 1, further comprising:
     reduced image generation means for generating a reduced image of a predetermined image size from the processing target image; and
     enlarged depth model generation means for generating, from the depth model of the reduced image generated by the depth model generation means, an enlarged depth model having the same image size as the processing target image,
     wherein the reduced image is used as the input to the vanishing point estimation means and the depth model generation means.
  3.  The stereoscopic image generation apparatus according to claim 1, further comprising spatio-temporal direction smoothing means for smoothing in the spatial direction the depth model of the processing target image generated by the depth model generation means, and for smoothing the depth model of the processing target image in the time direction based on the spatially smoothed depth model of the processing target image and the spatio-temporally smoothed depth model of a comparison target image preceding the processing target image, thereby generating a spatio-temporally smoothed depth model of the processing target image.
  4.  The stereoscopic image generation apparatus according to any one of claims 1 to 3, wherein the assumed viewing condition information includes a pixel pitch of a display that displays the 3D image, an image size of the display, a distance from a viewer to the display, a parallax range representing a depth amount of the 3D image, and a baseline length that is a distance between left and right virtual viewpoints.
  5.  The stereoscopic image generation apparatus according to any one of claims 1 to 4, wherein the saliency of each pixel in the processing target image is calculated to be higher at a location where the color difference between a target pixel and its surrounding pixels is large, at a location where the color difference between a target pixel and the entire image is large, or at a location where the color difference between a local region including a target pixel and its surrounding region is large.
  6.  The stereoscopic image generation apparatus according to claim 5, wherein, when the vanishing point could not be estimated by the vanishing point estimation means, the depth model generation means generates the depth model such that locations where the saliency of each pixel in the processing target image is high are placed on the near side.
  7.  The stereoscopic image generation apparatus according to any one of claims 1 to 6, wherein the vanishing point estimation means comprises intra-frame vanishing point estimation means for estimating the vanishing point of the processing target image from straight line information in the processing target image, and inter-frame vanishing point estimation means for estimating the vanishing point of the processing target image based on the processing target image, a comparison target image preceding the processing target image, and the position of the vanishing point in the comparison target image.
  8.  The stereoscopic image generation apparatus according to claim 7, further comprising scene change detection means for detecting whether or not a scene change has occurred between the processing target image and the comparison target image, wherein the intra-frame vanishing point estimation means is selected when a scene change is detected by the scene change detection means, and the inter-frame vanishing point estimation means is selected when no scene change is detected by the scene change detection means.
  9.  The stereoscopic image generation apparatus according to claim 8, further comprising storage means for storing vanishing point information including the position of the vanishing point of the comparison target image, wherein the inter-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is stored in the storage means, and the intra-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is not stored in the storage means.
  10.  The stereoscopic image generation apparatus according to any one of claims 7 to 9, wherein the comparison target image is the image immediately preceding the processing target image.
  11.  A stereoscopic image generation method performed by a stereoscopic image generation apparatus that adds binocular stereoscopic information to a 2D image to generate a 3D image, the method comprising:
     a vanishing point estimation step in which the stereoscopic image generation apparatus estimates a vanishing point from a processing target image;
     a depth model generation step of generating a different depth model depending on whether or not the vanishing point could be estimated in the vanishing point estimation step; and
     a viewpoint image generation step of generating a right-eye presentation image and a left-eye presentation image based on the depth model generated in the depth model generation step, the processing target image, and assumed viewing condition information,
     wherein the depth model generation step generates a depth model based on the vanishing point when the vanishing point could be estimated in the vanishing point estimation step, and generates a depth model based on the saliency of each pixel in the processing target image when the vanishing point could not be estimated in the vanishing point estimation step.
  12.  A program for causing a computer to execute the stereoscopic image generation method according to claim 11.
  13.  A computer-readable recording medium on which the program according to claim 12 is recorded.
PCT/JP2012/059043 2011-06-13 2012-04-03 Apparatus for generating three-dimensional image, method for generating three-dimensional image, program, and recording medium WO2012172853A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011131255A JP5210416B2 (en) 2011-06-13 2011-06-13 Stereoscopic image generating apparatus, stereoscopic image generating method, program, and recording medium
JP2011-131255 2011-06-13

Publications (1)

Publication Number Publication Date
WO2012172853A1 true WO2012172853A1 (en) 2012-12-20

Family

ID=47356851

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/059043 WO2012172853A1 (en) 2011-06-13 2012-04-03 Apparatus for generating three-dimensional image, method for generating three-dimensional image, program, and recording medium

Country Status (2)

Country Link
JP (1) JP5210416B2 (en)
WO (1) WO2012172853A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015035658A (en) 2013-08-07 2015-02-19 キヤノン株式会社 Image processing apparatus, image processing method, and imaging apparatus
TWI798314B (en) * 2017-12-28 2023-04-11 日商東京威力科創股份有限公司 Data processing device, data processing method, and data processing program

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10108220A (en) * 1996-09-26 1998-04-24 Sanyo Electric Co Ltd Device for converting two-dimensional image into three-dimensional image
JPH11239364A (en) * 1996-08-07 1999-08-31 Sanyo Electric Co Ltd Method for stereoscopic sense adjustment and its device
JP2001103513A (en) * 1999-09-27 2001-04-13 Sanyo Electric Co Ltd Method for converting two-dimensional video image into three-dimensional video image
JP2001359119A (en) * 2000-06-15 2001-12-26 Toshiba Corp Stereoscopic video image generating method
JP2003030626A (en) * 2001-07-18 2003-01-31 Toshiba Corp Image processor and method therefor
JP2009038794A (en) * 2007-07-06 2009-02-19 Panasonic Corp Image processor, image processing method, image processing system, program, recording medium, and integrated circuit
JP2011501496A (en) * 2007-10-11 2011-01-06 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Method and apparatus for processing a depth map

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4133683B2 (en) * 2003-08-26 2008-08-13 シャープ株式会社 Stereoscopic image recording apparatus, stereoscopic image recording method, stereoscopic image display apparatus, and stereoscopic image display method
JP4493631B2 (en) * 2006-08-10 2010-06-30 富士フイルム株式会社 Trimming apparatus and method, and program

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109724586A (en) * 2018-08-21 2019-05-07 南京理工大学 A kind of spacecraft relative pose measurement method of fusion depth map and point cloud
CN109724586B (en) * 2018-08-21 2022-08-02 南京理工大学 Spacecraft relative pose measurement method integrating depth map and point cloud
CN111292414A (en) * 2020-02-24 2020-06-16 当家移动绿色互联网技术集团有限公司 Method and device for generating three-dimensional image of object, storage medium and electronic equipment
CN114267067A (en) * 2021-12-24 2022-04-01 北京的卢深视科技有限公司 Face recognition method based on continuous frame images, electronic equipment and storage medium

Also Published As

Publication number Publication date
JP2013005025A (en) 2013-01-07
JP5210416B2 (en) 2013-06-12

Similar Documents

Publication Publication Date Title
US10818026B2 (en) Systems and methods for hybrid depth regularization
TWI455062B (en) Method for 3d video content generation
Cheng et al. Spatio-temporally consistent novel view synthesis algorithm from video-plus-depth sequences for autostereoscopic displays
JP2015188234A (en) Depth estimation based on global motion
US20150379720A1 (en) Methods for converting two-dimensional images into three-dimensional images
KR20090080556A (en) Complexity-adaptive 2d-to-3d video sequence conversion
JP5210416B2 (en) Stereoscopic image generating apparatus, stereoscopic image generating method, program, and recording medium
Reel et al. Joint texture-depth pixel inpainting of disocclusion holes in virtual view synthesis
Kim et al. Joint-adaptive bilateral depth map upsampling
Jung A modified model of the just noticeable depth difference and its application to depth sensation enhancement
Chai et al. Roundness-preserving warping for aesthetic enhancement-based stereoscopic image editing
KR101125061B1 (en) A Method For Transforming 2D Video To 3D Video By Using LDI Method
Liu et al. An enhanced depth map based rendering method with directional depth filter and image inpainting
Wang et al. Example-based video stereolization with foreground segmentation and depth propagation
Yang et al. Depth map generation using local depth hypothesis for 2D-to-3D conversion
Sun et al. Seamless view synthesis through texture optimization
KR101619327B1 (en) Joint-adaptive bilateral depth map upsampling
US20130229408A1 (en) Apparatus and method for efficient viewer-centric depth adjustment based on virtual fronto-parallel planar projection in stereoscopic images
Bharathi et al. 2D-to-3D Conversion of Images Using Edge Information
Chen et al. 2D-to-3D conversion system using depth map enhancement
Wei et al. Iterative depth recovery for multi-view video synthesis from stereo videos
KR101608804B1 (en) A Distortion Correction Method of Boundary Lines in Disparity Map for Virtual View Quality Enhancement
KR101626679B1 (en) Method for generating stereoscopic image from 2D image and for medium recording the same
Pourazad et al. Conversion of H. 264-encoded 2D video to 3D format
Liu et al. Stereoscopic view synthesis based on region-wise rendering and sparse representation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12801174

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12801174

Country of ref document: EP

Kind code of ref document: A1