WO2012098803A1 - Image processing device, image processing method, and program - Google Patents


Info

Publication number
WO2012098803A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
change
scene
processing apparatus
image processing
Application number
PCT/JP2011/079613
Other languages
French (fr)
Japanese (ja)
Inventor
允宣 中村
岳彦 指田
Original Assignee
コニカミノルタホールディングス株式会社
Application filed by コニカミノルタホールディングス株式会社
Publication of WO2012098803A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20: Image signal generators
    • H04N13/261: Image signal generators with monoscopic-to-stereoscopic image conversion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00: Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20: Indexing scheme for editing of 3D models
    • G06T2219/2004: Aligning objects, relative positioning of parts

Definitions

  • the present invention relates to an image processing technique.
  • In recent years, 3D televisions that use moving images capable of stereoscopic viewing (3D moving images, or stereoscopic movies) have been in the spotlight.
  • In a 3D television, two images obtained by viewing the same object from different viewpoints are used to display an image that can be viewed stereoscopically (also referred to as a 3D image or a stereoscopic image).
  • In a 3D image, the positions of the pixels indicating the same part of the object are shifted between the left-eye image and the right-eye image, and this shift, combined with the focus adjustment function of the human eye, gives the user a sense of depth in the image.
  • the shift amount of the pixel position indicating the same part of the object between the image for the left eye and the image for the right eye is also referred to as “parallax”.
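  • For orientation only, the parallax of a small block can be measured by a simple horizontal search; the following is a minimal sketch in Python with NumPy, under the assumption of a rectified pair in which the right-eye image is shifted to the left, with all names illustrative rather than taken from the patent.

```python
import numpy as np

def block_parallax(left, right, y, x, size=16, max_shift=64):
    """Estimate the parallax of the block whose top-left corner is (y, x)
    in the left-eye image by scanning the right-eye image horizontally."""
    block = left[y:y + size, x:x + size].astype(np.float32)
    best_shift, best_cost = 0, np.inf
    for d in range(min(max_shift, x) + 1):   # same part appears shifted left
        cand = right[y:y + size, x - d:x - d + size].astype(np.float32)
        cost = np.abs(block - cand).sum()    # sum of absolute differences
        if cost < best_cost:
            best_cost, best_shift = cost, d
    return best_shift  # parallax in pixels along the one direction
```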
  • Such 3D image technology has been adopted in various video fields.
  • an endoscope apparatus has been proposed that enables stereoscopic viewing of an image over a wide field of view by adjusting parallax detected from a stereo image to fall within the fusion range of human eyes (for example, Patent Document 1).
  • There has also been proposed a stereoscopic video processing apparatus that displays a stereoscopic reference image when a stereoscopic video is displayed, so that the sense of depth can be adjusted (for example, Patent Document 2).
  • However, when the parallax is somewhat small, it may be difficult for the user to obtain a sense of depth, depending on the size of the screen. That is, even if the object is the same, it may appear different to the user between when it is actually viewed and when it is viewed on the 3D image.
  • JP-A-8-313825; Japanese Patent Laid-Open No. 11-155155; International Publication No. 2003/093023
  • the present invention has been made in view of the above problems, and an object of the present invention is to provide a technique for improving a sense of depth obtained by a user watching a 3D moving image.
  • An image processing apparatus according to a first aspect includes: a first acquisition unit that acquires video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same portion of an object are shifted in one direction; a change detection unit that detects a change of scene in the video information; a determination unit that determines a reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames after the scene change in the video information; a second acquisition unit that acquires region image information related to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same portion of a display target are shifted in the one direction by the reference shift amount; and a synthesis unit that generates stereoscopic image information by combining, for each frame after the scene change in the video information, the reference image, the corresponding image, the reference region image, and the corresponding region image.
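  • To aid reading, the flow that these five units describe can be paraphrased as the following minimal Python sketch; every function name and heuristic here is a hypothetical placeholder, not the patent's prescribed method.

```python
import numpy as np

def scene_changed(prev, cur, threshold=30.0):
    """Stand-in for the change detection unit: mean luminance difference."""
    if prev is None:
        return False
    diff = np.abs(cur.astype(np.float32) - prev.astype(np.float32))
    return diff.mean() > threshold

def determine_reference_shift(ref, cor):
    """Stand-in for the determination unit (a constant placeholder here;
    a real rule derives it from the parallax between ref and cor)."""
    return 8  # pixels

def make_region_images(shift, shape=(32, 32)):
    """Stand-in for the second acquisition unit: one flat patch, drawn
    `shift` pixels apart in the two composite images."""
    patch = np.full(shape, 128, dtype=np.uint8)
    return patch, patch

def process(frames):
    """frames: list of (reference_image, corresponding_image) pairs."""
    out, shift, prev = [], None, None
    for ref, cor in frames:
        if shift is None or scene_changed(prev, ref):
            shift = determine_reference_shift(ref, cor)
        out.append((ref, cor, *make_region_images(shift)))  # synthesis input
        prev = ref
    return out
```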
  • An image processing device according to a second aspect is the image processing device according to the first aspect, wherein the stereoscopic image information includes at least one of: information in a first format that makes it possible to display the reference image, the corresponding image, the reference region image, and the corresponding region image superimposed on one screen at the same time; and information in a second format that makes it possible to display one or more of the reference image, the corresponding image, the reference region image, and the corresponding region image on one screen in time sequence with the one or more remaining images.
  • An image processing apparatus according to a third aspect is the image processing apparatus according to the first or second aspect, wherein the determination unit determines the reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames in one scene, the one scene lasting from when the change detection unit detects a first scene change in the video information until it detects a second scene change, and the synthesis unit generates the stereoscopic image information by combining the reference image, the corresponding image, the reference region image, and the corresponding region image for all frames in the one scene of the video information.
  • An image processing device according to a fourth aspect is the image processing device according to the third aspect, wherein the determination unit determines the reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of the first frame in the one scene of the video information.
  • An image processing device according to a fifth aspect is the image processing device according to the third aspect, wherein the determination unit determines the reference shift amount based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene in the video information.
  • An image processing apparatus according to a sixth aspect is the image processing apparatus according to the third aspect, further comprising a region detection unit that detects, from each set of the reference image and the corresponding image of each frame of the video information according to a preset detection rule, a set of a reference attention region and a corresponding attention region capturing the same object that is predicted to attract the user's eyes, wherein the determination unit determines the reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference attention region and the corresponding attention region of the first frame of the one scene.
  • An image processing device according to a seventh aspect is the image processing device according to the third aspect, further comprising a region detection unit that detects, from each set of the reference image and the corresponding image of each frame of the video information according to a preset detection rule, a set of a reference attention region and a corresponding attention region capturing the same object that is predicted to attract the user's eyes, wherein the determination unit determines the reference shift amount based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene.
  • An image processing device according to an eighth aspect is the image processing device according to the third aspect, wherein the determination unit calculates a representative value of the virtual distance from a virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene in the video information, and determines, as the reference shift amount, the shift amount corresponding to the virtual reference plane for which the representative value of the virtual distance takes a predetermined value.
  • An image processing apparatus according to a ninth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on the distribution of first shift amounts of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames in the one scene of the video information, and on the distribution of second shift amounts of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames in the previous scene before the first change in the video information.
  • An image processing apparatus according to a tenth aspect is the image processing apparatus according to the ninth aspect, wherein the determination unit determines the reference shift amount based on a first representative shift value relating to the distribution of the first shift amounts and a second representative shift value relating to the distribution of the second shift amounts.
  • An image processing device according to an eleventh aspect is the image processing device according to the ninth aspect, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the previous scene in the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene in the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane when the difference between the first representative value and the second representative value is within a predetermined allowable range.
  • An image processing apparatus according to a twelfth aspect is the image processing apparatus according to the ninth aspect, further comprising a region detection unit that detects, from each set of the reference image and the corresponding image of each frame in the video information according to a preset detection rule, a set of a reference attention region and a corresponding attention region capturing the same object that is predicted to attract the user's eyes, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the previous scene, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information, and, when the difference between the first representative value and the second representative value is within a predetermined allowable range, determines the shift amount corresponding to the second virtual reference plane as the reference shift amount.
  • An image processing device according to a thirteenth aspect is the image processing device according to the first or second aspect, wherein the determination unit determines the reference shift amount for each frame in the vicinity of the scene change in the video information so that the difference in the reference shift amount between frames is equal to or less than a predetermined amount, and the second acquisition unit acquires the region image information for each of those frames in the vicinity of the scene change in the video information according to the reference shift amount determined for that frame.
  • An image processing device according to a fourteenth aspect is the image processing device according to any one of the first to thirteenth aspects, wherein the change detection unit detects the change of scene according to a change in the image between two or more frames included in the video information.
  • An image processing device according to a fifteenth aspect is the image processing device according to the fourteenth aspect, wherein the change in the image includes one or more of a change in luminance, a change in color, and a change in frequency components.
  • An image processing device according to a sixteenth aspect is the image processing device according to the fourteenth aspect, wherein the change detection unit identifies a person whose face is captured in a plurality of frames included in the video information, and detects the change of scene according to a change of the person between the plurality of frames.
  • An image processing device according to a seventeenth aspect is the image processing device according to the fourteenth aspect, wherein the change detection unit detects the change of scene according to a change, in a plurality of frames included in the video information, of an attention region capturing an object that is predicted to attract the user's eyes.
  • An image processing device according to an eighteenth aspect is the image processing device according to the fourteenth aspect, wherein the change detection unit detects the change of scene in response to at least one of the appearance and the disappearance of an attention region capturing an object that is predicted to attract the user's eyes in a plurality of frames included in the video information.
  • An image processing device according to a nineteenth aspect is the image processing device according to any one of the first to eighteenth aspects, wherein the change detection unit detects the change of scene according to a change of the in-focus area in a plurality of frames included in the video information.
  • An image processing device according to a twentieth aspect is the image processing device according to any one of the first to nineteenth aspects, wherein the video information includes information related to sound, and the change detection unit detects the change of scene according to a change in the sound.
  • An image processing device according to a twenty-first aspect is the image processing device according to the twentieth aspect, wherein the change in the sound includes one or more of a change in volume, a change in frequency components, and a change in voiceprint.
  • An image processing apparatus according to a twenty-second aspect is the image processing apparatus according to any one of the first to twenty-first aspects, wherein the video information includes metadata, and the change detection unit detects the change of scene according to a change in the metadata.
  • An image processing device according to a twenty-third aspect is the image processing device according to the twenty-second aspect, wherein the metadata includes one or more of caption information, chapter information, and shooting condition information, and the change in the metadata includes one or more of a change in the caption information, a change in the chapter information, and a change in the shooting conditions.
  • An image processing method according to a twenty-fourth aspect includes: (a) a step of acquiring video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same portion of an object are shifted in one direction; (b) a step of detecting a change of scene in the video information; (c) a step of determining a reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames after the scene change in the video information; and (d) a step of acquiring region image information related to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same portion of a display target are shifted in the one direction by the reference shift amount.
  • A program according to a twenty-fifth aspect causes, when executed by a control unit included in an information processing apparatus, the information processing apparatus to function as the image processing apparatus according to any one of the first to twenty-third aspects.
  • the image processing apparatus makes it easy to recognize the difference in parallax in the 3D moving image, so that the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • With the image processing apparatus, even if various display modes are adopted, the sense of depth that can be obtained by a user watching a 3D moving image can be improved.
  • With the image processing device, since a common image having the reference parallax is added throughout one scene of the 3D moving image, an excessive change in the image is suppressed, the burden on the user's eyes can be reduced, and the amount of computation can be reduced.
  • a process of adding a common image having a reference parallax to a scene of a 3D moving image can be performed in real time.
  • the image processing apparatus can reduce the user's uncomfortable feeling in one scene of the 3D moving image.
  • the image processing apparatus can reduce the user's uncomfortable feeling when the scene changes in the 3D moving image.
  • an image having a parallax serving as a reference suitable for a scene can be combined with a 3D moving image in accordance with a change in the image.
  • an image having a parallax serving as a reference suitable for a scene can be synthesized into a 3D moving image according to a change in sound.
  • an image having a parallax that is a reference suitable for a scene can be combined with a 3D moving image in accordance with changes in metadata.
  • FIG. 1 is a diagram illustrating an example of a left eye image and a right eye image included in a 3D moving image.
  • FIG. 2 is a diagram illustrating a 3D moving image to which a reference area image and a corresponding area image are added.
  • FIG. 3 is a diagram illustrating an example of a left-eye image and a right-eye image included in a 3D moving image.
  • FIG. 4 is a diagram illustrating a 3D image to which a reference area image and a corresponding area image are added.
  • FIG. 5 is a diagram illustrating a schematic configuration of an information processing system according to an embodiment.
  • FIG. 6 is a block diagram illustrating a functional configuration according to the image processing apparatus.
  • FIG. 7 is a diagram for explaining a first change detection method for detecting a change in a scene.
  • FIG. 8 is a diagram for explaining a first change detection method for detecting a change in a scene.
  • FIG. 9 is a diagram for explaining a second change detection method for detecting a change in a scene.
  • FIG. 10 is a diagram for explaining a second change detection method for detecting a change in a scene.
  • FIG. 11 is a diagram for explaining a third change detection method for detecting scene changes.
  • FIG. 12 is a diagram for explaining a third change detection method for detecting a change in a scene.
  • FIG. 13 is a diagram for explaining a fourth change detection method for detecting scene changes.
  • FIG. 14 is a diagram for explaining a fourth change detection method for detecting a change in a scene.
  • FIG. 15 is a diagram for explaining a fifth change detection method for detecting a change in a scene.
  • FIG. 16 is a diagram for explaining a fifth change detection method for detecting a change in a scene.
  • FIG. 17 is a diagram for explaining a sixth change detection method for detecting a change in a scene.
  • FIG. 18 is a diagram for explaining a sixth change detection method for detecting scene changes.
  • FIG. 19 is a diagram for explaining a first method of determining the reference shift amount.
  • FIG. 20 is a diagram for explaining the first method of determining the reference shift amount.
  • FIG. 21 is a diagram for explaining a third method of determining the reference shift amount.
  • FIG. 22 is a diagram for explaining the third method of determining the reference shift amount.
  • FIG. 23 is a diagram for explaining a fourth method of determining the reference shift amount.
  • FIG. 24 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 25 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 26 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 27 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 28 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 29 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 30 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 31 is a diagram illustrating a 3D image to which a reference area image and a corresponding area image are added.
  • FIG. 32 is a flowchart showing the operation of the image processing apparatus.
  • FIG. 33 is a diagram illustrating a 3D moving image to which a reference area image and a corresponding area image are added.
  • FIG. 34 is a diagram illustrating a 3D image to which a reference area image and a corresponding area image are added.
  • FIG. 35 is a diagram for explaining a method of determining the reference shift amount according to a modification.
  • In the following description, a stereoscopically viewable moving image is also referred to as a 3D moving image. With N being an arbitrary natural number, the N-th frame image for the left eye is also referred to as the N-th left-eye image, and the N-th frame image for the right eye is also referred to as the N-th right-eye image.
  • the one direction is a direction that can match the separation direction of the human left eye and right eye, and is set, for example, in the horizontal direction of the image.
  • the Nth left-eye image GN L and the Nth right-eye image GN R can be acquired, for example, by photographing using a stereo camera.
  • the stereo camera has two cameras corresponding to a human left eye and a right eye.
  • A state is shown in which the n-th left-eye image Gn_L includes regions (also referred to as object regions) O1_L, O2_L, and O3_L indicating three objects, and the n-th right-eye image Gn_R includes three object regions O1_R, O2_R, and O3_R. The object regions O1_L and O1_R indicate the same person, the object regions O2_L and O2_R indicate the same object (here, a pylon), and the object regions O3_L and O3_R indicate another same object (here, a ball).
  • The positions of the object regions O1_R, O2_R, and O3_R in the n-th right-eye image Gn_R are shifted to the left with respect to the positions of the object regions O1_L, O2_L, and O3_L in the n-th left-eye image Gn_L. To facilitate comparison, positions corresponding to the outer edges of the object regions O1_L, O2_L, and O3_L in the n-th left-eye image Gn_L are marked with thin broken lines.
  • The shift amount (also referred to as parallax) between the position of the object region O1_L in the n-th left-eye image Gn_L and the position of the object region O1_R in the n-th right-eye image Gn_R can be considered somewhat small, and the same applies to the parallax between the object regions O2_L and O2_R and to the parallax between the object regions O3_L and O3_R. In such a case, the user may have difficulty obtaining a sense of depth for the objects indicated by the object regions O1_L to O3_L and O1_R to O3_R. In other words, even when the object is the same, viewing it on the 3D image may give the user a sense of depth different from that obtained when actually viewing the object.
  • Therefore, in the present embodiment, an image having a reference parallax (also referred to as a reference parallax image) is added to the N-th left-eye image GN_L and the N-th right-eye image GN_R of each frame. The sense of depth obtained by the user viewing the 3D image can thereby be improved through comparison with the reference parallax image.
  • The reference parallax image added to the n-th frame includes an n-th reference region image In_L for the left eye and an n-th corresponding region image In_R for the right eye. The n-th reference region image In_L and the n-th corresponding region image In_R are generated according to a reference shift amount, and have a relationship in which the positions of pixels indicating the same portion of the display target are shifted in one direction. The reference shift amount can be determined based on, for example, the N-th left-eye image GN_L and the N-th right-eye image GN_R of one or more frames of the scene to which the n-th frame belongs.
  • The left side of FIG. 2 shows an example of an n-th left-eye composite image GSn_L in which the n-th reference region image In_L is combined at a position (an n-th reference combining position Pn_L described later) around the image area (also referred to as the n-th left-eye image area) TAn_L corresponding to the n-th left-eye image Gn_L, and the right side of FIG. 2 shows an example of an n-th right-eye composite image GSn_R in which the n-th corresponding region image In_R is combined in a corresponding manner. The 3D image can be displayed in a mode in which the n-th left-eye composite image GSn_L and the n-th right-eye composite image GSn_R are displayed as a combined image, or in a mode in which they are displayed sequentially within a short time. For example, interlaced display can be employed in which the n-th left-eye composite image GSn_L is displayed as the image of the first field and the n-th right-eye composite image GSn_R is displayed as the image of the second field. A compositing sketch follows.
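  • How the combining positions and the reference shift amount interact can be pictured with this sketch, which pastes a patch into a margin around each image, horizontally offset by the reference shift amount between the two eyes; the sizes, positions, and blank frames are illustrative assumptions.

```python
import numpy as np

def compose_with_region(image, region, x, y, margin=40):
    """Place `image` on a larger canvas and paste `region` at (x, y)
    in the surrounding margin (the combining position Pn)."""
    h, w = image.shape
    canvas = np.zeros((h + 2 * margin, w + 2 * margin), dtype=image.dtype)
    canvas[margin:margin + h, margin:margin + w] = image
    rh, rw = region.shape
    canvas[y:y + rh, x:x + rw] = region
    return canvas

gn_l = np.zeros((240, 320), np.uint8)        # n-th left-eye image (blank here)
gn_r = np.zeros((240, 320), np.uint8)        # n-th right-eye image
in_patch = np.full((24, 24), 200, np.uint8)  # reference/corresponding region image
reference_shift = 6                          # reference shift amount in pixels
gsn_l = compose_with_region(gn_l, in_patch, x=10, y=8)
gsn_r = compose_with_region(gn_r, in_patch, x=10 - reference_shift, y=8)
```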
  • the reference shift amount is changed according to the scene change in the 3D video. That is, the reference parallax related to the reference parallax image added to the 3D moving image can be changed. For this reason, the difference in parallax in the 3D moving image is easily recognized, and the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • FIG. 3 shows an example of the (n+m)-th left-eye image G(n+m)_L and the (n+m)-th right-eye image G(n+m)_R included in the (n+m)-th frame after the scene has changed from the n-th frame shown in FIG. 1. Here, the (n+m)-th left-eye image G(n+m)_L includes an object region O4_L related to one person, and the (n+m)-th right-eye image G(n+m)_R includes an object region O4_R related to the same person. FIG. 4 shows an example of an (n+m)-th left-eye composite image GS(n+m)_L in which the (n+m)-th reference region image I(n+m)_L is combined in accordance with the (n+m)-th reference combining position P(n+m)_L in the (n+m)-th left-eye image area TA(n+m)_L corresponding to the (n+m)-th left-eye image G(n+m)_L. The (n+m)-th reference region image I(n+m)_L and the (n+m)-th corresponding region image I(n+m)_R can be generated based on the reference shift amount changed due to the scene change. That is, the (n+m)-th reference region image I(n+m)_L and the (n+m)-th corresponding region image I(n+m)_R have a relationship in which the positions of pixels indicating the same portion of the display target are shifted in one direction according to the changed reference shift amount.
  • FIG. 5 is a diagram illustrating a schematic configuration of the information processing system 1 according to the embodiment.
  • the information processing system 1 includes a stereo camera 2, an information processing device 4, and a line-of-sight detection sensor 5.
  • the information processing device 4 is connected to the stereo camera 2 and the line-of-sight detection sensor 5 so as to be able to transmit and receive data.
  • the stereo camera 2 has a camera 21 and a camera 22.
  • Each of the cameras 21 and 22 has a function of a digital camera having an image sensor such as a CCD.
  • Each of the cameras 21 and 22 performs a photographing operation of receiving light from the subject and acquiring information indicating the distribution regarding the luminance of the subject as image data by photoelectric conversion.
  • the optical axis of the camera 21 and the optical axis of the camera 22 are separated by a predetermined distance in the horizontal direction.
  • the predetermined distance is also referred to as a base line length, and is set to a distance equivalent to the distance between the center of the average human left eye and the center of the right eye, for example.
  • a stereo image is acquired by performing a photographing operation with the camera 21 and the camera 22 at substantially the same timing.
  • the stereo image includes a set of an image for the left eye and an image for the right eye, and can be displayed so as to enable stereoscopic viewing by the user.
  • Ns sets (Ns is an integer of 2 or more) of stereo images can be acquired by performing continuous shooting a plurality of times with the camera 21 and the camera 22 at predetermined timings.
  • the Ns sets of stereo images correspond to images of Ns frames included in the 3D moving image.
  • the line-of-sight detection sensor 5 detects a portion of the screen of the display unit 42 included in the information processing device 4 that is noticed by the user (also referred to as a portion of interest).
  • the display unit 42 and the line-of-sight detection sensor 5 are fixed to each other with a predetermined arrangement relationship.
  • In the line-of-sight detection sensor 5, for example, an image of the user is first obtained by photographing, the direction of the user's line of sight is then detected by analyzing the image, and the portion of interest on the screen of the display unit 42 is then detected.
  • the analysis of the image can be realized, for example, by detecting the orientation of the face using pattern matching, and identifying the white-eye portion and the black-eye portion in both eyes using a color difference.
  • information relating to one or more stereo images obtained by the stereo camera 2 can be transmitted to the information processing device 4 via the communication line 3a.
  • Information relating to shooting conditions in the stereo camera 2 can also be transmitted to the information processing apparatus 4 via the communication line 3a.
  • information related to the target portion obtained by the line-of-sight detection sensor 5 can be transmitted to the information processing apparatus 4 via the communication line 3b.
  • the communication lines 3a and 3b may be wired lines or wireless lines.
  • The information processing apparatus 4 has the functions of, for example, a personal computer.
  • the information processing apparatus 4 includes an operation unit 41, a display unit 42, an interface (I / F) unit 43, a storage unit 44, an input / output unit 45, and a control unit 46.
  • the operation unit 41 includes, for example, a mouse and a keyboard.
  • the display unit 42 includes, for example, a liquid crystal display.
  • the I / F unit 43 receives information from the stereo camera 2 and the line-of-sight detection sensor 5.
  • the storage unit 44 includes, for example, a hard disk and stores each image obtained by the stereo camera 2. Further, the storage unit 44 stores a program PG1 and the like for realizing various operations in the information processing apparatus 4.
  • the input / output unit 45 includes, for example, a disk drive, can receive the storage medium 6 such as an optical disk, and can exchange data with the control unit 46.
  • The control unit 46 includes a CPU 46a that functions as a processor and a memory 46b that can temporarily store information, and controls each unit of the information processing apparatus 4. In the control unit 46, various functions and various information processing are realized by reading and executing the program PG1 stored in the storage unit 44; data temporarily generated in this processing is appropriately stored in the memory 46b. Thereby, the information processing apparatus 4 functions as an image processing apparatus that generates stereoscopic image information from the video information.
  • FIG. 6 is a block diagram illustrating a functional configuration of the image processing apparatus realized by the control unit 46.
  • The functional configuration of the image processing apparatus includes a video acquisition unit 461 as the first acquisition unit, an attention area detection unit 462, a change detection unit 463, a shift amount determination unit 464, a region image acquisition unit 465 as the second acquisition unit, a signal receiving unit 466, a mode setting unit 467, a position specifying unit 468, and an image composition unit 469.
  • the video acquisition unit 461 acquires video information including information related to the 3D moving image.
  • the 3D moving image includes Ns sets (Ns is an integer of 2 or more) of stereo images, that is, image information of Ns frames.
  • The stereo image of the N-th frame (N is an arbitrary natural number) of the Ns frames includes an N-th left-eye image GN_L and an N-th right-eye image GN_R having a relationship in which the positions of pixels indicating the same portion of the object are shifted in one direction (here, the horizontal direction).
  • the video information may include information related to audio matched to the 3D video and metadata related to the 3D video.
  • the metadata may include one or more information of caption information, chapter information, and shooting condition information.
  • In the present embodiment, the N-th left-eye image GN_L is used as the reference image, and the N-th right-eye image GN_R is used as the image corresponding to the reference image (also referred to as the corresponding image). Conversely, the N-th right-eye image GN_R may be the reference image and the N-th left-eye image GN_L may be the corresponding image.
  • The attention area detection unit 462 detects, from the set of the N-th left-eye image GN_L and the N-th right-eye image GN_R according to a preset detection rule, a set of attention areas capturing the same object that is predicted to attract the user's eyes. A set of attention areas includes a pair of an attention area detected from the N-th left-eye image GN_L (also referred to as a reference attention area) and an attention area detected from the N-th right-eye image GN_R (also referred to as a corresponding attention area).
  • the change detector 463 detects a scene change in the video information.
  • a change in the scene can be detected based on one or more information of 3D moving image information, audio information, and metadata included in the video information.
  • Changes in the scene can include changes in location and time, replacement of main display objects, changes in the state of display objects, and the like.
  • the change in the state of the display object can include movement of the display object, change of the focused display object, and the like.
  • The shift amount determination unit 464 determines the reference shift amount for each scene by following a predetermined determination rule. Specifically, the reference shift amount can be determined based on the N-th left-eye image GN_L and the N-th right-eye image GN_R of one or more frames belonging to one scene after a scene change. More specifically, the reference shift amount can be determined based on the shift amount (parallax) of pixel positions indicating the same portion of the object between the N-th left-eye image GN_L and the N-th right-eye image GN_R. Further, the shift amount determination unit 464 can reduce the amount of calculation by using various calculation results from the change detection unit 463. One concrete reading is sketched below.
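  • As one concrete reading of such a determination rule, the sketch below takes the median of block-wise parallaxes over one frame of the scene, reusing `block_parallax` from the earlier sketch; the median as the representative value is an assumption, not the patent's prescribed rule.

```python
import numpy as np

def determine_reference_shift(left, right, size=16, max_shift=64):
    """Reference shift amount as the median block-wise parallax between the
    reference image and the corresponding image of one frame in the scene."""
    h, w = left.shape
    shifts = [block_parallax(left, right, y, x, size, max_shift)
              for y in range(0, h - size, size)
              for x in range(max_shift, w - size, size)]
    return int(np.median(shifts)) if shifts else 0
```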
  • The region image acquisition unit 465 acquires information (also referred to as N-th region image information) related to the N-th reference region image IN_L and the N-th corresponding region image IN_R. The N-th reference region image IN_L and the N-th corresponding region image IN_R have a relationship in which the positions of pixels indicating the same portion of the display target are shifted in one direction (here, the horizontal direction) by the reference shift amount determined by the shift amount determination unit 464. That is, the N-th region image information can be acquired based on the reference shift amount.
  • the signal receiving unit 466 receives a signal input to the control unit 46 according to the operation of the operation unit 41 by the user.
  • the mode setting unit 467 sets the position specifying unit 468 to any one of a plurality of modes including the first mode and the second mode.
  • the mode setting unit 467 can set the mode of the position specifying unit 468 in accordance with the signal received by the signal receiving unit 466.
  • Further, the mode setting unit 467 may set the attention area detection unit 462 to a mode in which it detects an attention area (also referred to as a detection permission mode) or a mode in which it does not detect an attention area (also referred to as a detection prohibition mode). The mode setting unit 467 may also set the mode of the position specifying unit 468, and likewise the mode of the attention area detection unit 462, based on the N-th left-eye image GN_L and the N-th right-eye image GN_R acquired by the video acquisition unit 461.
  • The position specifying unit 468 specifies the position at which the N-th reference region image IN_L is combined (also referred to as the N-th reference combining position) PN_L and the position at which the N-th corresponding region image IN_R is combined (also referred to as the N-th corresponding combining position) PN_R. As the first mode, a mode in which the position specifying unit 468 specifies the N-th reference combining position PN_L and the N-th corresponding combining position PN_R so as to surround the image can be considered; as the second mode, a mode in which the position specifying unit 468 specifies the N-th reference combining position PN_L and the N-th corresponding combining position PN_R in accordance with the attention area can be considered. The N-th reference combining position PN_L may be specified in a region around the N-th left-eye image GN_L, or in a region corresponding to the periphery of another image different from the N-th left-eye image GN_L; as this other image, for example, another field image in an interlaced moving image can be considered. Similarly, the N-th corresponding combining position PN_R may be specified in a region around the N-th right-eye image GN_R, or in a region corresponding to the periphery of another image different from the N-th right-eye image GN_R.
  • The image composition unit 469 generates information of a stereoscopically viewable image (also referred to as stereoscopic image information) by combining the N-th left-eye image GN_L, the N-th right-eye image GN_R, the N-th reference region image IN_L, and the N-th corresponding region image IN_R. Here, the N-th reference region image IN_L can be combined in alignment with the N-th reference combining position PN_L specified by the position specifying unit 468, and the N-th corresponding region image IN_R can be combined in alignment with the N-th corresponding combining position PN_R specified by the position specifying unit 468.
  • the stereoscopic image information can be visually output on the display unit 42 under the control of the control unit 46.
  • For example, on the display unit 42, the N-th left-eye image GN_L, the N-th right-eye image GN_R, the N-th reference region image IN_L, and the N-th corresponding region image IN_R can be superimposed and displayed at the same time, or one or more of these images can be displayed in time sequence with the remaining images. Thereby, the sense of distance to the object given to the user by the N-th left-eye image area TAN_L and the N-th right-eye image area TAN_R can be enhanced by comparison with the sense of distance to the display target given to the user by the N-th reference region image IN_L and the N-th corresponding region image IN_R. Further, since the difference in parallax in the 3D moving image can be easily recognized by changing the reference shift amount according to the scene change, the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • Attention area detection methods: as the method of detecting the reference attention area and the corresponding attention area in the attention area detection unit 462, for example, one or more of the following first to sixth area detection methods may be employed.
  • As one area detection method, an image area located in the vicinity of the center of the N-th left-eye image GN_L and the N-th right-eye image GN_R is detected as the attention area. For example, the contour of an object can be extracted from the N-th left-eye image GN_L by image processing such as the Hough transform, and an object located near the center of the N-th left-eye image GN_L can be detected as the reference attention area. Then, the image area corresponding to the reference attention area in the N-th right-eye image GN_R can be detected as the corresponding attention area by searching for corresponding points using, for example, the phase-only correlation (POC) method, a minimal sketch of which follows.
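  • As a rough illustration of this corresponding-point search, here is a minimal sketch of phase-only correlation in Python with NumPy; the patch extraction around the attention area, windowing, and sub-pixel peak refinement used in practice are omitted.

```python
import numpy as np

def phase_only_correlation(f, g):
    """POC surface between two equally sized grayscale patches; the peak
    location gives the translational shift between them."""
    spectrum = np.fft.fft2(f) * np.conj(np.fft.fft2(g))
    spectrum /= np.abs(spectrum) + 1e-12          # keep phase only
    return np.fft.fftshift(np.fft.ifft2(spectrum).real)

def estimate_shift(f, g):
    poc = phase_only_correlation(f, g)
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    dy, dx = np.array(peak) - np.array(poc.shape) // 2
    return dx, dy   # the horizontal component dx approximates the parallax
```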
  • As another method, the N-th left-eye image GN_L and the N-th right-eye image GN_R are targeted, and an image region indicating a specific type of object, such as a person, is detected as the attention area by template matching or the like, as sketched below. Specifically, the reference attention area can be detected by template matching targeting the N-th left-eye image GN_L, and the image area corresponding to the reference attention area in the N-th right-eye image GN_R can then be detected as the corresponding attention area by searching for corresponding points using the POC method or the like.
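  • For the template matching step, a minimal OpenCV sketch follows; the file names, the matching score method, and the 0.7 threshold are assumptions for illustration.

```python
import cv2

# Detect a person-shaped reference attention area in the left-eye image.
frame_l = cv2.imread("gn_left.png", cv2.IMREAD_GRAYSCALE)           # N-th left-eye image
template = cv2.imread("person_template.png", cv2.IMREAD_GRAYSCALE)  # person template

scores = cv2.matchTemplate(frame_l, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(scores)
if max_val > 0.7:                     # arbitrary acceptance threshold
    x, y = max_loc
    h, w = template.shape
    reference_attention_area = (x, y, w, h)  # rectangle in the left-eye image
```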
  • As another method, an attention area is detected by analyzing motion vectors over a plurality of stereo images included in the 3D moving image. For example, a region indicating an object whose motion exceeds a certain threshold over a predetermined number of frames can be detected as the reference attention area and the corresponding attention area. Specifically, the reference attention area can be detected by analyzing motion vectors over a plurality of left-eye images GN_L, and the image area corresponding to the reference attention area in the N-th right-eye image GN_R can then be detected as the corresponding attention area by searching for corresponding points using the POC method or the like.
  • As another method, an area having at least one of a specific color and a specific texture different from its surroundings is detected as the reference attention area and the corresponding attention area, targeting the N-th left-eye image GN_L and the N-th right-eye image GN_R. For example, an area of a specific color, such as the skin color characteristic of human beings, can be detected as the reference attention area in the N-th left-eye image GN_L, and so can an area having a specific texture, such as the color arrangement of the parts of the human head (eyes, hair, eyebrows, mouth, etc.). Then, the image area corresponding to the reference attention area in the N-th right-eye image GN_R can be detected as the corresponding attention area by searching for corresponding points using the POC method or the like.
  • Here, a case where a plurality of subjects of the same type (for example, humans) partially overlap in one image can be considered. In such a case, information indicating the distance from the viewpoint to each subject, obtainable from the N-th left-eye image GN_L and the N-th right-eye image GN_R, can be used to distinguish the plurality of subjects. The information indicating the distance can be obtained from the parallax between the N-th left-eye image GN_L and the N-th right-eye image GN_R and the baseline length of the stereo camera 2 using the principle of triangulation, as in the sketch below; the parallax itself can be obtained by searching for corresponding points using the POC method or the like.
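  • The triangulation reduces to a one-line relation; the sketch below assumes a rectified pair, a focal length f in pixels, and the baseline length B of the stereo camera 2.

```python
def depth_from_parallax(parallax_px, focal_px, baseline_m):
    """Distance to the subject by triangulation, Z = f * B / d,
    for a rectified stereo pair with parallax d > 0."""
    if parallax_px <= 0:
        raise ValueError("parallax must be positive for a finite distance")
    return focal_px * baseline_m / parallax_px

# e.g. f = 800 px and B = 0.065 m (roughly the average interocular
# distance): a parallax of 13 px gives Z = 800 * 0.065 / 13 = 4.0 m
distance = depth_from_parallax(13, 800, 0.065)
```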
  • As another method, the reference attention area and the corresponding attention area are detected from the N-th left-eye image GN_L and the N-th right-eye image GN_R based on the portion of interest, detected for example by the line-of-sight detection sensor 5. Specifically, the N-th left-eye image GN_L and the N-th right-eye image GN_R are targeted, and one or more sets of image areas indicating a specific type of object, such as a person, are detected in advance by template matching or the like; the set of image areas corresponding to the portion of interest can then be detected as the reference attention area and the corresponding attention area.
  • As yet another method, the N-th left-eye image GN_L and the N-th right-eye image GN_R are targeted, and the contour of each object is extracted by image processing such as the Hough transform. Then, regions related to objects surrounded by a contour in which the parallax between the N-th left-eye image GN_L and the N-th right-eye image GN_R does not change abruptly with the passage of time, for example over a predetermined number of frames corresponding to a predetermined time, are detected as the reference attention area and the corresponding attention area where a person is captured; the parallax can be obtained, for example, by searching for corresponding points using the POC method or the like. Further, among such regions, an area where the parallax between the N-th left-eye image GN_L and the N-th right-eye image GN_R is equal to or greater than a predetermined value can be detected as the reference attention area and the corresponding attention area where a person is captured, and an area whose parallax is less than the predetermined value can be detected as an area where a distant view is captured.
  • A scene change in the change detection unit 463 can be detected based on one or more of the 3D moving image information, the sound information, and the metadata that may be included in the video information.
  • For example, the change detection unit 463 can detect a change of scene according to an image change between two or more frames included in the video information; thereby, an image having a parallax serving as a reference suitable for the scene can be combined with the 3D moving image in accordance with the change of the image.
  • For example, a scene change can be detected by comparing the (n+m)-th frame (m is a natural number) with the n-th frame m frames earlier. Either the comparison of the n-th left-eye image Gn_L with the (n+m)-th left-eye image G(n+m)_L, the comparison of the n-th right-eye image Gn_R with the (n+m)-th right-eye image G(n+m)_R, or both comparisons may be performed. Here, m may be 1 or 2 or more.
  • one or more change detection methods among the following first to sixth change detection methods can be employed as the change detection method of the scene based on the information of the 3D moving image in the change detection unit 463.
  • a scene change may be detected in response to one or more of a luminance change, a color change, and a frequency component change between two or more frames included in the video information.
  • For example, if the value obtained by integrating the difference of the luminance signal at each pixel between the n-th frame and the (n+m)-th frame (also referred to as the luminance difference integrated value) exceeds a predetermined threshold value, it can be detected that the scene has changed. Alternatively, a scene change may be detected according to a change in the distribution of the integrated luminance values of the vertical lines in the image (also referred to as integrated luminance values). In this case, for example, if the sum over the vertical lines of the differences in the integrated luminance values between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, it can be detected that the scene has changed. A sketch of both variants follows the figure references below.
  • FIG. 7 schematically shows an example of the distribution of integrated luminance values of the n-th left-eye image Gn_L shown in FIG. 1, and FIG. 8 schematically shows an example of the distribution of integrated luminance values of the (n+m)-th left-eye image G(n+m)_L shown in FIG. 3.
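  • A minimal sketch of this first change detection method follows, covering both the per-pixel and the per-vertical-line variants; the thresholds are arbitrary assumptions that would be tuned per application.

```python
import numpy as np

def luminance_scene_change(frame_n, frame_nm, pixel_thresh=2.0e6,
                           line_thresh=1.0e5):
    """Detect a scene change between the n-th and (n+m)-th frames from
    luminance differences (both variants described above)."""
    a = frame_n.astype(np.float32)
    b = frame_nm.astype(np.float32)
    if np.abs(a - b).sum() > pixel_thresh:  # luminance difference integrated value
        return True
    lines_a = a.sum(axis=0)                 # integrated luminance per vertical line
    lines_b = b.sum(axis=0)
    return np.abs(lines_a - lines_b).sum() > line_thresh
```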
  • Also, a scene change may be detected according to a change in the distribution of the integrated pixel values of a specific color of the vertical lines in the image (also referred to as color component integrated values). In this case, for example, if the sum over the vertical lines of the differences in the color component integrated values between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, it can be detected that the scene has changed.
  • Further, a scene change can be detected by comparing, between the n-th frame and the (n+m)-th frame, the intensity distributions of the spatial frequency components obtained by the Fourier transform or the like of the distribution of pixel values. Specifically, for example, if the spatial frequency range is divided into a predetermined number of bands and the sum of the differences in the intensities of the frequency components in each band exceeds a predetermined threshold, it can be detected that the scene has changed, as sketched below.
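  • The frequency-component variant can be sketched as follows; the number of bands and the threshold are assumptions.

```python
import numpy as np

def frequency_scene_change(frame_n, frame_nm, bands=8, thresh=1.0e7):
    """Compare band-wise spatial-frequency intensity between two frames."""
    def band_energies(img):
        spec = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(np.float32))))
        h, w = spec.shape
        yy, xx = np.mgrid[0:h, 0:w]
        r = np.hypot(yy - h / 2, xx - w / 2)    # radial spatial frequency
        r_max = r.max() + 1e-9
        return np.array([
            spec[(r >= r_max * i / bands) & (r < r_max * (i + 1) / bands)].sum()
            for i in range(bands)
        ])
    diff = np.abs(band_energies(frame_n) - band_energies(frame_nm)).sum()
    return diff > thresh
```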
  • a change in the scene can be detected in accordance with a change in the set of the reference attention area and the corresponding attention area in a plurality of frames included in the video information.
  • a set of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
  • If the set of the reference attention area and the corresponding attention area changes, it can be detected that the scene has changed. Here, a change in the reference attention area alone may be detected as a scene change, or a change in the corresponding attention area alone may be detected as a scene change.
  • a change in at least one of the reference attention area and the corresponding attention area may be recognized between the nth frame and the n + m frame.
  • For example, consider a case where an image in which three object regions O1_L to O3_L are captured as reference attention areas changes, between the n-th left-eye image Gn_L and the (n+m)-th left-eye image G(n+m)_L, to an image in which one object region O4_L is captured as the reference attention area. In this case, the change in the reference attention area can be recognized from the mismatch of the objects between the object region O1_L and the object region O4_L; this mismatch of display objects can be recognized by searching for corresponding points by template matching, the POC method, or the like.
  • m may be 1 or more, and may be 20 to 50.
  • a scene change can be detected in accordance with the movement of a set of a reference attention area and a corresponding attention area.
  • a scene change may be detected according to the movement of the reference attention area, or a scene change may be detected according to the movement of the corresponding attention area. In this case, it is only necessary to recognize whether at least one of the reference attention area and the corresponding attention area has moved beyond a predetermined amount between the nth frame and the (n + m) th frame.
  • a scene change can be detected in accordance with at least one of the appearance and disappearance of a set of a reference attention area and a corresponding attention area in a plurality of frames included in video information.
  • a set of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
  • If a set of the reference attention area and the corresponding attention area appears or disappears, it can be detected that the scene has changed. Here, the appearance or disappearance of the reference attention area alone may be detected as a scene change, or the appearance or disappearance of the corresponding attention area alone may be detected as a scene change. In this case, it is only necessary to recognize the appearance or disappearance of at least one of the reference attention area and the corresponding attention area between the n-th frame and the (n+m)-th frame.
  • the appearance and disappearance of the reference attention area and the appearance and disappearance of the corresponding attention area can be recognized by, for example, detecting a change in the number of the reference attention area and the corresponding attention area, and detecting the movement amount and the moving direction of the object.
  • the detection of the moving amount and the moving direction of the object can be realized by dividing each image into a plurality of blocks and comparing them between the nth frame and the n + m frame.
  • m may be 1 or more, and may be 20 to 50.
  • For example, a state can be considered in which an object region O7_L as a reference attention area appears from the left side, that is, a state in which the object enters the frame. In this case, the appearance of the object region O7_L may be recognized according to the change in the number of object regions, or according to the detection of the movement amount and movement direction of the object region O7_L.
  • the first frame of the scene after the change may be a frame when the reference attention area starts to appear or disappear, or may be a frame when the appearance or disappearance of the reference attention area progresses by a predetermined rate.
  • the frame may be a frame at the time when the appearance or disappearance of the reference attention area is completed.
  • the first frame of the changed scene may be a frame at the time when the corresponding attention area starts to appear or disappear, or may be a frame at the time when the appearance or disappearance of the corresponding attention area has progressed by a predetermined rate. It may be a frame when the appearance or disappearance of the corresponding attention area is completed.
  • a change in the scene can be detected in accordance with a change in the size of the set of the reference attention area and the corresponding attention area in a plurality of frames included in the video information.
  • a set of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
  • When the size of the set of the reference attention area and the corresponding attention area changes, it can be detected that the scene has changed. If the area of the reference attention region changes beyond a predetermined amount, this may be detected as a scene change; likewise if the area of the corresponding attention region changes beyond a predetermined amount.
  • In this case, a set of reference attention areas indicating the same object may be detected between the nth frame and the (n+m)th frame by searching for corresponding points using template matching, the POC method, or the like; similarly, a set of corresponding attention areas indicating the same object may be detected.
  • The predetermined amount may be an absolute amount of area, or an amount calculated by multiplying the area by a predetermined coefficient (for example, 0.5), as in the sketch below.
  • m may be 1 or more, and may be 20 to 50.
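A minimal sketch of this fourth change detection method, assuming the attention-area sizes (pixel counts) have already been measured; the "predetermined amount" is derived here by the coefficient rule mentioned above:

```python
def area_changed(area_n, area_nm, coeff=0.5):
    """Fourth method (sketch): scene change if the attention-area size
    changes by more than coeff * area_n, e.g. roughly halves or doubles."""
    return abs(area_nm - area_n) > coeff * area_n

# e.g. area_changed(12000, 5000) -> True (the area shrank by more than 50%)
```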
  • a change in the scene can be detected in accordance with a change in the in-focus area in the plurality of frames included in the video information.
  • the recognition of the in-focus area can be executed by targeting at least one of the N-th left eye image GN L and the N-th right eye image GN R , for example.
  • The in-focus area may be recognized by edge extraction processing using the Hough transform or the like, or by dividing the image into a plurality of areas and analyzing, for each area, the presence or absence of high-frequency components in a Fourier transform of the distribution of pixel values.
  • When object regions have been detected, the presence or absence of high-frequency components may instead be analyzed for each object region by a Fourier transform of the distribution of pixel values, to recognize whether that region is in focus.
  • When the in-focus area changes to an area in which a different object is captured, it can be detected that the scene has changed.
  • Whether or not the region is a region where a different object is captured can be determined, for example, by searching for corresponding points by template matching, the POC method, or the like.
  • m may be 1 or more, and may be 20 to 50.
  • In the illustrated example, the change detection unit 463 recognizes that the in-focus area changes from the object region O6 L to the object region O7 L, in which a different object is captured, and can thus detect that the scene has changed. A sketch of the high-frequency analysis follows.
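The Fourier-based recognition of in-focus areas could be sketched as follows: each block's spectrum is computed with a 2-D FFT, and the block is marked in focus when the fraction of spectral energy above a cutoff radius is large. The cutoff, block size, and threshold are illustrative assumptions:

```python
import numpy as np

def high_freq_ratio(patch, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` (1.0 = Nyquist radius)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    fy = np.fft.fftshift(np.fft.fftfreq(patch.shape[0])) * 2  # range -1..1
    fx = np.fft.fftshift(np.fft.fftfreq(patch.shape[1])) * 2
    r = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    return spec[r > cutoff].sum() / (spec.sum() + 1e-12)

def in_focus_map(image, block=64, thresh=0.05):
    """True where a block's high-frequency content suggests it is in focus."""
    H, W = image.shape
    rows = []
    for by in range(0, H - block + 1, block):
        rows.append([high_freq_ratio(image[by:by + block, bx:bx + block].astype(float)) > thresh
                     for bx in range(0, W - block + 1, block)])
    return np.array(rows)
```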
  • a face of a person captured in a plurality of frames included in the video information is recognized, and a scene change can be detected in accordance with the change of the person between the plurality of frames.
  • Recognition of a person's face can be performed, for example, by targeting at least one of the N-th left eye image GN L and the N-th right eye image GN R.
  • For example, a region in which a human face is captured (also referred to as a face region) is first recognized in the image by template matching or the like; the parts of the face (eyes, eyebrows, nose, mouth, etc.) are then located within the face region by template matching or the like, and the face is recognized from the positional relationship of a plurality of feature points that specify the edge portions of those parts.
  • When the person captured in the frames is replaced by a different person, it can be detected that the scene has changed.
  • Whether or not a person is replaced can be determined according to, for example, a change in the positional relationship between a plurality of feature points.
  • m may be 1 or more, and may be 20 to 50.
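Assuming the facial feature points have already been extracted as described (the same parts, in the same order, for both frames), deciding whether the person has been replaced from the change in their positional relationship could be sketched as follows; the scale-normalized signature and the threshold are assumptions for illustration:

```python
import numpy as np

def face_signature(points):
    """Scale-invariant signature of facial feature points: all pairwise
    distances, normalized by their mean (points: sequence of (y, x))."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    v = d[np.triu_indices(len(pts), k=1)]
    return v / (v.mean() + 1e-9)

def person_replaced(points_n, points_nm, thresh=0.15):
    """Different person if the normalized feature-point geometry differs
    beyond `thresh` (mean absolute deviation between the signatures)."""
    s1, s2 = face_signature(points_n), face_signature(points_nm)
    return float(np.abs(s1 - s2).mean()) > thresh
```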
  • In FIGS. 17 and 18, the area in which a person is captured changes from the object area O9 L in the nth left-eye image Gn L to the object area O10 L in the (n+m)th left-eye image G(n+m) L.
  • Since the captured person is replaced, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in audio included in the video information.
  • an image having a parallax serving as a reference suitable for a scene can be combined with a 3D moving image according to a change in sound.
  • a change in the scene can be detected by comparing the sound corresponding to the n + m-th frame (m is a natural number) with the sound corresponding to the n-th frame before m frames.
  • m may be 1 or 2 or more.
  • As the scene change detection method based on audio information in the change detection unit 463, for example, one or more of the following seventh to ninth change detection methods can be adopted.
  • the change detection unit 463 can detect a change in scene according to a change in volume obtained from audio information included in the video information. For example, the volume corresponding to the (n + m) th frame (where m is a natural number) is compared with the volume corresponding to the nth frame before m frames, and when the amount of change in volume exceeds a predetermined amount, a scene change has occurred. Scene changes can be detected as if.
  • m may be 1 or 2 or more.
  • the predetermined amount may be, for example, a fixed absolute amount related to the volume, or may be a variable amount calculated from the volume according to a predetermined rule.
  • As the predetermined rule, for example, a calculation rule of multiplying the volume corresponding to the nth frame by a predetermined coefficient (for example, 0.5) may be employed. In that case, when the volume increases or decreases by 50% or more, it can be detected that the scene has changed (see the sketch below).
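A minimal sketch of this seventh change detection method, using the RMS level of the audio samples attached to each frame as the volume; the 50% rule follows the coefficient example above:

```python
import numpy as np

def volume_changed(samples_n, samples_nm, coeff=0.5):
    """Seventh method (sketch): scene change if the RMS volume rises or
    falls by more than coeff (e.g. 50%) relative to the volume at frame n."""
    rms_n = float(np.sqrt(np.mean(np.asarray(samples_n, dtype=float) ** 2)))
    rms_nm = float(np.sqrt(np.mean(np.asarray(samples_nm, dtype=float) ** 2)))
    return abs(rms_nm - rms_n) > coeff * rms_n
```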
  • the change detection unit 463 can detect a change in the scene according to the change in the frequency component of the audio based on the audio information included in the video information.
  • For example, the frequency components of the sound corresponding to the (n+m)th frame (m is a natural number) are compared with the frequency components of the sound corresponding to the nth frame, m frames earlier, and when an evaluation value related to the frequency components has changed beyond a predetermined threshold range, it can be detected that the scene has changed.
  • m may be 1 or 2 or more.
  • As the evaluation value related to the frequency components, the sum of the intensities of the audio frequency components in each frequency band can be adopted.
  • As the frequency bands, for example, a low-frequency band, a high-frequency band, and a band between them can be adopted.
  • Such a change in the evaluation value relating to the frequency component can correspond to, for example, a change in BGM.
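A sketch of this eighth change detection method: the audio around each frame is split into a low band, a high band, and the band between them, the per-band intensity sums form the evaluation value, and a large relative change (such as a BGM change would cause) is flagged. The band edges and threshold are illustrative assumptions:

```python
import numpy as np

def band_intensities(samples, rate, edges=(0.0, 300.0, 3000.0, None)):
    """Sum of spectral intensity in low, middle, and high bands
    (edges in Hz; None means the Nyquist frequency)."""
    spec = np.abs(np.fft.rfft(np.asarray(samples, dtype=float)))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    hi = edges[3] if edges[3] is not None else rate / 2.0
    bands = [(edges[0], edges[1]), (edges[1], edges[2]), (edges[2], hi)]
    return np.array([spec[(freqs >= lo) & (freqs < up)].sum() for lo, up in bands])

def frequency_changed(samples_n, samples_nm, rate, thresh=0.5):
    """Eighth method (sketch): scene change if the band-intensity vector
    moves by more than `thresh` in relative terms between the two frames."""
    b1 = band_intensities(samples_n, rate)
    b2 = band_intensities(samples_nm, rate)
    return np.linalg.norm(b2 - b1) > thresh * (np.linalg.norm(b1) + 1e-9)
```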
  • the change detection unit 463 can detect a change in a scene according to a change in a voiceprint that can be recognized from sound, based on sound information included in the video information.
  • For example, a voiceprint is recognized by analyzing the voice corresponding to the (n+m)th frame (m is a natural number), and a voiceprint is recognized by analyzing the voice corresponding to the nth frame, m frames earlier. The voiceprint of the (n+m)th frame is then compared with that of the nth frame, and if the speaking person differs, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in metadata included in the video information. Thereby, according to the change of metadata, an image having a parallax that is a reference suitable for a scene can be combined with the 3D moving image. For example, a change in a scene can be detected by comparing metadata related to an (n + m) th frame (m is a natural number) with metadata related to an nth frame before m frames. m may be 1 or 2 or more.
  • As the scene change detection method based on metadata in the change detection unit 463, for example, one or more of the following tenth to twelfth change detection methods can be adopted.
  • the change detection unit 463 can detect a change in scene according to a change in subtitle information specified by metadata included in the video information.
  • For example, the caption information corresponding to the (n+m)th frame (m is a natural number) is compared with the caption information corresponding to the nth frame, m frames earlier, and if a predetermined change occurs in the caption information, it can be detected that the scene has changed.
  • The predetermined change may include, for example, a change from one characteristic of the subtitle information to a different characteristic, or a change between a state without subtitle information and a state with it.
  • Alternatively, characteristic terms suggesting a change of scene, relating to time, place, era, topic, and the like, may be detected from the subtitle information, and when such a characteristic term changes, appears, or disappears, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in chapter information specified by metadata included in video information.
  • For example, the chapter information corresponding to the (n+m)th frame (m is a natural number) is compared with the chapter information corresponding to the nth frame, m frames earlier, and if the chapter information has changed, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in shooting conditions specified by the metadata included in the video information.
  • For example, the shooting conditions of the (n+m)th frame (m is a natural number) are compared with the shooting conditions of the nth frame, m frames earlier, and if the shooting conditions have changed, it can be detected that the scene has changed.
  • As the shooting conditions, for example, parameters such as focal length, aperture size, and imaging magnification (also referred to as imaging parameters) can be adopted.
  • When such an imaging parameter changes beyond a predetermined amount, it can be detected that the scene has changed.
  • Shift amount determination method: In the shift amount determination unit 464, the reference shift amount for one scene, spanning from when a first change of scene is detected by the change detection unit 463 in the video information until the next (second) change of scene is detected, can be determined based on the shift amounts of the pixel positions indicating the same portion of the object (also referred to as pixel shift amounts) between the Nth left-eye image GN L and the Nth right-eye image GN R of one or more Nth frames of that scene. At this time, one reference shift amount common to all frames in the scene can be determined.
  • The pixel shift amount between the Nth left-eye image GN L and the Nth right-eye image GN R can be obtained, for example, by searching for corresponding points with the POC method and detecting combinations of a reference pixel and a corresponding pixel in which the same part of the object is captured.
  • As the method of determining the reference shift amount in the shift amount determination unit 464, for example, one or more of the following first to fourth determination methods can be adopted.
  • First determination method: The reference shift amount is determined based on the pixel shift amounts obtained from the frame group of one scene. The frame group of one scene may be all the frames included in the scene, or a part of them; the partial frames may be temporally continuous or discretely sampled. Thereby, the uncomfortable feeling that a user receives within one scene of a 3D moving image can be reduced.
  • Specifically, a distribution of the pixel shift amounts over the frame group is obtained, and the reference shift amount can be determined based on a representative value of the pixel shift amount obtained from that distribution.
  • As the representative value of the pixel shift amount, at least one of the average value, maximum value, minimum value, mode, and median of the pixel shift amounts can be adopted.
  • The representative value of the pixel shift amount may be used directly as the reference shift amount, or a value shifted from the representative value by the calculation of a predetermined rule may be used as the reference shift amount. As the predetermined rule, for example, multiplication of the representative value by a predetermined coefficient (for example, 0.8) can be employed, as in the sketch below.
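A minimal sketch of the first determination method under these rules, pooling the per-frame pixel shift (disparity) maps of the scene's frame group and scaling a chosen representative value by the predetermined coefficient; the function name and the integer rounding used for the mode are assumptions:

```python
import numpy as np

def reference_shift(frame_disparities, stat="median", coeff=0.8):
    """First determination method (sketch): pool the pixel shift amounts of
    a scene's frame group, take a representative value, and multiply it by
    a predetermined coefficient to obtain the reference shift amount."""
    d = np.concatenate([np.ravel(f) for f in frame_disparities]).astype(float)
    if stat == "mode":  # mode over integer-rounded shift values
        vals, counts = np.unique(np.round(d).astype(int), return_counts=True)
        rep = float(vals[counts.argmax()])
    else:
        rep = float({"mean": np.mean, "max": np.max, "min": np.min,
                     "median": np.median}[stat](d))
    return coeff * rep
```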
  • For example, the frame group from the (n-b)th frame (b is a natural number) through the (n-a)th frame (a is a natural number) to the nth frame corresponds to the first scene, and the frame group from the (n+a)th frame through the (n+c)th frame, including the (n+a)th frame, corresponds to the next, second scene.
  • Here, from the viewpoint of avoiding complexity in the drawing, only the Nth left-eye images GN L are illustrated and the Nth right-eye images GN R are omitted.
  • The first frame group GG1 L, consisting of the (n-b)th left-eye image G(n-b) L through the (n-a)th left-eye image G(n-a) L to the nth left-eye image Gn L included in the first scene, is shown, together with the second frame group GG2 L consisting of the left-eye images included in the second scene.
  • FIG. 20 shows the change in the pixel shift amount over time as a curve. If times T0 to T1 correspond to the first scene and times T1 to T2 to the second scene, the representative value RV1 of the pixel shift amount in the first scene and the representative value RV2 of the pixel shift amount in the second scene can be calculated.
  • an image having a parallax corresponding to the reference shift amount (reference parallax image) is added to the 3D moving image.
  • the parallax related to the reference parallax image is used as a reference, and the spread before and after the 3D moving image is easily recognized. As a result, the sense of depth that can be obtained by the user watching the 3D video can be improved.
  • Second determination method: The reference shift amount is determined based on the pixel shift amounts between the Nth left-eye image GN L and the Nth right-eye image GN R of the first frame included in the frame group of one scene in the video information.
  • The frame group of one scene may be, for example, all the frames included in the scene. Using the first frame makes it easy to combine a common image having the reference parallax with the whole scene.
  • Specifically, a distribution of the pixel shift amounts in the first frame is obtained, and the reference shift amount can be determined based on a representative value of the pixel shift amount.
  • As the representative value of the pixel shift amount, at least one of the average value, maximum value, minimum value, mode, and median of the pixel shift amounts can be adopted.
  • The representative value of the pixel shift amount may be used directly as the reference shift amount, or a value shifted from the representative value by the calculation of a predetermined rule may be used as the reference shift amount. As the predetermined rule, for example, multiplication of the representative value by a predetermined coefficient (for example, 0.8) can be employed.
  • Third determination method: For the frame group of one scene, a distribution of the pixel shift amounts between the Nth left-eye image GN L and the Nth right-eye image GN R is obtained, a representative value of the virtual distance from a virtually set reference plane to the surface of the object is calculated from that distribution, and the reference shift amount is determined so that the representative value becomes a predetermined value. The frame group of one scene may be all the frames included in the scene, or a part of them; the partial frames may be temporally continuous or discretely sampled.
  • the virtual reference plane may be a plane parallel to the screen, for example, and may be a plane orthogonal to the user's line of sight facing the screen.
  • the virtual distance may be, for example, a distance where the surface of the object that appears three-dimensionally in the 3D moving image is separated from the virtual reference plane.
  • a representative value of the virtual distance at least one value of an average value, a maximum value, a minimum value, a mode value, and a median value of the virtual distance can be adopted.
  • the predetermined value may be set to an arbitrary value according to the operation of the operation unit 41 by the user, or may be set to a fixed value prepared in advance.
  • For example, if the representative value of the virtual distance is the maximum value of the virtual distance and the predetermined value is 0, the reference shift amount is set according to the maximum value of the virtual distance; if the representative value is the minimum value of the virtual distance and the predetermined value is 0, the reference shift amount is set according to the minimum value of the virtual distance.
  • FIG. 21 shows a scene change occurring between the nth frame and the (n + m) th frame.
  • the change in the average value of the virtual distance of each frame over time when a predetermined virtual reference plane is set is indicated by a curve.
  • Times T0 to T1 correspond to the first scene, to which the nth frame belongs, and times T1 to T2 correspond to the second scene, to which the (n+m)th frame belongs.
  • The average value Mr1 can be calculated as the representative value of the virtual distance in the first scene, and the average value Mr2 as the representative value of the virtual distance in the second scene.
  • So that the representative value of the virtual distance in the first scene and the representative value of the virtual distance in the second scene each become the predetermined value, a first virtual reference plane protruding by the distance MR1 from the predetermined virtual reference plane can be set for the first scene, and a second virtual reference plane protruding by the distance MR2 from the predetermined virtual reference plane can be set for the second scene.
  • The shift amount corresponding to the first virtual reference plane is then determined as the reference shift amount for the first scene, and the shift amount corresponding to the second virtual reference plane as the reference shift amount for the second scene.
  • a reference shift amount related to one scene is determined according to an image of a scene before the one scene (previous scene). Specifically, the object is identical between the N-th left-eye image GN L and the N-th right-eye image GN R of one or more frames in a certain scene (also referred to as a second scene) of the video information. A distribution of the shift amount (also referred to as a first pixel shift amount) of the pixel position indicating the portion is obtained. Further, the same part of the object is shown between the N-th left-eye image GN L and the N-th right-eye image GN R of one or more frames in another scene (also referred to as the first scene) before the second scene. A distribution of pixel position shift amounts (also referred to as second pixel shift amounts) is obtained. Then, a reference shift amount in the second scene is determined based on the distribution of the first pixel shift amount and the distribution of the second pixel shift amount.
  • the one or more frames in the second scene may be any one of one frame, two or more consecutive frames, two or more frames sampled, and all the frames in the second scene.
  • the one or more frames in the first scene may be any one of one frame, two or more consecutive frames, two or more frames sampled, and all the frames in the first scene.
  • Based on a representative value related to the distribution of the first pixel shift amounts (also referred to as the first shift representative value) and a representative value related to the distribution of the second pixel shift amounts (also referred to as the second shift representative value), the reference shift amount can be determined.
  • As the first shift representative value, at least one of the average value, maximum value, minimum value, mode, and median of the distribution of the first pixel shift amounts can be adopted; as the second shift representative value, at least one of the average value, maximum value, minimum value, mode, and median of the distribution of the second pixel shift amounts can be adopted.
  • The reference shift amount of the second scene can then be determined by a predetermined calculation using the first shift representative value and the second shift representative value. As the predetermined calculation, for example, taking the average of the first and second shift representative values may be employed; this average can be determined as the reference shift amount of the second scene.
  • FIG. 23 is a diagram for explaining a fourth method for determining the reference deviation amount.
  • FIG. 23 shows the first deviation representative value MP1 related to the first scene (time T0 to T1), the second deviation representative value MP2 related to the second scene (time T1 to T2) next to the first scene, A third deviation representative value MP3 relating to the third scene (time T2 to T3) next to the two scenes is shown.
  • the first deviation representative value MP1 can be set as the reference deviation amount RP1 of the first scene.
  • the average value of the first deviation representative value MP1 and the second deviation representative value MP2 can be set as the reference deviation amount RP2 of the second scene.
  • an average value of the second deviation representative value MP2 and the third deviation representative value MP3 can be set as the reference deviation amount RP3 of the third scene.
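The averaging rule just illustrated (MP1 becomes RP1, mean(MP1, MP2) becomes RP2, mean(MP2, MP3) becomes RP3) can be written as a one-pass sketch over the per-scene shift representative values:

```python
def blended_reference_shifts(scene_reps):
    """Fourth determination method (sketch): the first scene uses its own
    shift representative value; each later scene uses the average of its
    own representative value and the previous scene's."""
    refs = []
    for i, mp in enumerate(scene_reps):
        refs.append(mp if i == 0 else 0.5 * (scene_reps[i - 1] + mp))
    return refs

# e.g. blended_reference_shifts([MP1, MP2, MP3]) -> [RP1, RP2, RP3]
```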
  • The reference shift amount may also be determined by the following method. First, a distribution of the shift amounts of the pixel positions indicating the same part of the object (pixel shift amounts) between the Nth left-eye image GN L and the Nth right-eye image GN R included in the frame group of the first scene in the video information is obtained, and, based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (also referred to as a first virtual reference plane) to the surface of the object is calculated. Likewise, a distribution of the pixel shift amounts between the Nth left-eye image GN L and the Nth right-eye image GN R included in the frame group of the second scene next to the first scene is obtained, and, based on it, a second representative value of the virtual distance from a virtually set reference plane (also referred to as a second virtual reference plane) to the surface of the object is calculated. Then, the change of the second virtual reference plane and the recalculation of the second representative value are performed one or more times, and the shift amount corresponding to the second virtual reference plane for which the difference between the first representative value and the second representative value falls within a predetermined allowable range is determined as the reference shift amount.
  • first and second virtual reference planes may be planes parallel to the screen, for example, and may be planes orthogonal to the user's line of sight facing the screen.
  • the virtual distance may be, for example, a distance where the surface of the object that appears three-dimensionally in the 3D moving image is separated from the first or second virtual reference plane.
  • the reference deviation amount may be determined according to the virtual distance related to the attention area.
  • Specifically, a distribution of the shift amounts of the pixel positions indicating the same part of the object (pixel shift amounts) between the reference attention region and the corresponding attention region included in the frame group of the first scene in the video information is obtained, and, based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (also referred to as a first virtual reference plane) to the surface of the object is calculated.
  • Likewise, a distribution of the pixel shift amounts between the reference attention region and the corresponding attention region included in the frame group of the second scene next to the first scene is obtained, and, based on it, a second representative value of the virtual distance from a virtually set reference plane (also referred to as a second virtual reference plane) to the surface of the object is calculated.
  • Then, the change of the second virtual reference plane and the recalculation of the second representative value are performed one or more times, and the shift amount corresponding to the second virtual reference plane for which the difference between the first representative value and the second representative value falls within a predetermined allowable range is determined as the reference shift amount.
  • Region image information acquisition method: The acquisition of the Nth region image information relating to the Nth reference region image IN L and the Nth corresponding region image IN R in the region image acquisition unit 465 can be realized, for example, by sequentially performing the following first and second steps.
  • <First step> An image pattern prepared in advance is read out. As this image pattern, for example, an image pattern showing a specific pattern in which relatively large dots are randomly arranged, an image pattern including an information display column of digital broadcasting (for example, a data column or a time column), or an image pattern including device operation buttons or the like can be employed.
  • <Second step> The image pattern read out in the first step is used as a base image (for example, the Nth reference region image IN L), and the other image (for example, the Nth corresponding region image IN R) is generated by shifting the display object of the base image in one direction by the reference shift amount determined for each scene by the shift amount determination unit 464.
  • In this way, the Nth reference region image IN L and the Nth corresponding region image IN R are acquired for each scene; that is, the Nth reference region image IN L and the Nth corresponding region image IN R for one scene are constant.
  • Alternatively, sets of image patterns corresponding to a plurality of shift amounts may be stored in advance in the storage unit 44 or the like, and the set corresponding to the reference shift amount determined by the shift amount determination unit 464 may be read out to acquire the Nth reference region image IN L and the Nth corresponding region image IN R. That is, it is only necessary that the Nth reference region image IN L and the Nth corresponding region image IN R have a relationship in which a display object such as a predetermined pattern is shifted by the reference shift amount in one direction, as sketched below.
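The second step could be sketched as follows: the read-out pattern serves as the base region image, and its counterpart is produced by shifting the pattern horizontally by the scene's reference shift amount. The zero-filled padding of the vacated columns is an assumption for illustration:

```python
import numpy as np

def make_region_image_pair(pattern, ref_shift):
    """Generate (reference region image, corresponding region image):
    the corresponding image is the base pattern shifted by `ref_shift`
    pixels in one (horizontal) direction."""
    base = np.asarray(pattern)
    shifted = np.zeros_like(base)
    s = int(round(ref_shift))
    if s >= 0:
        shifted[:, s:] = base[:, :base.shape[1] - s]
    else:
        shifted[:, :s] = base[:, -s:]
    return base, shifted  # constant for every frame of one scene
```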
  • FIGS. 24 and 25 illustrate variations of the Nth reference area image IN L.
  • Each variation is illustrated in the form of an Nth left-eye composite image GSN L in which the Nth reference region image IN L is arranged, aligned with the Nth reference composite position PN L, around the Nth left-eye image area TAN L corresponding to the Nth left-eye image GN L.
  • FIG. 24 schematically shows an image pattern having an information display column including a digital broadcast data column Pa1 and a time column Ca1.
  • FIG. 25 schematically shows an image pattern including the operation button group Ba1 and the time column Ta1.
  • In the position specifying unit 468, for example, a predetermined Nth reference composite position PN L and Nth corresponding composite position PN R may be specified, or the Nth reference composite position PN L and the Nth corresponding composite position PN R may be specified in accordance with the operation of the operation unit 41 by the user.
  • The Nth reference composite position PN L can be specified in one or more of a plurality of regions including, for example, the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region.
  • The Nth corresponding composite position PN R can be specified in one or more of a plurality of regions including, for example, the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region.
  • For example, the Nth reference composite position PN L can be specified in the region surrounding the Nth left-eye image area TAN L corresponding to the Nth left-eye image GN L.
  • The Nth reference composite position PN L may be specified so as to include a specific part of the region surrounding the Nth left-eye image area TAN L: a position specifying the left and right areas sandwiching the Nth left-eye image area TAN L may be designated, a position specifying the area located below the Nth left-eye image area TAN L may be designated, or a position specifying a region whose width around the Nth left-eye image area TAN L is non-uniform may be designated.
  • The Nth reference composite position PN L and the Nth corresponding composite position PN R may each be a position identifying all pixels in a region of a predetermined shape, such as an annular region, or a position from which such a region of a predetermined shape can be identified (also referred to as a specific position).
  • The specific position may be, for example, the position of a pixel at one or more corners of the region having the predetermined shape.
  • The Nth reference composite position PN L and the Nth corresponding composite position PN R can also be specified in accordance with one or more of the Nth left-eye image GN L and the Nth right-eye image GN R.
  • In this case as well, the Nth reference composite position PN L can be specified in one or more of a plurality of regions including the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region, and the Nth corresponding composite position PN R can be specified in one or more of a plurality of regions including the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region.
  • The Nth reference composite position PN L may also be specified in one or more of a plurality of regions including the region near the outer edge of the Nth left-eye image GN L and the region in another image corresponding to that region, and the Nth corresponding composite position PN R may likewise be specified in one or more of a plurality of regions including the region near the outer edge of the Nth right-eye image GN R and the region in another image corresponding to that region.
  • A mode is also conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified only if the parallax between the Nth left-eye image GN L and the Nth right-eye image GN R is equal to or greater than a first threshold and equal to or less than a second threshold. In this mode, if the parallax is less than the first threshold or exceeds the second threshold, the Nth reference composite position PN L and the Nth corresponding composite position PN R are not specified.
  • The smaller the parallax, the larger the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R may be made.
  • Similarly, a mode is conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified only if the difference between the maximum value and the minimum value of the parallax in the reference attention region and the corresponding attention region is equal to or greater than a first threshold and equal to or less than a second threshold; otherwise, the Nth reference composite position PN L and the Nth corresponding composite position PN R are not specified.
  • The smaller the difference between the maximum value and the minimum value of the parallax in the reference attention region and the corresponding attention region, the larger the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R may be made.
  • A mode is also conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified in accordance with the positions of the reference attention region and the corresponding attention region detected by the attention area detection unit 462 from the Nth left-eye image GN L and the Nth right-eye image GN R.
  • For example, when the attention area detection unit 462 detects a set of the reference attention region and the corresponding attention region, the Nth reference composite position PN L can be designated according to the position that the reference attention region occupies in the Nth left-eye image GN L, and the Nth corresponding composite position PN R can be designated according to the position that the corresponding attention region occupies in the Nth right-eye image GN R.
  • FIG. 29 shows a case where the reference attention area is the object area O1 L and the corresponding attention area is the object area O1 R.
  • In this case, within the region F L surrounding the Nth left-eye image area TAN L corresponding to the Nth left-eye image GN L, the position PN1 L indicating the area onto which the object region O1 L is projected in a first predetermined direction (here, the -X direction) and the position PN2 L indicating the area onto which the object region O1 L is projected in a second predetermined direction (here, the -Y direction) can be specified as the Nth reference composite position PN L.
  • The Nth corresponding composite position PN R may be designated in the same manner.
  • In this mode, if the positions of the reference attention region and the corresponding attention region change, the arrangement of the Nth reference composite position PN L and the Nth corresponding composite position PN R changes; if the sizes of the reference attention region and the corresponding attention region change, the sizes of the Nth reference composite position PN L and the Nth corresponding composite position PN R change; and if the number of reference attention regions and corresponding attention regions changes, the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R changes.
  • In the image combining unit 469, the Nth reference region image IN L can be arranged in one or more of a plurality of regions including the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region; likewise, the Nth corresponding region image IN R can be arranged in one or more of a plurality of regions including the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region.
  • Thereby, the Nth left-eye composite image GSN L, in which the Nth left-eye image GN L and the Nth reference region image IN L are combined, and the Nth right-eye composite image GSN R, in which the Nth right-eye image GN R and the Nth corresponding region image IN R are combined, can be generated.
  • For all frames of one scene, the Nth left-eye image GN L is combined with the Nth reference region image IN L acquired based on the same reference shift amount. For example, the Nth reference region image IN L related to one common reference shift amount is combined with each Nth left-eye image GN L of the first frame group GG1 L of the first scene, and the Nth reference region image IN L related to another common reference shift amount is combined with each Nth left-eye image GN L of the second frame group GG2 L of the second scene.
  • For example, the nth reference region image In L is combined with the nth left-eye image Gn L in the first scene, and the (n+m)th reference region image I(n+m) L is combined with the (n+m)th left-eye image G(n+m) L in the second scene.
  • the stereoscopic image information generated as described above only needs to include information of at least one of the first format and the second format.
  • The first format includes a format in which the Nth left-eye image GN L, the Nth right-eye image GN R, the Nth reference region image IN L, and the Nth corresponding region image IN R can be displayed at the same time on one screen in a superimposed manner.
  • the second format is one or more of the Nth left eye image GN L , the Nth right eye image GN R , the Nth reference area image IN L , and the Nth corresponding area image IN R in one screen. It includes a format in which an image and one or more remaining images can be displayed in time sequence. Thus, even if various display modes are employed, a sense of depth that can be obtained by a user watching a 3D moving image can be improved.
  • FIG. 32 is a flowchart illustrating an example of an operation flow of the image processing apparatus according to the embodiment. This operation flow is realized by reading and executing the program PG1 in the storage unit 44 by the control unit 46. For example, execution of image processing related to a 3D moving image in the information processing apparatus 4 is requested in accordance with the operation of the operation unit 41 by the user, and this operation flow is started.
  • In step S1, the video information is acquired by the video acquisition unit 461.
  • In step S2, the mode setting unit 467 determines whether the attention area detection unit 462 is set to a mode for detecting attention areas. If that mode is set, the process proceeds to step S3; if not, to step S4.
  • In step S3, the attention area detection unit 462 detects the reference attention area and the corresponding attention area in the Nth left-eye image GN L and the Nth right-eye image GN R.
  • In step S4, the change detection unit 463 detects changes of scene.
  • In step S5, the shift amount determination unit 464 determines a reference shift amount for each scene.
  • In step S6, the region image acquisition unit 465 acquires, for each scene, the Nth region image information relating to the Nth reference region image IN L and the Nth corresponding region image IN R.
  • In step S7, the mode setting unit 467 determines whether a signal related to mode setting of the position specifying unit 468 has been received by the signal receiving unit 466. If such a signal has been received, the process proceeds to step S8; if not, to step S9.
  • In step S8, the mode setting unit 467 sets the position specifying unit 468 to one of a plurality of modes, including the first mode and the second mode, according to the signal received by the signal receiving unit 466. The position specifying unit 468 is assumed to be initially set to a predetermined mode before the mode is set in step S8.
  • In step S9, the position specifying unit 468 specifies the Nth reference composite position PN L and the Nth corresponding composite position PN R for each scene.
  • In step S10, the image combining unit 469 combines the Nth left-eye image GN L, the Nth right-eye image GN R, the Nth reference region image IN L, and the Nth corresponding region image IN R. As a result, stereoscopic image information is generated, and this operation flow ends. A sketch of the whole flow follows.
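The flow of steps S1 to S10 could be sketched as the following driver, with the units of the control unit 46 abstracted behind a hypothetical `ops` object; every method name on `ops` is an assumption made for illustration, not an API defined by the patent:

```python
def process_3d_video(video, ops):
    """Sketch of the operation flow S1-S10 of the image processing apparatus."""
    frames = ops.acquire_video(video)                       # S1: unit 461
    if ops.attention_mode_enabled():                        # S2: mode check
        ops.detect_attention_areas(frames)                  # S3: unit 462
    scenes = ops.detect_scene_changes(frames)               # S4: unit 463
    shifts = {s: ops.reference_shift(s) for s in scenes}    # S5: unit 464
    region_images = {s: ops.region_images(shifts[s]) for s in scenes}  # S6: unit 465
    if (signal := ops.pending_mode_signal()) is not None:   # S7: unit 466
        ops.set_position_mode(signal)                       # S8: unit 467
    positions = {s: ops.composite_positions(s) for s in scenes}        # S9: unit 468
    return ops.combine(frames, region_images, positions)    # S10: unit 469
```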
  • an image having a reference parallax suitable for a scene is synthesized with a 3D moving image according to a change in the scene. For this reason, the difference with the parallax in a 3D moving image becomes easy to be recognized by the comparison with the image which has a standard parallax. As a result, the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • In the above embodiment, the Nth reference region image IN L is arranged in one or more of a plurality of regions including the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region, and the Nth corresponding region image IN R is arranged in one or more of a plurality of regions including the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region; however, the present invention is not limited to this.
  • For example, the Nth reference region image IN L may be arranged so as to overlap an area inside the Nth left-eye image GN L (also referred to as an internal area), and the Nth corresponding region image IN R may be arranged so as to overlap the internal area of the Nth right-eye image GN R.
  • The Nth reference region image IN L may also be arranged in a region of another image corresponding to the internal area of the Nth left-eye image GN L, and the Nth corresponding region image IN R in a region of another image corresponding to the internal area of the Nth right-eye image GN R.
  • The left side of FIG. 33 shows an example of the nth left-eye composite image GSn L in which the nth reference region image In L is combined, aligned with the nth reference composite position Pn L, in the internal area of the nth left-eye image area TAn L corresponding to the nth left-eye image Gn L; the right side shows an example of the nth right-eye composite image GSn R in which the nth corresponding region image In R is combined, aligned with the nth corresponding composite position Pn R, in the internal area of the nth right-eye image area TAn R corresponding to the nth right-eye image Gn R.
  • the Nth reference area image IN L and the Nth corresponding area image IN R may be, for example, specific markers.
  • the specific marker may be, for example, a unique marker that can be easily distinguished from an object originally included in the Nth left eye image GN L and the Nth right eye image GN R by the user.
  • Intrinsic features can be realized, for example, by shape, color, texture, and the like.
  • the specific marker may be a marker drawn with CG or the like.
  • examples of the specific marker include various simple shapes such as a stick, a triangle, and an arrow, and various objects such as a vase and a butterfly. This makes it difficult for the user to confuse the object originally included in the N-th left eye image GN L and the N-th right eye image GN R with the specific marker.
  • If the specific marker has a specific shape, a specific color, a specific texture, and the like but is translucent, the display of the image based on the Nth left-eye image GN L and the Nth right-eye image GN R is unlikely to be hindered.
  • In the position specifying unit 468, a predetermined Nth reference composite position PN L and Nth corresponding composite position PN R may be specified, or the Nth reference composite position PN L and the Nth corresponding composite position PN R may be specified in accordance with the operation of the operation unit 41 by the user.
  • In this modification, the Nth reference composite position PN L can be specified in one or more of a plurality of regions including, for example, the internal area of the Nth left-eye image GN L and the region of another image corresponding to that internal area.
  • Likewise, the Nth corresponding composite position PN R can be specified in one or more of a plurality of regions including, for example, the internal area of the Nth right-eye image GN R and the region of another image corresponding to that internal area.
  • The Nth reference composite position PN L and the Nth corresponding composite position PN R may also be specified in accordance with one or more of the Nth left-eye image GN L and the Nth right-eye image GN R.
  • For example, a mode is conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified in the vicinity of the reference attention region and the corresponding attention region.
  • As the vicinity of an attention region, for example, positions at the lower left, upper left, lower right, upper right, top, bottom, left, and right of the attention region are conceivable, as is a position surrounding the attention region.
  • For example, a mode is conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified at the lower left of the object regions O1 L and O1 R serving as the reference attention region and the corresponding attention region.
  • If the sizes of the reference attention region and the corresponding attention region change, the sizes of the regions identified by the Nth reference composite position PN L and the Nth corresponding composite position PN R may be changed; if the number of reference attention regions and corresponding attention regions changes, the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R may be changed. The sizes and the number of the identified regions may also be changed in accordance with the operation of the operation unit 41 by the user.
  • A mode is also conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified in an area that is predicted not to be noticed by the user (a non-attention area).
  • Examples of the non-attention area include areas different from the reference attention region and the corresponding attention region detected by the attention area detection unit 462.
  • As the non-attention area, an area near the edges of the Nth left-eye image GN L and the Nth right-eye image GN R, an area showing an object with little motion obtained from a motion vector analysis, or an area whose color and texture are inconspicuous can also be adopted. This makes it difficult for the display of the area that the user is watching to be hindered; as a result, suppression of visual discomfort and improvement of the sense of depth obtained by the user watching the 3D moving image can both be achieved.
  • In the above embodiment, one constant reference shift amount is determined for one scene; however, the present invention is not limited to this.
  • For example, the reference shift amount may be changed stepwise or gradually in the vicinity of a scene change. Thereby, the uncomfortable feeling that the user receives when the scene changes in the 3D moving image can be reduced.
  • Specifically, the reference shift amount may be determined for each frame in the vicinity of the scene change in the video information so that the difference in the reference shift amount between frames is equal to or less than a predetermined amount.
  • In this case, for each frame in the vicinity of the scene change, the region image acquisition unit 465 can acquire the Nth region image information relating to the Nth reference region image IN L and the Nth corresponding region image IN R having the relationship in which the positions of the pixels indicating the same part of the display object are shifted in one direction by the reference shift amount for that frame.
  • Then, for each frame in the vicinity of the scene change, the image combining unit 469 can combine the Nth left-eye image GN L, the Nth right-eye image GN R, the Nth reference region image IN L, and the Nth corresponding region image IN R relating to that frame.
  • For example, the reference shift amount PR1 is calculated for the first scene (times T0 to T1) and the reference shift amount PR2 for the second scene (times T1 to T2) by the same determination method as in the above embodiment.
  • For the first scene, the reference shift amount PR1 is adopted as it is; for the second scene, the reference shift amount is increased or decreased by a predetermined amount per frame, in order from the first frame, until it reaches the reference shift amount PR2.
  • In this case, the reference shift amount changes stepwise from time T1 to time T1+Δ (Δ being the length of the transition period).
  • The predetermined amount may be set in advance, or may be set by dividing the difference between the reference shift amount PR1 and the reference shift amount PR2 by a predetermined number of frames Nf; the predetermined number of frames Nf can be set to 30 frames, for example. In this case, the reference shift amount changes linearly from time T1 to T1+Δ.
  • Alternatively, the predetermined amount may be calculated by multiplying the reference shift amount PR1 of the first scene by a predetermined coefficient (for example, 0.01). In this case, the reference shift amount changes nonlinearly from time T1 to T1+Δ. A sketch of both variants follows.
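Both transition variants could be sketched as follows: the linear variant steps by (PR2 - PR1)/Nf per frame, and the nonlinear variant by a fixed fraction of PR1 per frame, until PR2 is reached. The function name and edge handling are assumptions:

```python
def transition_shifts(pr1, pr2, n_frames=30, mode="linear", coeff=0.01):
    """Per-frame reference shift amounts at the start of the second scene,
    stepping from PR1 toward PR2 so that the frame-to-frame difference
    stays at or below a predetermined amount."""
    step = (pr2 - pr1) / n_frames if mode == "linear" else \
           (coeff * pr1 if pr2 >= pr1 else -coeff * pr1)
    shifts, cur = [], pr1
    while (step > 0 and cur + step < pr2) or (step < 0 and cur + step > pr2):
        cur += step
        shifts.append(cur)
    shifts.append(pr2)  # land exactly on the second scene's reference amount
    return shifts
```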
  • As the modes of the position specifying unit 468 set by the mode setting unit 467, the method of specifying the Nth reference composite position PN L and the Nth corresponding composite position PN R according to the above embodiment and the method of specifying them according to the first modification may be performed selectively.
  • For example, if the position specifying unit 468 is set to the first mode, the specification of the Nth reference composite position PN L and the Nth corresponding composite position PN R according to the embodiment is executed; if the position specifying unit 468 is set to the second mode, the specification according to the first modification is executed.
  • Alternatively, the specification method according to the embodiment and the specification method according to the first modification may both be performed simultaneously.
  • In the above embodiment, the first modification, and the second modification, video information that originally includes a 3D moving image is acquired; however, the present invention is not limited to this.
  • For example, after video information including a normal (2D) moving image is obtained, video information including a 3D moving image may be acquired by generating the 3D moving image from the normal moving image by various methods.

Abstract

An objective of the present invention is to provide a technology whereby the depth sensation which is obtained by a user who sees a 3-D motion picture is improved. To achieve the objective, video information is obtained, which includes, in information of each frame, a reference image and a correspondence image which have a relationship wherein the location of pixels which denote the same portion of an object is offset in one direction. Next, a change in the scene in the video information is detected. Thereafter, a reference offset degree is determined on the basis of the offset degree of the location of the pixels which denote the same portion of the object between the reference image and the correspondence image in one or more frames of the video information after the change of the scene. In such a circumstance, region image information is acquired which concerns a reference region image and a correspondence region image which have a relation wherein the location of pixels which denote the same portion of an object to be displayed is offset by a reference degree in one direction. Then, stereoscopic image information is generated by compositing the reference image, the correspondence image, the reference region image, and the correspondence region image for each frame of the video information after the change of scene.

Description

画像処理装置、画像処理方法、およびプログラムImage processing apparatus, image processing method, and program
 本発明は、画像処理技術に関する。 The present invention relates to an image processing technique.
 昨今、立体視が可能な動画(3D動画とも立体視動画とも言う)を利用したテレビ(3Dテレビとも言う)が脚光を浴びている。3Dテレビでは、同一物体を異なる視点から見た2つの画像が利用されて立体視が可能な画像(3D画像とも立体視画像とも言う)が表示される。 Recently, televisions (also referred to as 3D televisions) that use moving images capable of stereoscopic viewing (also referred to as 3D moving images or stereoscopic viewing movies) are in the spotlight. In 3D television, two images obtained by viewing the same object from different viewpoints are used to display an image that can be stereoscopically viewed (also referred to as a 3D image or a stereoscopic image).
 3D画像の技術では、左眼用の画像と右眼用の画像との間で物体の同一部分を示す画素の位置がずらされており、人間の眼の焦点調節機能が利用されて画像の奥行き感がユーザーに与えられる。なお、左眼用の画像と右眼用の画像との間における物体の同一部分を示す画素の位置のズレ量は「視差」とも称される。 In the 3D image technology, the position of the pixel indicating the same part of the object is shifted between the image for the left eye and the image for the right eye, and the depth adjustment of the image is performed using the focus adjustment function of the human eye. A feeling is given to the user. Note that the shift amount of the pixel position indicating the same part of the object between the image for the left eye and the image for the right eye is also referred to as “parallax”.
 このような3D画像の技術は、種々の映像分野で採用されている。例えば、ステレオ画像から検出された視差が人間の眼の融合範囲に入るように調節されることで広い視野の範囲について画像の立体視が可能となる内視鏡装置が提案されている(例えば、特許文献1)。また、立体映像を表示し、その奥行き感を調整する場合に、基準となる参照用立体画像が表示される立体映像処理装置が提案されている(例えば、特許文献2)。 Such 3D image technology has been adopted in various video fields. For example, an endoscope apparatus has been proposed that enables stereoscopic viewing of an image over a wide field of view by adjusting parallax detected from a stereo image to fall within the fusion range of human eyes (for example, Patent Document 1). In addition, there has been proposed a stereoscopic video processing apparatus that displays a stereoscopic reference image when a stereoscopic video is displayed and the sense of depth is adjusted (for example, Patent Document 2).
 ところで、視差がある程度小さければ、画面の大きさ等によってはユーザーが奥行き感を得難い場合がある。つまり、同一物体であっても、実際に目視する場合と3D画像上で見る場合との間で、ユーザーには異なる様に見えることがある。 By the way, if the parallax is small to some extent, it may be difficult for the user to obtain a sense of depth depending on the size of the screen. That is, even if the object is the same, it may appear different to the user between when it is actually viewed and when it is viewed on the 3D image.
 このような3D画像の技術の状況において、画面に変化を付けたり、面白味を付加するため、あるいは立体視を容易とするために、3次元画像の周囲に枠画像が表示される技術が提案されている(例えば、特許文献3)。この技術では、複数の枠画像から使用する枠画像を選択することが可能である。 In such 3D image technology, a technique has been proposed in which a frame image is displayed around a 3D image in order to change the screen, add interest, or facilitate stereoscopic viewing. (For example, Patent Document 3). With this technique, it is possible to select a frame image to be used from a plurality of frame images.
Patent Document 1: JP-A-8-313825
Patent Document 2: JP-A-11-155155
Patent Document 3: International Publication No. WO 2003/092303
 However, even with the technique of Patent Document 3, the user may not obtain a sufficient sense of depth from a 3D moving image.
 The present invention has been made in view of the above problem, and an object thereof is to provide a technique for improving the sense of depth obtained by a user viewing a 3D moving image.
 In order to solve the above problem, an image processing apparatus according to a first aspect includes: a first acquisition unit that acquires video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels representing the same portion of an object are shifted in one direction; a change detection unit that detects a scene change in the video information; a determination unit that determines a reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the video information after the scene change; a second acquisition unit that acquires region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels representing the same portion of a display object are shifted in the one direction by the reference shift amount; and a compositing unit that generates stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame of the video information after the scene change.
 An image processing apparatus according to a second aspect is the image processing apparatus according to the first aspect, wherein the stereoscopic image information includes information in at least one of a first format, which allows the reference image, the corresponding image, the reference region image, and the corresponding region image to be displayed superimposed at the same time on one screen, and a second format, which allows one or more of the reference image, the corresponding image, the reference region image, and the corresponding region image and the remaining one or more of those images to be displayed time-sequentially on one screen.
 An image processing apparatus according to a third aspect is the image processing apparatus according to the first or second aspect, wherein the determination unit determines the reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of one scene of the video information, the one scene lasting from the detection of a first scene change by the change detection unit until the detection of a second scene change, and the compositing unit generates the stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for all frames of the one scene of the video information.
 An image processing apparatus according to a fourth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image of the first frame of the one scene of the video information.
 An image processing apparatus according to a fifth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels representing the same portion of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information.
 An image processing apparatus according to a sixth aspect is the image processing apparatus according to the third aspect, further including a region detection unit that, in accordance with a preset detection rule, detects from the pair of the reference image and the corresponding image of each frame of the video information a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein the determination unit determines the reference shift amount based on the shift amount of the positions of pixels representing the same portion of the object between the reference attention region and the corresponding attention region of the first frame of the one scene of the video information.
 An image processing apparatus according to a seventh aspect is the image processing apparatus according to the third aspect, further including a region detection unit that, in accordance with a preset detection rule, detects from the pair of the reference image and the corresponding image of each frame of the video information a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information.
 An image processing apparatus according to an eighth aspect is the image processing apparatus according to the third aspect, wherein the determination unit calculates a representative value of the virtual distance from a virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the virtual reference plane at which the representative value of the virtual distance equals a predetermined value.
 An image processing apparatus according to a ninth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on a distribution of first shift amounts of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the one scene of the video information, and a distribution of second shift amounts of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the preceding scene before the first scene change in the video information.
 An image processing apparatus according to a tenth aspect is the image processing apparatus according to the ninth aspect, wherein the determination unit determines the reference shift amount based on a first representative shift value relating to the distribution of the first shift amounts and a second representative shift value relating to the distribution of the second shift amounts.
 An image processing apparatus according to an eleventh aspect is the image processing apparatus according to the ninth aspect, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference images and the corresponding images included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
 An image processing apparatus according to a twelfth aspect is the image processing apparatus according to the ninth aspect, further including a region detection unit that, in accordance with a preset detection rule, detects from the pair of the reference image and the corresponding image of each frame of the video information a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of the object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of the object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
 An image processing apparatus according to a thirteenth aspect is the image processing apparatus according to the first or second aspect, wherein the determination unit determines the reference shift amount for each frame in the vicinity of the scene change in the video information such that the difference in the reference shift amount between frames is equal to or less than a predetermined amount; the second acquisition unit acquires, for each frame in the vicinity of the scene change in the video information, region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels representing the same portion of a display object are shifted in the one direction by the reference shift amount determined by the determination unit for that frame; and the compositing unit composites, for each frame in the vicinity of the scene change in the video information, the reference image, the corresponding image, the reference region image, and the corresponding region image relating to that frame.
 An image processing apparatus according to a fourteenth aspect is the image processing apparatus according to any one of the first to thirteenth aspects, wherein the change detection unit detects the scene change in accordance with a change in the image between two or more frames included in the video information.
 An image processing apparatus according to a fifteenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change in the image includes one or more of a change in luminance, a change in color, and a change in frequency components.
 An image processing apparatus according to a sixteenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change detection unit identifies the faces of persons captured in a plurality of frames included in the video information and detects the scene change in accordance with a change of persons between the plurality of frames.
 An image processing apparatus according to a seventeenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change detection unit detects the scene change in accordance with a change in an attention region capturing an object predicted to attract the user's attention in a plurality of frames included in the video information.
 An image processing apparatus according to an eighteenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change detection unit detects the scene change in accordance with at least one of the appearance and the disappearance of an attention region capturing an object predicted to attract the user's attention in a plurality of frames included in the video information.
 An image processing apparatus according to a nineteenth aspect is the image processing apparatus according to any one of the first to eighteenth aspects, wherein the change detection unit detects the scene change in accordance with a change of the in-focus region in a plurality of frames included in the video information.
 An image processing apparatus according to a twentieth aspect is the image processing apparatus according to any one of the first to nineteenth aspects, wherein the video information includes information relating to audio, and the change detection unit detects the scene change in accordance with a change in the audio.
 An image processing apparatus according to a twenty-first aspect is the image processing apparatus according to the twentieth aspect, wherein the change in the audio includes one or more of a change in volume, a change in frequency components, and a change in voiceprint.
 An image processing apparatus according to a twenty-second aspect is the image processing apparatus according to any one of the first to twenty-first aspects, wherein the video information includes metadata, and the change detection unit detects the scene change in accordance with a change in the metadata.
 An image processing apparatus according to a twenty-third aspect is the image processing apparatus according to the twenty-second aspect, wherein the metadata includes one or more of caption information, chapter information, and shooting condition information, and the change in the metadata includes one or more of a change in caption information, a change in chapter information, and a change in shooting conditions.
 An image processing method according to a twenty-fourth aspect includes the steps of: (a) acquiring video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels representing the same portion of an object are shifted in one direction; (b) detecting a scene change in the video information; (c) determining a reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the video information after the scene change; (d) acquiring region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels representing the same portion of a display object are shifted in the one direction by the reference shift amount; and (e) generating stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame of the video information after the scene change.
 A program according to a twenty-fifth aspect, when executed by a control unit included in an information processing apparatus, causes the information processing apparatus to function as the image processing apparatus according to any one of the first to twenty-third aspects.
 With the image processing apparatus according to any one of the first to twenty-third aspects, differences in parallax in a 3D moving image become easier to recognize, so the sense of depth obtained by a user viewing the 3D moving image can be improved.
 With the image processing apparatus according to the second aspect, the sense of depth obtained by a user viewing a 3D moving image can be improved regardless of which of various display modes is adopted.
 With the image processing apparatus according to the third aspect, a common image having a reference parallax is added to one scene of the 3D moving image, so excessive changes in the image are suppressed, the burden on the user's eyes can be reduced, and the amount of computation can also be reduced.
 With the image processing apparatus according to either of the fourth and sixth aspects, the process of adding a common image having a reference parallax to one scene of a 3D moving image can be performed in real time.
 With the image processing apparatus according to either of the fifth and seventh aspects, the sense of incongruity felt by the user within one scene of a 3D moving image can be reduced.
 With the image processing apparatus according to any one of the eighth to thirteenth aspects, the sense of incongruity felt by the user when the scene changes in a 3D moving image can be reduced.
 With the image processing apparatus according to any one of the fourteenth to nineteenth aspects, an image having a reference parallax suited to the scene can be composited into the 3D moving image in accordance with changes in the image.
 With the image processing apparatus according to either of the twentieth and twenty-first aspects, an image having a reference parallax suited to the scene can be composited into the 3D moving image in accordance with changes in the audio.
 With the image processing apparatus according to either of the twenty-second and twenty-third aspects, an image having a reference parallax suited to the scene can be composited into the 3D moving image in accordance with changes in the metadata.
 With both the image processing method according to the twenty-fourth aspect and the program according to the twenty-fifth aspect, effects similar to those of the image processing apparatus according to the first aspect are obtained.
FIG. 1 is a diagram illustrating an example of a left-eye image and a right-eye image included in a 3D moving image.
FIG. 2 is a diagram illustrating a 3D moving image to which a reference region image and a corresponding region image have been added.
FIG. 3 is a diagram illustrating an example of a left-eye image and a right-eye image included in a 3D moving image.
FIG. 4 is a diagram illustrating a 3D image to which a reference region image and a corresponding region image have been added.
FIG. 5 is a diagram illustrating the schematic configuration of an information processing system according to an embodiment.
FIG. 6 is a block diagram illustrating the functional configuration of the image processing apparatus.
FIG. 7 is a diagram for explaining a first change detection method for detecting a scene change.
FIG. 8 is a diagram for explaining the first change detection method for detecting a scene change.
FIG. 9 is a diagram for explaining a second change detection method for detecting a scene change.
FIG. 10 is a diagram for explaining the second change detection method for detecting a scene change.
FIG. 11 is a diagram for explaining a third change detection method for detecting a scene change.
FIG. 12 is a diagram for explaining the third change detection method for detecting a scene change.
FIG. 13 is a diagram for explaining a fourth change detection method for detecting a scene change.
FIG. 14 is a diagram for explaining the fourth change detection method for detecting a scene change.
FIG. 15 is a diagram for explaining a fifth change detection method for detecting a scene change.
FIG. 16 is a diagram for explaining the fifth change detection method for detecting a scene change.
FIG. 17 is a diagram for explaining a sixth change detection method for detecting a scene change.
FIG. 18 is a diagram for explaining the sixth change detection method for detecting a scene change.
FIG. 19 is a diagram for explaining a first method of determining the reference shift amount.
FIG. 20 is a diagram for explaining the first method of determining the reference shift amount.
FIG. 21 is a diagram for explaining a third method of determining the reference shift amount.
FIG. 22 is a diagram for explaining the third method of determining the reference shift amount.
FIG. 23 is a diagram for explaining a fourth method of determining the reference shift amount.
FIG. 24 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 25 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 26 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 27 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 28 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 29 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 30 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 31 is a diagram illustrating a 3D image to which a reference region image and a corresponding region image have been added.
FIG. 32 is a flowchart showing the operation of the image processing apparatus.
FIG. 33 is a diagram illustrating a 3D moving image to which a reference region image and a corresponding region image have been added.
FIG. 34 is a diagram illustrating a 3D image to which a reference region image and a corresponding region image have been added.
FIG. 35 is a diagram for explaining a method of determining the reference shift amount according to a modification.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, parts having similar configurations and functions are denoted by the same reference numerals, and redundant description is omitted. The drawings are schematic; for example, the sizes and positional relationships of display objects in the images of each drawing are not depicted exactly. Image data and the image displayed based on that image data are collectively referred to as an "image" where appropriate. Furthermore, in each drawing, the upper-left pixel of an image is taken as the origin, the direction along the long side of the image (here, the horizontal direction) as the X-axis direction, and the direction along the short side of the image (here, the vertical direction) as the Y-axis direction. The rightward direction of each image is the +X direction, and the downward direction is the +Y direction. Two orthogonal X and Y axes are shown in FIGS. 1 to 4, 9 to 18, 21, 24 to 29, 31, 33, and 34.
<(1) One Embodiment>
<(1-1) Outline of Processing According to One Embodiment>
 In order to display a moving image capable of being viewed stereoscopically (also referred to as a 3D moving image), for example, for the Nth frame (N is an arbitrary natural number), an Nth image for the left eye (also referred to as the Nth left-eye image) GN_L and an Nth image for the right eye (also referred to as the Nth right-eye image) GN_R are prepared, which have a relationship in which the positions of pixels representing the same portion of a display object are shifted in one direction.
 Here, the one direction is a direction that can coincide with the direction separating the human left eye and right eye, and is set, for example, to the horizontal direction of the image. The Nth left-eye image GN_L and the Nth right-eye image GN_R can be acquired, for example, by shooting with a stereo camera. A stereo camera has two cameras corresponding to the human left eye and right eye.
 FIG. 1 shows the nth left-eye image Gn_L and the nth right-eye image Gn_R of the nth frame (N = n). Specifically, the nth left-eye image Gn_L includes regions representing three objects (also referred to as object regions) O1_L, O2_L, and O3_L, and the nth right-eye image Gn_R includes three object regions O1_R, O2_R, and O3_R. Object regions O1_L and O1_R represent the same person, object regions O2_L and O2_R represent the same object (here, a pylon), and object regions O3_L and O3_R represent another common object (here, a ball). Here, the positions of the object regions O1_R, O2_R, and O3_R in the nth right-eye image Gn_R are shifted to the left relative to the positions of the object regions O1_L, O2_L, and O3_L in the nth left-eye image Gn_L. In FIG. 1, thin broken lines are drawn in the nth right-eye image Gn_R at positions corresponding to the outer edges of the object regions O1_L, O2_L, and O3_L in the nth left-eye image Gn_L, to facilitate comparison.
 Now, consider a case in which, for example, the amount of shift (also referred to as parallax) between the position of the object region O1_L in the nth left-eye image Gn_L and the position of the object region O1_R in the nth right-eye image Gn_R is somewhat small. Similarly, the parallax between the positions of the object regions O2_L and O2_R may be somewhat small, and the parallax between the positions of the object regions O3_L and O3_R may also be somewhat small.
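 Parallax of this kind is typically measured by locating, for each block of the left-eye image, the horizontally shifted position in the right-eye image that best matches it. The following is a minimal sketch of such a measurement using block matching with a sum-of-absolute-differences cost; the function name and the parameters (block size, search range) are illustrative assumptions, not part of the embodiment itself.

```python
import numpy as np

def block_parallax(left, right, y, x, block=8, search=32):
    """Estimate the horizontal parallax (in pixels) of the block whose
    top-left corner is (y, x) in the left-eye image, by finding the
    best-matching block in the right-eye image shifted to the left."""
    ref = left[y:y + block, x:x + block].astype(np.int32)
    best_d, best_cost = 0, np.inf
    for d in range(0, search + 1):        # right-eye content is shifted in -X
        if x - d < 0:
            break
        cand = right[y:y + block, x - d:x - d + block].astype(np.int32)
        cost = np.abs(ref - cand).sum()   # sum of absolute differences
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```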
 In such cases, depending on various display conditions, including the size of the screen on which the 3D image is displayed, it may be difficult for the user to obtain a sense of depth for the objects represented by the object regions O1_L, O2_L, O3_L, O1_R, O2_R, and O3_R. In other words, viewing an object in a 3D image may give the user a sense of depth different from that of the actual object, compared with viewing the object directly.
 To address this problem, in the present embodiment, as shown in FIG. 2, an image relating to a display object having a reference parallax (also referred to as a reference parallax image) is added to the Nth left-eye image GN_L and the Nth right-eye image GN_R of each frame. Comparison with this reference parallax image can then improve the sense of depth obtained by the user viewing the 3D image.
 For example, the reference parallax image added to the nth frame includes an nth reference region image In_L for the left eye and an nth corresponding region image In_R for the right eye. The nth reference region image In_L and the nth corresponding region image In_R are generated according to a reference shift amount and have a relationship in which the positions of pixels representing the same portion of a display object are shifted in one direction. The reference shift amount can be determined, for example, based on the Nth left-eye images GN_L and the Nth right-eye images GN_R of one or more frames of the scene to which the nth frame belongs.
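 One simple way to realize such a pair is to render a single display object (for example, a bar or frame pattern) twice, with the right-eye copy translated horizontally by the reference shift amount. The sketch below, in which the function and its arguments are illustrative assumptions rather than the embodiment's actual implementation, produces such a pair as image patches:

```python
import numpy as np

def make_region_pair(pattern, shift):
    """Given a 2D patch `pattern` for the display object, return a
    (reference, corresponding) pair in which the corresponding patch
    shows the same object shifted `shift` pixels in the -X direction."""
    h, w = pattern.shape
    ref = np.zeros((h, w + shift), dtype=pattern.dtype)
    cor = np.zeros((h, w + shift), dtype=pattern.dtype)
    ref[:, shift:] = pattern          # reference region image
    cor[:, :w] = pattern              # corresponding region image (shifted left)
    return ref, cor
```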
 The left side of FIG. 2 shows an example of an nth left-eye composite image GSn_L in which the nth reference region image In_L is composited at a position around the image area corresponding to the nth left-eye image Gn_L (also referred to as the nth left-eye image area TAn_L), namely the nth reference compositing position Pn_L described later. The right side of FIG. 2 shows an example of an nth right-eye composite image GSn_R in which the nth corresponding region image In_R is composited at a position around the image area corresponding to the nth right-eye image Gn_R (also referred to as the nth right-eye image area TAn_R), namely the nth corresponding compositing position Pn_R described later.
 Then, 3D image display can be realized, for example, in a mode in which the nth left-eye composite image GSn_L and the nth right-eye composite image GSn_R are displayed as one combined image, or in a mode in which they are displayed sequentially in rapid succession. For example, an interlaced scheme can be adopted in which the nth left-eye composite image GSn_L is displayed as the image of the first field and the nth right-eye composite image GSn_R as the image of the second field.
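 As one illustration of such an interlaced mode, the two composite images can be woven into a single frame by taking the even rows from the left-eye image and the odd rows from the right-eye image. This is a generic sketch of row interleaving under that assumption, not a prescription of the embodiment's display format:

```python
import numpy as np

def interleave_fields(left, right):
    """Build one interlaced frame: even rows (first field) come from the
    left-eye composite image, odd rows (second field) from the right-eye
    composite image. Both inputs must have the same shape."""
    assert left.shape == right.shape
    frame = np.empty_like(left)
    frame[0::2] = left[0::2]
    frame[1::2] = right[1::2]
    return frame
```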
 In the present embodiment, the reference shift amount is changed in accordance with scene changes in the 3D moving image. That is, the reference parallax of the reference parallax image added to the 3D moving image can be changed. This makes differences in parallax in the 3D moving image easier to recognize, so the sense of depth obtained by the user viewing the 3D moving image can be improved.
 For example, FIG. 3 shows an example of the (n+m)th left-eye image G(n+m)_L and the (n+m)th right-eye image G(n+m)_R included in the (n+m)th frame, after the scene has changed from the nth frame shown in FIG. 1. The (n+m)th left-eye image G(n+m)_L includes an object region O4_L relating to one person, and the (n+m)th right-eye image G(n+m)_R includes an object region O4_R relating to that same person.
 The left side of FIG. 4 shows an example of an (n+m)th left-eye composite image GS(n+m)_L in which the (n+m)th reference region image I(n+m)_L is composited at the (n+m)th reference compositing position P(n+m)_L around the (n+m)th left-eye image area TA(n+m)_L corresponding to the (n+m)th left-eye image G(n+m)_L. The right side of FIG. 4 shows an example of an (n+m)th right-eye composite image GS(n+m)_R in which the (n+m)th corresponding region image I(n+m)_R is composited at the (n+m)th corresponding compositing position P(n+m)_R around the (n+m)th right-eye image area TA(n+m)_R corresponding to the (n+m)th right-eye image G(n+m)_R.
 The (n+m)th reference region image I(n+m)_L and the (n+m)th corresponding region image I(n+m)_R can be generated based on the reference shift amount changed in response to the scene change. That is, the (n+m)th reference region image I(n+m)_L and the (n+m)th corresponding region image I(n+m)_R have a relationship in which the positions of pixels representing the same portion of a display object are shifted in one direction according to the changed reference shift amount.
<(1-2) Schematic Configuration of the Information Processing System>
 FIG. 5 is a diagram illustrating the schematic configuration of an information processing system 1 according to one embodiment.
 The information processing system 1 includes a stereo camera 2, an information processing apparatus 4, and a line-of-sight detection sensor 5. The information processing apparatus 4 is connected to the stereo camera 2 and the line-of-sight detection sensor 5 so that data can be transmitted and received.
 The stereo camera 2 has a camera 21 and a camera 22. Each of the cameras 21 and 22 has the functions of a digital camera with an image sensor such as a CCD. Each of the cameras 21 and 22 performs a shooting operation in which light from a subject is received and information representing the luminance distribution of the subject is acquired as image data by photoelectric conversion. The optical axes of the cameras 21 and 22 are separated by a predetermined distance, for example in the horizontal direction. This predetermined distance, also called the baseline length, is set, for example, to a distance equivalent to the average separation between the centers of the human left and right eyes.
 In the stereo camera 2, a stereo image is acquired by the cameras 21 and 22 performing shooting operations at substantially the same timing. A stereo image includes a pair consisting of a left-eye image and a right-eye image, and can be displayed so as to enable stereoscopic viewing by the user. The left-eye image and the right-eye image each have a matrix-like array of pixels with A pixels in the horizontal direction (A is a natural number, for example A = 1280) and B pixels in the vertical direction (B is a natural number, for example B = 960).
 In the stereo camera 2, for example, Ns sets of stereo images (Ns is an integer of 2 or more) can be acquired by the cameras 21 and 22 shooting multiple times in temporal succession at predetermined timings. These Ns sets of stereo images correspond to the images of the Ns frames included in a 3D moving image.
 The line-of-sight detection sensor 5 detects the portion of the screen of the display unit 42 included in the information processing apparatus 4 to which the user is paying attention (also referred to as the attention portion). The display unit 42 and the line-of-sight detection sensor 5 are fixed relative to each other in a predetermined arrangement. In the line-of-sight detection sensor 5, for example, an image of the user is first obtained by shooting; the direction of the user's line of sight is then detected by analyzing that image, and the portion of the screen of the display unit 42 to which the user is paying attention is thereby detected. The image analysis can be realized, for example, by detecting the orientation of the face using pattern matching and by distinguishing the white and dark parts of both eyes using color differences.
 Information relating to one or more stereo images obtained by the stereo camera 2 can be transmitted to the information processing apparatus 4 via a communication line 3a. Information relating to the shooting conditions of the stereo camera 2 can also be transmitted to the information processing apparatus 4 via the communication line 3a. Information relating to the attention portion obtained by the line-of-sight detection sensor 5 can be transmitted to the information processing apparatus 4 via a communication line 3b. The communication lines 3a and 3b may be wired or wireless.
 The information processing apparatus 4 has the functions of, for example, a personal computer. The information processing apparatus 4 includes an operation unit 41, a display unit 42, an interface (I/F) unit 43, a storage unit 44, an input/output unit 45, and a control unit 46.
 The operation unit 41 includes, for example, a mouse and a keyboard. The display unit 42 includes, for example, a liquid crystal display. The I/F unit 43 receives information from the stereo camera 2 and the line-of-sight detection sensor 5. The storage unit 44 includes, for example, a hard disk and stores the images obtained by the stereo camera 2. The storage unit 44 also stores a program PG1 and the like for realizing the various operations of the information processing apparatus 4. The input/output unit 45 includes, for example, a disk drive, accepts a storage medium 6 such as an optical disc, and can exchange data with the control unit 46.
 The control unit 46 includes a CPU 46a serving as a processor and a memory 46b capable of temporarily storing information, and controls each unit of the information processing apparatus 4. In the control unit 46, various functions and various kinds of information processing are realized by reading and executing the program PG1 in the storage unit 44. Data generated temporarily during this information processing is stored in the memory 46b as appropriate. Under the control of the control unit 46, the information processing apparatus 4 functions as an image processing apparatus that generates 3D moving images, and also functions as an image display system that displays 3D moving images on the display unit 42. The control unit 46 can store a program stored on the storage medium 6 in the storage unit 44 or the like via the input/output unit 45.
<(1-3) Functional Configuration of the Image Processing Apparatus>
 FIG. 6 is a block diagram illustrating the functional configuration of the image processing apparatus realized by the control unit 46. The functional configuration of the image processing apparatus includes a video acquisition unit 461 serving as a first acquisition unit, an attention region detection unit 462, a change detection unit 463, a shift amount determination unit 464, a region image acquisition unit 465 serving as a second acquisition unit, a signal reception unit 466, a mode setting unit 467, a position designation unit 468, and an image compositing unit 469.
 The video acquisition unit 461 acquires video information including information relating to a 3D moving image. The 3D moving image includes Ns sets of stereo images (Ns is an integer of 2 or more), that is, image information for Ns frames. The stereo image of the Nth frame (N is an arbitrary natural number) of the Ns frames includes an Nth left-eye image GN_L and an Nth right-eye image GN_R having a relationship in which the positions of pixels representing the same portion of an object are shifted in one direction (here, the horizontal direction). The video information may also include information relating to audio matched to the 3D moving image and metadata relating to the 3D moving image. The metadata may include one or more of caption information, chapter information, and shooting condition information.
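 For concreteness, video information of this form might be held in a structure like the following; the type names and fields are illustrative assumptions only, not part of the embodiment:

```python
from dataclasses import dataclass, field
from typing import Optional
import numpy as np

@dataclass
class Frame:
    left: np.ndarray    # Nth left-eye image GN_L (reference image)
    right: np.ndarray   # Nth right-eye image GN_R (corresponding image)

@dataclass
class VideoInfo:
    frames: list                       # Ns Frame objects (stereo images)
    audio: Optional[np.ndarray] = None # optional audio matched to the video
    metadata: dict = field(default_factory=dict)  # captions, chapters, shooting conditions
```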
 In the present embodiment, the Nth left-eye image GN_L is taken as the reference image, and the Nth right-eye image GN_R is taken as the image corresponding to the reference image (the corresponding image). Alternatively, the Nth right-eye image GN_R may be taken as the reference image and the Nth left-eye image GN_L as the corresponding image.
 The attention region detection unit 462 detects, in accordance with a preset detection rule, a pair of attention regions capturing the same object predicted to attract the user's attention from the pair consisting of the Nth left-eye image GN_L and the Nth right-eye image GN_R. The pair of attention regions includes an attention region detected from the Nth left-eye image GN_L (also referred to as the reference attention region) and an attention region detected from the Nth right-eye image GN_R (also referred to as the corresponding attention region).
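 The embodiment leaves the detection rule open. Purely as an illustration, one simple rule is to take the bounding box of the pixels that deviate most from the frame's mean luminance, on the assumption that a conspicuous foreground object attracts the eye; the function below is such a hypothetical rule, not the embodiment's actual detector:

```python
import numpy as np

def attention_bbox(image, quantile=0.99):
    """Return (y0, y1, x0, x1): the bounding box of the pixels whose
    absolute deviation from the mean luminance lies above the given
    quantile -- a crude stand-in for a saliency-based detection rule."""
    dev = np.abs(image.astype(np.float32) - image.mean())
    mask = dev >= np.quantile(dev, quantile)
    ys, xs = np.nonzero(mask)
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
```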
 The change detection unit 463 detects scene changes in the video information. The change detection unit 463 can detect a scene change based on, for example, one or more of the 3D moving image information, the audio information, and the metadata included in the video information. Scene changes can include changes of place, time, and the like, replacement of the main display object, and changes in the state of the display object. Changes in the state of the display object can include movement of the display object, a change of which display object is in focus, and the like.
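 As one concrete illustration of image-based detection, consecutive reference images can be compared by their luminance histograms, with a large histogram distance flagged as a scene change. In the sketch below, the bin count and threshold are illustrative assumptions:

```python
import numpy as np

def is_scene_change(prev_img, cur_img, bins=64, threshold=0.4):
    """Flag a scene change when the total variation distance between the
    normalized luminance histograms of two consecutive frames exceeds
    a threshold (images assumed to be 8-bit grayscale)."""
    h1, _ = np.histogram(prev_img, bins=bins, range=(0, 256))
    h2, _ = np.histogram(cur_img, bins=bins, range=(0, 256))
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return 0.5 * np.abs(h1 - h2).sum() > threshold
```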
 The shift amount determination unit 464 determines a reference shift amount for each scene by following a predetermined determination rule. The shift amount determination unit 464 can determine the reference shift amount, for example, in response to the detection of a scene change by the change detection unit 463. At that time, the reference shift amount can be determined based on the Nth left-eye images GN_L and the Nth right-eye images GN_R of one or more frames belonging to the scene after the scene change. Specifically, the reference shift amount can be determined based on the shift amounts (parallax) of the positions of pixels representing the same portion of an object between the Nth left-eye image GN_L and the Nth right-eye image GN_R. By reusing the various computation results of the change detection unit 463, the shift amount determination unit 464 can reduce the amount of computation.
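 For example, given a per-pixel parallax map for a frame of the new scene (obtained by block matching as sketched earlier, or by any other stereo method), a representative value of its distribution can serve as the reference shift amount. The choice of the median as the representative value in the sketch below is an illustrative assumption:

```python
import numpy as np

def reference_shift(parallax_map, valid=None):
    """Determine the reference shift amount as the median parallax of the
    valid pixels of a frame of the scene. `parallax_map` is a 2D array of
    per-pixel shift amounts; `valid` is an optional boolean mask."""
    values = parallax_map[valid] if valid is not None else parallax_map.ravel()
    return int(round(float(np.median(values))))
```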
 領域画像取得部465は、第N基準領域画像INLと第N対応領域画像INRとに係る情報(第N領域画像情報とも言う)を取得する。第N基準領域画像INLと第N対応領域画像INRとは、表示対象物の同一部分を示す画素の位置が一方向(ここでは横方向)に、ズレ量決定部464で決定された基準ズレ量ずれている関係を有する。領域画像取得部465では、例えば、場面毎に、基準ズレ量に基づいて、第N基準領域画像INLと第N対応領域画像INRとに係る第N領域画像情報が取得され得る。 The area image acquisition unit 465 acquires information (also referred to as Nth area image information) related to the Nth reference area image IN L and the Nth corresponding area image IN R. The N-th reference area image IN L and the N-th corresponding area image IN R are the reference in which the position of the pixel indicating the same portion of the display object is determined in one direction (here, the horizontal direction) by the shift amount determination unit 464. There is a relationship of deviation. In the area image acquisition unit 465, for example, for each scene, the Nth area image information related to the Nth reference area image IN L and the Nth corresponding area image IN R can be acquired based on the reference deviation amount.
 信号受付部466は、ユーザーによる操作部41の操作に応じて制御部46に入力される信号を受け付ける。 The signal receiving unit 466 receives a signal input to the control unit 46 according to the operation of the operation unit 41 by the user.
 モード設定部467は、第1モードと第2モードとを含む複数のモードのうちの何れか1つのモードに位置指定部468を設定する。モード設定部467は、例えば、信号受付部466で受け付けられた信号に応じて、位置指定部468のモードが設定され得る。また、例えば、信号受付部466で受け付けられた信号に応じて、注目領域検出部462が、注目領域を検出するモード(検出許可モードとも言う)または注目領域を検出しないモード(検出禁止モードとも言う)に設定され得る。なお、モード設定部467では、映像取得部461で取得された第N左眼用画像GNLと第N右眼用画像GNRとに基づいて、位置指定部468のモードが設定されても良いし、注目領域検出部462のモードが設定されても良い。 The mode setting unit 467 sets the position specifying unit 468 to any one of a plurality of modes including the first mode and the second mode. For example, the mode setting unit 467 can set the mode of the position specifying unit 468 in accordance with the signal received by the signal receiving unit 466. Further, for example, in accordance with a signal received by the signal receiving unit 466, the attention area detection unit 462 detects a attention area (also referred to as a detection permission mode) or does not detect a attention area (also referred to as a detection prohibit mode). ). The mode setting unit 467 may set the mode of the position specifying unit 468 based on the Nth left eye image GN L and the Nth right eye image GN R acquired by the video acquisition unit 461. The mode of the attention area detection unit 462 may be set.
 位置指定部468は、第N基準領域画像INLが合成される位置(第N基準被合成位置とも言う)PNLと、第N対応領域画像INRが合成される位置(第N対応被合成位置とも言う)PNRと、を指定する。また、例えば、位置指定部468の第1モードが、画像を囲むように第N基準被合成位置PNLおよび第N対応被合成位置PNRを指定するモードであり、位置指定部468の第2モードが、注目領域に応じて第N基準被合成位置PNLおよび第N対応被合成位置PNRを指定するモードである場合が考えられる。 Position specifying unit 468, the N reference region image IN L is (also referred to as N-th reference the synthetic position) by the position synthesized and PN L, position where the N corresponding region image IN R is synthesized (the N corresponding object synthesizing also referred to) to specify and PN R, the a position. Further, for example, a first mode position specifying unit 468, a mode for specifying the first N reference the combined position PN L and the N corresponding the combined position PN R so as to surround the image, the position specifying unit 468 2 mode, when a mode for designating a first N reference the combined position PN L and the N corresponding object synthesizing position PN R in accordance with the region of interest can be considered.
 Note that the N-th reference combining position PNL may be specified in an area surrounding the N-th left-eye image GNL, or in a corresponding surrounding area of another image different from the N-th left-eye image GNL; this other image may be, for example, another field image of an interlaced moving image. Similarly, the N-th corresponding combining position PNR may be specified in an area surrounding the N-th right-eye image GNR, or in a corresponding surrounding area of another image different from the N-th right-eye image GNR; this other image may likewise be, for example, another field image of an interlaced moving image.
 The image composition unit 469 combines the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR to generate information of an image that allows stereoscopic viewing (also referred to as stereoscopic image information). For example, for each scene delimited by detected scene changes, the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR may be combined for every frame. Here, the N-th reference area image INL is arranged at the N-th reference combining position PNL specified by the position specifying unit 468, and the N-th corresponding area image INR is arranged at the N-th corresponding combining position PNR specified by the position specifying unit 468.
 The stereoscopic image information can be visibly output on the display unit 42 under the control of the control unit 46. For example, based on the stereoscopic image information, the display unit 42 may superimpose the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR and display them at the same time. Alternatively, the display unit 42 may display one or more of the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR and the remaining one or more images in time sequence.
 At this time, the sense of distance to the object that the N-th left-eye image area TANL and the N-th right-eye image area TANR give the user can be enhanced by comparison with the sense of distance to the display object that the N-th reference area image INL and the N-th corresponding area image INR give the user. Furthermore, since changing the reference deviation amount in accordance with scene changes in the 3D moving image makes differences in parallax within the 3D moving image easier to recognize, the sense of depth obtained by the user watching the 3D moving image can be improved.
<(1-3-1) Attention Area Detection Method>
 As the method of detecting the reference attention area and the corresponding attention area in the attention area detection unit 462, for example, one or more of the following first to sixth area detection methods may be employed.
<(1-3-1-1) First Area Detection Method>
 An image area located near the center of the N-th left-eye image GNL and the N-th right-eye image GNR is detected as an attention area. For example, the contours of objects may be extracted from the N-th left-eye image GNL by image processing such as the Hough transform, and an image area capturing an object located near the center of the N-th left-eye image GNL may be detected as the reference attention area. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area. Here, the corresponding attention area can be detected by a corresponding-point search such as the phase-only correlation (POC) method.
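 As a rough illustration of the corresponding-point search mentioned above, the following sketch estimates the displacement between two same-sized image patches with the phase-only correlation (POC) method. It assumes NumPy; the function name and patch handling are illustrative and not taken from this disclosure.

```python
import numpy as np

def poc_shift(reference_patch, target_patch):
    """Estimate the translation between two same-sized grayscale patches by
    phase-only correlation: the peak of the inverse FFT of the normalized
    cross-power spectrum marks the relative displacement (the sign depends
    on which patch is treated as the reference)."""
    F = np.fft.fft2(reference_patch)
    G = np.fft.fft2(target_patch)
    cross_power = F * np.conj(G)
    cross_power /= np.abs(cross_power) + 1e-12  # keep only the phase
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # indices beyond half the patch size wrap around to negative shifts
    dy, dx = [p if p <= s // 2 else p - s
              for p, s in zip(peak, correlation.shape)]
    return dx, dy  # horizontal and vertical pixel displacement
```

 Sliding the reference attention area over candidate positions in the right-eye image and keeping the position with the sharpest correlation peak is one plausible way to realize the corresponding-area search.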
<(1-3-1-2) Second Area Detection Method>
 Taking the N-th left-eye image GNL and the N-th right-eye image GNR as targets, an image area showing a specific type of object, such as a person, is detected as an attention area by template matching or the like. For example, the reference attention area may be detected by template matching or the like performed on the N-th left-eye image GNL. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area, for example by a corresponding-point search such as the POC method.
<(1-3-1-3) Third Area Detection Method>
 An attention area is detected by analyzing motion vectors over a plurality of stereo images included in the 3D moving image. For example, an area showing an object whose motion exceeds a certain threshold over a predetermined number of frames may be detected as the reference attention area and the corresponding attention area. Specifically, the reference attention area may be detected by analyzing motion vectors over a plurality of N-th left-eye images GNL. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area, for example by a corresponding-point search such as the POC method.
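 For illustration only, the following sketch marks pixels whose estimated motion exceeds a threshold; connected regions of the mask would be candidate attention areas. It assumes OpenCV's dense optical flow rather than the block-based motion vector analysis a real implementation might use, and the threshold is an assumed tuning value.

```python
import cv2
import numpy as np

def moving_object_mask(prev_gray, cur_gray, motion_threshold=2.0):
    """Return a boolean mask of pixels whose motion magnitude between two
    consecutive grayscale frames exceeds motion_threshold (pixels/frame)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion in pixels
    return magnitude > motion_threshold
```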
<(1-3-1-4) Fourth Area Detection Method>
 Taking the N-th left-eye image GNL and the N-th right-eye image GNR as targets, an area in which at least one of a specific color and a specific texture differing from the surroundings is detected is taken as the reference attention area and the corresponding attention area. For example, an area of the N-th left-eye image GNL in which a specific color, such as the skin color characteristic of humans, is detected may be taken as the reference attention area. Alternatively, an area of the N-th left-eye image GNL in which a specific texture, such as the color arrangement of the parts of a human head (eyes, hair, eyebrows, mouth, and so on), is detected may be taken as the reference attention area. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area, for example by a corresponding-point search such as the POC method.
 Note that a single image may capture a state in which a plurality of subjects of the same type (for example, people) partially overlap one another. In this case, the plurality of subjects can be distinguished using, for example, information indicating the distance from the viewpoint to each subject, which can be obtained from the N-th left-eye image GNL and the N-th right-eye image GNR. The distance information can be obtained, using the principle of triangulation, from the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR and the baseline length of the stereo camera 2; the parallax itself can be obtained by a corresponding-point search such as the POC method.
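 For a rectified stereo pair, the triangulation relation mentioned here reduces to Z = f·B/d. A minimal sketch, with illustrative parameter names not taken from this disclosure:

```python
def depth_from_disparity(disparity_px, baseline, focal_length_px):
    """Pinhole-stereo triangulation for a rectified pair: Z = f * B / d.
    The result is in the units of the baseline (e.g., meters)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline / disparity_px
```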
<(1-3-1-5) Fifth Area Detection Method>
 The reference attention area and the corresponding attention area are detected from the N-th left-eye image GNL and the N-th right-eye image GNR based on information about the watched portion obtained from the line-of-sight detection sensor 5. For example, one or more pairs of image areas showing a specific type of object, such as a person, may be detected in advance from the N-th left-eye image GNL and the N-th right-eye image GNR by template matching or the like. Then, among these one or more pairs of image areas, the pair corresponding to the watched portion can be detected as the reference attention area and the corresponding attention area.
<(1-3-1-6) Sixth Area Detection Method>
 Taking the N-th left-eye image GNL and the N-th right-eye image GNR as targets, the contour of each object is extracted by image processing such as the Hough transform. Then, among the areas of the objects enclosed by contours, an area in which the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR does not change abruptly over time is detected as the reference attention area and the corresponding attention area capturing a person. Note that the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR can be obtained, for example, by a corresponding-point search using the POC method or the like.
 For example, when a marathon runner is filmed from the front or behind and the camera follows the runner, the distance between the runner and the camera (viewpoint) may not change abruptly over time. In this case, among the areas of the subjects enclosed by contours, there may exist areas in which the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR changes only within a predetermined range over a predetermined number of frames corresponding to a predetermined time (also referred to as slightly-varying-parallax areas). Accordingly, among these slightly-varying-parallax areas, an area in which the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR is equal to or larger than a predetermined value may be detected as the reference attention area and the corresponding attention area capturing a person, while an area in which the parallax is smaller than the predetermined value may be detected as an area capturing a distant view.
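 As a sketch of the classification just described, the following labels contour-enclosed areas from the parallax they show over a window of frames; the tolerance, the person/distant-view split, and the dictionary-based bookkeeping are assumptions for illustration only.

```python
def classify_regions(parallax_history, range_tolerance=1.0,
                     person_min_parallax=8.0):
    """parallax_history maps a region id to its per-frame parallax values
    over the window. A region whose parallax stays within range_tolerance
    is a slightly-varying-parallax area; among those, large parallax is
    labeled a person candidate and small parallax a distant view."""
    labels = {}
    for region_id, history in parallax_history.items():
        if max(history) - min(history) <= range_tolerance:
            labels[region_id] = ("person" if history[-1] >= person_min_parallax
                                 else "distant view")
        else:
            labels[region_id] = "other"
    return labels
```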
<(1-3-2) Scene Change Detection Method>
 The change detection unit 463 can detect a scene change based on one or more of the 3D moving image information, the audio information, and the metadata that can be included in the video information.
<(1-3-2-1) Detection Based on 3D Moving Image Information>
 The change detection unit 463 can detect a scene change in accordance with a change in the image between two or more frames included in the video information. In this way, an image having a reference parallax suited to each scene can be combined with the 3D moving image in accordance with changes in the image.
 For example, a scene change can be detected by comparing the (n+m)-th frame (m is a natural number) with the n-th frame, m frames earlier. In this comparison, the n-th left-eye image GnL may be compared with the (n+m)-th left-eye image G(n+m)L, the n-th right-eye image GnR may be compared with the (n+m)-th right-eye image G(n+m)R, or both comparisons may be performed. m may be 1, or a value of 2 or more.
 As the method of detecting a scene change based on the 3D moving image information in the change detection unit 463, for example, one or more of the following first to sixth change detection methods may be employed.
<(1-3-2-1-1) First Change Detection Method>
 A scene change can be detected in accordance with one or more of a change in luminance, a change in color, and a change in frequency components between two or more frames included in the video information.
 For example, if the value obtained by integrating the differences of the luminance signal at each pixel between the n-th frame and the (n+m)-th frame (also referred to as the integrated luminance difference) exceeds a predetermined threshold, a scene change is detected as having occurred. Alternatively, a scene change may be detected in accordance with a change in the distribution of the integrated luminance values of the vertical lines of the image (also referred to as integrated luminance values). In that case, for example, if the sum of the differences of the integrated luminance values of the vertical lines between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, a scene change is detected as having occurred.
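 A minimal sketch of the two luminance criteria above, assuming NumPy and grayscale frames of identical size; both thresholds are assumed tuning values.

```python
import numpy as np

def luminance_scene_change(frame_n, frame_nm, pixel_threshold, line_threshold):
    """Flags a scene change when either (1) the integrated per-pixel
    luminance difference or (2) the change of the per-vertical-line
    integrated luminance distribution exceeds its threshold."""
    a = frame_n.astype(np.int64)
    b = frame_nm.astype(np.int64)
    pixel_diff = np.abs(a - b).sum()                         # criterion (1)
    line_diff = np.abs(a.sum(axis=0) - b.sum(axis=0)).sum()  # criterion (2)
    return pixel_diff > pixel_threshold or line_diff > line_threshold
```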
 FIG. 7 schematically shows an example of the distribution of integrated luminance values of the n-th left-eye image GnL shown in FIG. 1, and FIG. 8 schematically shows an example of the distribution of integrated luminance values of the (n+m)-th left-eye image G(n+m)L shown in FIG. 1. When a large change occurs in the distribution of integrated luminance values in this way, the difference in the distributions between the n-th frame and the (n+m)-th frame exceeds the predetermined threshold, and a scene change is detected as having occurred.
 Also, for example, if the value obtained by integrating the differences of the pixel values of a specific color at each pixel between the n-th frame and the (n+m)-th frame (also referred to as the integrated color component difference) exceeds a predetermined threshold, a scene change is detected as having occurred. Alternatively, a scene change may be detected in accordance with a change in the distribution of the integrated pixel values of a specific color over the vertical lines of the image (also referred to as integrated color component values). In that case, for example, if the sum of the differences of the integrated color component values of the vertical lines between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, a scene change is detected as having occurred.
 Also, for example, a scene change can be detected by comparing, between the n-th frame and the (n+m)-th frame, the intensity distributions of the spatial frequency components obtained by applying a Fourier transform or the like to the distribution of pixel values. Specifically, for example, the spatial frequency range may be divided into a predetermined number of bands, and if the value obtained by integrating the differences of the frequency component intensities in each band exceeds a predetermined threshold, a scene change is detected as having occurred.
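 The band-by-band spatial frequency comparison could look like the following sketch; the radial band layout and band count are assumptions, not specified here.

```python
import numpy as np

def band_spectrum(frame, n_bands=8):
    """Integrate the 2-D FFT magnitude of a grayscale frame into n_bands
    radial spatial-frequency bands."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(frame)))
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bands + 1)
    return np.array([spectrum[(radius >= lo) & (radius < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def spectrum_scene_change(frame_n, frame_nm, threshold):
    # scene change when the summed band-intensity difference is too large
    diff = np.abs(band_spectrum(frame_n) - band_spectrum(frame_nm)).sum()
    return diff > threshold
```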
<(1-3-2-1-2) Second Change Detection Method>
 A scene change can be detected in accordance with a change in the pair of the reference attention area and the corresponding attention area over a plurality of frames included in the video information. The pair of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
 For example, if the pair of the reference attention area and the corresponding attention area is replaced, a scene change is detected as having occurred. Here, a replacement of the reference attention area may be detected as a scene change, or a replacement of the corresponding attention area may be detected as a scene change. In this case, for example, it suffices to recognize, between the n-th frame and the (n+m)-th frame, a replacement of at least one of the reference attention area and the corresponding attention area.
 FIGS. 9 and 10 show a state in which, between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L, an image capturing three object areas O1L to O3L as reference attention areas is replaced by an image capturing one object area O4L as a reference attention area. In this case, the replacement of the reference attention area can be recognized from the mismatch of the objects between the object area O1L and the object area O4L. The mismatch of the display objects between the object area O1L and the object area O4L can be recognized by template matching or by a corresponding-point search such as the POC method. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 Also, for example, a scene change can be detected in accordance with a movement of the pair of the reference attention area and the corresponding attention area. Here, a scene change may be detected in accordance with a movement of the reference attention area, or in accordance with a movement of the corresponding attention area. In this case, it suffices to recognize whether at least one of the reference attention area and the corresponding attention area has moved by more than a predetermined amount between the n-th frame and the (n+m)-th frame.
<(1-3-2-1-3) Third Change Detection Method>
 A scene change can be detected in accordance with at least one of the appearance and the disappearance of a pair of a reference attention area and a corresponding attention area over a plurality of frames included in the video information. The pair of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
 For example, if a pair of a reference attention area and a corresponding attention area appears or disappears, a scene change is detected as having occurred. Here, the appearance or disappearance of the reference attention area may be detected as a scene change, or the appearance or disappearance of the corresponding attention area may be detected as a scene change. In this case, for example, it suffices to recognize, between the n-th frame and the (n+m)-th frame, the appearance or disappearance of at least one of the reference attention area and the corresponding attention area. The appearance and disappearance of the reference attention area and of the corresponding attention area can be recognized, for example, by detecting a change in the number of reference attention areas and corresponding attention areas, or by detecting the movement amount and movement direction of an object. The movement amount and movement direction of an object can be detected by dividing each image into a plurality of blocks and comparing the blocks between the n-th frame and the (n+m)-th frame. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 FIGS. 11 and 12 show a state in which, between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L, an object area O7L as a reference attention area appears from the left in an image capturing two object areas O5L and O6L as reference attention areas. That is, an object is shown entering the frame. In this case, for example, the appearance of the object area O7L may be recognized from the change in the number of object areas, or from the detection of the movement amount and movement direction of the object area O7L.
 Note that the first frame of the changed scene may be the frame at which the reference attention area begins to appear or disappear, the frame at which the appearance or disappearance of the reference attention area has progressed by a predetermined proportion, or the frame at which the appearance or disappearance of the reference attention area is complete. Similarly, the first frame of the changed scene may be the frame at which the corresponding attention area begins to appear or disappear, the frame at which the appearance or disappearance of the corresponding attention area has progressed by a predetermined proportion, or the frame at which the appearance or disappearance of the corresponding attention area is complete.
<(1-3-2-1-4) Fourth Change Detection Method>
 A scene change can be detected in accordance with a change in the size of the pair of the reference attention area and the corresponding attention area over a plurality of frames included in the video information. The pair of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
 For example, if the areas of the reference attention area and the corresponding attention area change by more than a predetermined amount, a scene change is detected as having occurred. Here, a change of the area of the reference attention area by more than the predetermined amount may be detected as a scene change, or a change of the area of the corresponding attention area by more than the predetermined amount may be detected as a scene change. In this case, for example, a pair of reference attention areas showing the same object, or a pair of corresponding attention areas showing the same object, may be detected between the n-th frame and the (n+m)-th frame by template matching or by a corresponding-point search such as the POC method. Then, for at least one of the pair of reference attention areas and the pair of corresponding attention areas, it can be recognized whether the change in area exceeds the predetermined amount. The predetermined amount may be, for example, an absolute amount of area, or an amount calculated by multiplying the area by a predetermined coefficient (for example, 0.5). Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 FIGS. 13 and 14 show a state in which one object area O8L as a reference attention area is enlarged between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L. In this case, for example, if the change in the area of the object area O8L between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L exceeds the predetermined amount, a scene change is detected as having occurred.
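 The relative-threshold variant described above (with the 0.5 coefficient example) amounts to a one-line test; a sketch with illustrative names:

```python
def attention_area_size_changed(area_frame_n, area_frame_nm, coefficient=0.5):
    """Scene change when the attention area's size changes by more than
    coefficient times its size in the n-th frame."""
    return abs(area_frame_nm - area_frame_n) > coefficient * area_frame_n
```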
<(1-3-2-1-5) Fifth Change Detection Method>
 A scene change can be detected in accordance with a change of the in-focus area over a plurality of frames included in the video information. The in-focus area can be recognized, for example, in at least one of the N-th left-eye image GNL and the N-th right-eye image GNR. For example, the in-focus area may be recognized by edge extraction processing using the Hough transform or the like, or the image may be divided into a plurality of areas and the in-focus area recognized by analyzing, for each area, the presence or absence of high-frequency components with a Fourier transform of the pixel value distribution. Alternatively, after areas capturing objects (object areas) are detected by template matching or the like, whether each object area is in focus may be recognized by analyzing the presence or absence of high-frequency components with a Fourier transform of the pixel value distribution of each object area.
 Then, for example, if the in-focus area has changed, between the n-th frame and the (n+m)-th frame, to an area capturing a different object, a scene change is detected as having occurred. Whether an area captures a different object can be determined, for example, by template matching or by a corresponding-point search such as the POC method. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
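 One way to realize the high-frequency focus analysis mentioned above is the following sketch: the fraction of spectral energy above a radial cutoff serves as a focus measure, and the candidate object area scoring highest would be treated as the in-focus area. The cutoff ratio is an assumed tuning value.

```python
import numpy as np

def high_frequency_energy(patch, cutoff_ratio=0.25):
    """Focus measure for a grayscale patch: fraction of FFT energy above a
    radial cutoff; in-focus patches score higher than defocused ones."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    high = spectrum[radius > cutoff_ratio * min(h, w)].sum()
    return high / (spectrum.sum() + 1e-12)
```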
 FIGS. 15 and 16 show a state in which the in-focus area changes from the object area O6L to the object area O7L between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L. In this case, for example, the change detection unit 463 recognizes that the in-focus area has changed, between the n-th frame and the (n+m)-th frame, from the object area O6L to the object area O7L, which captures a different object, and a scene change is detected as having occurred.
<(1-3-2-1-6) Sixth Change Detection Method>
 The faces of people captured in a plurality of frames included in the video information are recognized, and a scene change can be detected in accordance with a replacement of the people between the plurality of frames. The recognition of a person's face can be performed, for example, on at least one of the N-th left-eye image GNL and the N-th right-eye image GNR. Specifically, for example, an area in which a person's face is captured (also referred to as a face area) may first be recognized in the image by template matching or the like, then the parts of the face (eyes, eyebrows, nose, mouth, and so on) may be recognized in the face area by template matching or the like, and the person's face may be recognized from the positional relationship of a plurality of feature points specifying the edge portions of those parts.
 For example, if the person being captured is replaced between the n-th frame and the (n+m)-th frame, a scene change is detected as having occurred. Whether a person has been replaced can be determined, for example, from a change in the positional relationship of the plurality of feature points. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 FIGS. 17 and 18 show a state in which the area capturing a person changes from the object area O9L to the object area O10L between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L. In this case, since the person being captured is replaced between the n-th frame and the (n+m)-th frame, a scene change is detected as having occurred.
<(1-3-2-2) Detection Based on Audio Information>
 The change detection unit 463 can detect a scene change in accordance with a change in the audio included in the video information. In this way, an image having a reference parallax suited to each scene can be combined with the 3D moving image in accordance with changes in the audio. For example, a scene change can be detected by comparing the audio corresponding to the (n+m)-th frame (m is a natural number) with the audio corresponding to the n-th frame, m frames earlier. m may be 1, or a value of 2 or more.
 As the method of detecting a scene change based on the audio information in the change detection unit 463, for example, one or more of the following seventh to ninth change detection methods may be employed.
<(1-3-2-2-1) Seventh Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the volume obtained from the audio information included in the video information. For example, the volume corresponding to the (n+m)-th frame (m is a natural number) is compared with the volume corresponding to the n-th frame, m frames earlier, and when the amount of change in volume exceeds a predetermined amount, a scene change is detected as having occurred. Here, m may be 1, or a value of 2 or more. The predetermined amount may be, for example, a fixed absolute amount of volume, or a variable amount calculated from the volume according to a predetermined rule. As the predetermined rule, for example, a calculation rule of multiplying the volume corresponding to the n-th frame by a predetermined coefficient (for example, 0.5) may be employed. In that case, a scene change is detected as having occurred at the point in time when the volume has increased or decreased by 50% or more.
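 A minimal sketch of the seventh method under the 50% example above, assuming the audio samples aligned with each frame are available as NumPy arrays; RMS as the loudness measure is an assumption.

```python
import numpy as np

def volume_scene_change(samples_n, samples_nm, coefficient=0.5):
    """Scene change when the RMS loudness of the audio for the (n+m)-th
    frame differs from that of the n-th frame by coefficient (50%) or more."""
    rms_n = np.sqrt(np.mean(np.square(samples_n, dtype=np.float64)))
    rms_nm = np.sqrt(np.mean(np.square(samples_nm, dtype=np.float64)))
    return abs(rms_nm - rms_n) >= coefficient * rms_n
```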
<(1-3-2-2-2) Eighth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the frequency components of the audio, based on the audio information included in the video information.
 For example, the frequency components of the audio corresponding to the (n+m)-th frame (m is a natural number) are compared with the frequency components of the audio corresponding to the n-th frame, m frames earlier, and if an evaluation value relating to the frequency components has changed beyond a predetermined threshold, a scene change is detected as having occurred. Here, m may be 1, or a value of 2 or more. As the evaluation value relating to the frequency components, the sum of the intensities of the audio frequency components in each frequency band, or the like, may be employed. As the frequency bands of the audio, for example, a low-frequency band, a high-frequency band, and the band between them may be employed. Such a change in the evaluation value relating to the frequency components can correspond, for example, to a change in background music (BGM).
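 The band-wise evaluation value could be sketched as follows; the low/middle/high band edges are assumed values, not specified here.

```python
import numpy as np

def audio_band_energies(samples, sample_rate,
                        band_edges_hz=(0.0, 300.0, 3000.0, 20000.0)):
    """Sum the FFT magnitude of an audio chunk within low, middle, and
    high frequency bands."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:])])

def audio_scene_change(samples_n, samples_nm, sample_rate, threshold):
    # e.g., a BGM change shifts energy between bands beyond the threshold
    diff = np.abs(audio_band_energies(samples_n, sample_rate)
                  - audio_band_energies(samples_nm, sample_rate)).sum()
    return diff > threshold
```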
<(1-3-2-2-3) Ninth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the voiceprint that can be recognized from the audio, based on the audio information included in the video information.
 For example, a voiceprint is recognized by analyzing the audio corresponding to the (n+m)-th frame (m is a natural number), and a voiceprint is recognized by analyzing the audio corresponding to the n-th frame, m frames earlier. Then, the voiceprint of the (n+m)-th frame is compared with the voiceprint of the n-th frame, and if the speaking person differs, a scene change is detected as having occurred.
<(1-3-2-3) Detection Based on Metadata>
 The change detection unit 463 can detect a scene change in accordance with a change in the metadata included in the video information. In this way, an image having a reference parallax suited to each scene can be combined with the 3D moving image in accordance with changes in the metadata. For example, a scene change can be detected by comparing the metadata of the (n+m)-th frame (m is a natural number) with the metadata of the n-th frame, m frames earlier. m may be 1, or a value of 2 or more.
 As the method of detecting a scene change based on the metadata in the change detection unit 463, for example, one or more of the following tenth to twelfth change detection methods may be employed.
<(1-3-2-3-1) Tenth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the subtitle information specified by the metadata included in the video information.
 For example, the subtitle information corresponding to the (n+m)-th frame (m is a natural number) is compared with the subtitle information corresponding to the n-th frame, m frames earlier, and if a predetermined change has occurred in the subtitle information, a scene change is detected as having occurred. The modes of the predetermined change may include a mode in which a certain feature of the subtitle information changes to a different feature, a mode in which a state without subtitle information changes to a state with subtitle information, and the like.
 Specifically, characteristic terms that suggest a change of scene, relating to time, place, era, topic, and so on, may be detected from the subtitle information, and if a change in a characteristic term, or the appearance or disappearance of a characteristic term, has occurred, a scene change is detected as having occurred.
<(1-3-2-3-2) Eleventh Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the chapter information specified by the metadata included in the video information.
 For example, the chapter information corresponding to the (n+m)-th frame (m is a natural number) is compared with the chapter information corresponding to the n-th frame, m frames earlier, and if the chapter information has changed, a scene change is detected as having occurred.
<(1-3-2-3-3) Twelfth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the shooting conditions specified by the metadata included in the video information.
 For example, the shooting conditions of the (n+m)-th frame (m is a natural number) are compared with the shooting conditions of the n-th frame, m frames earlier, and if the shooting conditions have changed, a scene change is detected as having occurred. As the shooting conditions, parameters such as the focal length, the aperture size, and the shooting magnification (also referred to as shooting parameters) may be employed.
 More specifically, for example, if a shooting parameter increases or decreases by more than a predetermined proportion (for example, 50%), a scene change is detected as having occurred.
<(1-3-3) Deviation Amount Determination Method>
 In the deviation amount determination unit 464, for a scene of the video information spanning from the detection of a first scene change by the change detection unit 463 to the detection of the second, next scene change, the reference deviation amount can be determined based on the deviation amounts of the positions of pixels indicating the same portion of an object (also referred to as pixel deviation amounts) between the N-th left-eye image GNL and the N-th right-eye image GNR of one or more N-th frames. At this time, a single reference deviation amount common to all frames of the scene can be determined.
 In this way, a common image having a reference parallax is added to the 3D moving image of the scene. As a result, excessive changes in the image are suppressed, the burden on the user's eyes can be reduced, and the amount of computation can also be reduced.
 Note that the pixel deviation amount between the N-th left-eye image GNL and the N-th right-eye image GNR can be obtained by detecting, through a corresponding-point search using the POC method or the like, combinations of a reference pixel and a corresponding pixel capturing the same portion of an object between the N-th left-eye image GNL and the N-th right-eye image GNR.
 As the method of determining the reference deviation amount in the deviation amount determination unit 464, for example, one or more of the following first to fourth determination methods may be employed.
<(1-3-3-1) First Determination Method>
 The reference deviation amount is determined based on the distribution of the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the N-th left-eye images GNL and the N-th right-eye images GNR included in the frame group of one scene of the video information. The frame group of the scene may be all of the frames included in the scene, or a subset of all of the frames included in the scene. The subset of frames may be temporally consecutive frames or discretely sampled frames. In this way, the sense of incongruity the user experiences within one scene of the 3D moving image can be reduced.
 Alternatively, the reference deviation amount may be determined based on the distribution of the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the reference attention areas and the corresponding attention areas included in the frame group of one scene of the video information.
 For example, the reference deviation amount can be determined based on a representative value of the pixel deviation amounts obtained from their distribution. As the representative value of the pixel deviation amounts, at least one of the mean, maximum, minimum, mode, and median of the pixel deviation amounts may be employed. In this case, the representative value of the pixel deviation amounts may be adopted as the reference deviation amount as it is, or a value shifted from the representative value by a calculation following a predetermined rule may be adopted as the reference deviation amount. As the calculation following the predetermined rule, for example, a calculation of multiplying the representative value of the pixel deviation amounts by a predetermined coefficient (for example, 0.8) may be employed.
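 As a sketch of this first determination method, assuming NumPy and a flat array of pixel deviation amounts collected over the scene's frame group; the choice of statistic and the 0.8 coefficient follow the examples above (the mode is omitted from this sketch).

```python
import numpy as np

def reference_deviation(pixel_deviations, statistic="median", coefficient=0.8):
    """Reduce the scene's pixel deviation distribution to a representative
    value, then scale it by a predetermined coefficient to obtain the
    reference deviation amount."""
    reducers = {"mean": np.mean, "max": np.max,
                "min": np.min, "median": np.median}
    representative = reducers[statistic](np.asarray(pixel_deviations))
    return coefficient * representative
```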
 Here, as shown in FIG. 19, assume a case in which the frame group from the (n-b)-th frame (b is a natural number) through the (n-a)-th frame (a is a natural number) to the n-th frame corresponds to a first scene, and the frame group from the (n+a)-th frame through the (n+b)-th frame to the (n+c)-th frame corresponds to the next, second scene.
 In FIG. 19, to avoid complicating the drawing, the N-th left-eye images GNL are shown and the N-th right-eye images GNR are omitted. Shown is the first frame group GG1L included in the first scene, from the (n-b)-th left-eye image G(n-b)L through the (n-a)-th left-eye image G(n-a)L to the n-th left-eye image GnL. Also shown is the second frame group GG2L included in the second scene, from the (n+a)-th left-eye image G(n+a)L through the (n+b)-th left-eye image G(n+b)L to the (n+c)-th left-eye image G(n+c)L.
 FIG. 20 shows the change of the pixel deviation amount over time as a curve. Here, if times T0 to T1 correspond to the first scene and times T1 to T2 correspond to the second scene, a representative value RV1 of the pixel deviation amounts in the first scene and a representative value RV2 of the pixel deviation amounts in the second scene can be calculated.
 Here, if the mean or the median of the pixel deviation amounts is adopted as the representative value, then by adding to the 3D moving image an image having a parallax corresponding to the reference deviation amount (a reference parallax image), the parallax of the reference parallax image serves as a reference and the forward and backward extent of the 3D moving image becomes easier to recognize. As a result, the sense of depth obtained by the user watching the 3D moving image can be improved.
<(1-3-3-2) Second Determination Method>
 The reference deviation amount is determined based on the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the N-th left-eye image GNL and the N-th right-eye image GNR of the first frame included in the frame group of one scene of the video information. The frame group of the scene may be, for example, all of the frames included in the scene. In this way, the processing of adding a common image having a reference parallax to the 3D moving image of one scene can be performed in real time.
 Note that the reference deviation amount may instead be determined based on the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the reference attention area and the corresponding attention area of the first frame included in the frame group of the scene.
 For example, the reference deviation amount can be determined based on a representative value of the pixel deviation amounts. As the representative value, at least one of the mean, maximum, minimum, mode, and median of the pixel deviation amounts may be employed. In this case, the representative value may be adopted as the reference deviation amount as it is, or a value shifted from the representative value by a calculation following a predetermined rule may be adopted as the reference deviation amount. As the calculation following the predetermined rule, for example, a calculation of multiplying the representative value of the pixel deviation amounts by a predetermined coefficient (for example, 0.8) may be employed.
<(1-3-3-3) Third Determination Method>
 Based on the distribution of the deviation amounts of the positions of pixels indicating the same portion of an object between the N-th left-eye images GNL and the N-th right-eye images GNR included in the frame group of one scene of the video information, a representative value of the virtual distance from a virtual reference plane to the surface of an object is calculated. Then, the deviation amount corresponding to the virtual reference plane at which the representative value of the virtual distance takes a predetermined value is determined as the reference deviation amount. In this way, the sense of incongruity the user experiences when the scene changes can be reduced.
 Here, the frame group of the scene may be all of the frames included in the scene, or a subset of all of the frames included in the scene. The subset of frames may be temporally consecutive frames or discretely sampled frames. The virtual reference plane may be, for example, any plane parallel to the screen, that is, a plane orthogonal to the line of sight of a user squarely facing the screen. The virtual distance may be, for example, the distance by which the surface of an object that appears to stand out stereoscopically in the 3D moving image is separated from the virtual reference plane. As the representative value of the virtual distance, at least one of the mean, maximum, minimum, mode, and median of the virtual distances may be employed. The predetermined value may be set to an arbitrary value in accordance with the user's operation of the operation unit 41, or may be set to a fixed value prepared in advance.
 For example, if the representative value of the virtual distance is the maximum virtual distance and the predetermined value is 0, the reference deviation amount is set in accordance with the maximum virtual distance. If the representative value of the virtual distance is the minimum virtual distance and the predetermined value is 0, the reference deviation amount can be set in accordance with the minimum virtual distance.
 FIG. 21 shows a scene change occurring between the n-th frame and the (n+m)-th frame. FIG. 22 shows, as a curve, the change over time of the mean virtual distance of each frame when a predetermined virtual reference plane is set. Here, times T0 to T1 correspond to the first scene, to which the n-th frame belongs, and times T1 to T2 correspond to the second scene, to which the (n+m)-th frame belongs.
 In this case, for example, a mean value Mr1 can be calculated as the representative value of the virtual distance in the first scene, and a mean value Mr2 as the representative value of the virtual distance in the second scene. Then, so that the representative value of the virtual distance takes the predetermined value in both scenes, a first virtual reference plane protruding from the predetermined virtual reference plane by a distance MR1 is set for the first scene, and a second virtual reference plane protruding from the predetermined virtual reference plane by a distance MR2 is set for the second scene. Here, the relationship (Mr1-MR1) = (Mr2-MR2) holds. Then, the deviation amount corresponding to the first virtual reference plane is determined as the reference deviation amount for the first scene, and the deviation amount corresponding to the second virtual reference plane is determined as the reference deviation amount for the second scene.
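 A sketch of this bookkeeping, assuming per-frame virtual distances are already available for each scene; with target_value = 0, each scene's plane passes through its own mean protrusion, so the relation (Mr1-MR1) = (Mr2-MR2) holds across the scene change.

```python
import numpy as np

def virtual_plane_offsets(scene_distances, target_value=0.0):
    """For each scene, given its per-frame virtual distances, choose the
    offset MR of that scene's virtual reference plane so that
    mean(distances) - MR equals target_value for every scene."""
    return [float(np.mean(distances)) - target_value
            for distances in scene_distances]
```

 The deviation amount corresponding to each offset plane would then be taken as that scene's reference deviation amount.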
<(1-3-3-4) Fourth Determination Method>
A reference shift amount for one scene is determined according to the images of the scene preceding it (the previous scene). Specifically, a distribution of shift amounts of pixel positions indicating the same part of an object between the Nth left-eye image GNL and the Nth right-eye image GNR (also referred to as first pixel shift amounts) is obtained for one or more frames of a certain scene (also referred to as a second scene) of the video information. In addition, a distribution of such shift amounts (also referred to as second pixel shift amounts) is obtained for one or more frames of another scene (also referred to as a first scene) preceding the second scene. The reference shift amount for the second scene is then determined based on the distribution of the first pixel shift amounts and the distribution of the second pixel shift amounts.
Here, the one or more frames in the second scene may be any of a single frame, two or more consecutive frames, two or more sampled frames, and all the frames in the second scene. Likewise, the one or more frames in the first scene may be any of a single frame, two or more consecutive frames, two or more sampled frames, and all the frames in the first scene.
In the fourth determination method, for example, the reference shift amount may be determined based on a representative value of the distribution of the first pixel shift amounts (also referred to as a first shift representative value) and a representative value of the distribution of the second pixel shift amounts (also referred to as a second shift representative value). As the first shift representative value, at least one of the average, maximum, minimum, mode, and median of the distribution of the first pixel shift amounts may be adopted; as the second shift representative value, at least one of the average, maximum, minimum, mode, and median of the distribution of the second pixel shift amounts may be adopted.
Specifically, for example, the reference shift amount of the second scene may be determined by a predetermined calculation using the first shift representative value and the second shift representative value. As the predetermined calculation, for example, a calculation of the average of the first shift representative value and the second shift representative value may be employed, in which case this average is determined as the reference shift amount of the second scene.
FIG. 23 is a diagram for explaining the fourth method for determining the reference shift amount. FIG. 23 shows a first shift representative value MP1 for the first scene (times T0 to T1), a second shift representative value MP2 for the second scene (times T1 to T2) following the first scene, and a third shift representative value MP3 for the third scene (times T2 to T3) following the second scene. In this case, for example, the first shift representative value MP1 may be set as the reference shift amount RP1 of the first scene, the average of MP1 and MP2 as the reference shift amount RP2 of the second scene, and the average of MP2 and MP3 as the reference shift amount RP3 of the third scene.
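The averaging rule of FIG. 23 can be sketched as follows; the use of the median as the representative statistic and the per-frame disparity-map input format are assumptions for illustration.

```python
import numpy as np

def representative_disparity(frames, representative=np.median):
    """Representative value of the pixel-shift (disparity) distribution
    over one or more frames of a scene, each frame a 2-D disparity map."""
    return representative(np.concatenate([d.ravel() for d in frames]))

def reference_shifts(scenes):
    """Per-scene reference shift amounts following FIG. 23: the first
    scene uses its own representative value; every later scene uses the
    mean of its own and the preceding scene's representative values."""
    reps = [representative_disparity(frames) for frames in scenes]
    shifts = [reps[0]]
    shifts += [(prev + cur) / 2.0 for prev, cur in zip(reps, reps[1:])]
    return shifts
```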
Alternatively, the reference shift amount may be determined by the following method. First, a distribution of shift amounts of pixel positions indicating the same part of an object (pixel shift amounts) is obtained between the Nth left-eye images GNL and the Nth right-eye images GNR included in the frame group of the first scene of the video information. Based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (also referred to as a first virtual reference plane) to the surface of the object is calculated. Meanwhile, a distribution of pixel shift amounts is obtained between the Nth left-eye images GNL and the Nth right-eye images GNR included in the frame group of the second scene following the first scene, and based on this distribution, a second representative value of the virtual distance from a virtually set reference plane (also referred to as a second virtual reference plane) to the surface of the object is calculated. Then, after the change of the second virtual reference plane and the calculation of the second representative value have been performed one or more times, the shift amount corresponding to the second virtual reference plane at which the difference between the first and second representative values falls within a predetermined allowable range is determined as the reference shift amount.
Here, each of the first and second virtual reference planes may be, for example, a plane parallel to the screen, that is, a plane orthogonal to the line of sight of a user directly facing the screen. The virtual distance may be, for example, the distance by which the surface of an object that appears to stand out stereoscopically in the 3D moving image is separated from the first or second virtual reference plane.
Furthermore, the reference shift amount may be determined according to the virtual distance of an attention area. In this case, for example, a distribution of pixel shift amounts indicating the same part of an object is first obtained between the reference attention areas and the corresponding attention areas included in the frame group of the first scene of the video information. Based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (the first virtual reference plane) to the surface of the object is calculated. Meanwhile, a distribution of pixel shift amounts is obtained between the reference attention areas and the corresponding attention areas included in the frame group of the second scene following the first scene, and based on this distribution, a second representative value of the virtual distance from a virtually set reference plane (the second virtual reference plane) to the surface of the object is calculated. Then, after the change of the second virtual reference plane and the calculation of the second representative value have been performed one or more times, the shift amount corresponding to the second virtual reference plane at which the difference between the first and second representative values falls within the predetermined allowable range is determined as the reference shift amount.
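The iterative adjustment shared by the two variants above can be sketched as a simple search; the tolerance, step size, iteration cap, and the mean as the representative statistic are illustrative assumptions.

```python
def adjust_second_plane(first_rep, second_distances,
                        tol=0.5, step=0.1, max_iter=1000):
    """Move the second virtual reference plane until the second scene's
    representative virtual distance matches the first scene's within a
    tolerance; the returned plane offset corresponds to the pixel shift
    used as the reference shift amount."""
    offset = 0.0
    for _ in range(max_iter):
        # representative (mean) virtual distance measured from the plane
        second_rep = sum(d - offset for d in second_distances) / len(second_distances)
        diff = second_rep - first_rep
        if abs(diff) <= tol:              # within the allowable range
            break
        offset += step if diff > 0 else -step
    return offset
```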
<(1-3-4) Region Image Information Acquisition Method>
The acquisition, in the region image acquisition unit 465, of the Nth region image information relating to the Nth reference region image INL and the Nth corresponding region image INR can be realized, for example, by performing the following first and second steps in order.
<(1-3-4-1) First Step>
An image pattern stored in advance in the storage unit 44 or the like is read out. As this image pattern, for example, an image pattern showing a specific design in which relatively large dots are randomly arranged, an image pattern including an information display field of a digital broadcast (for example, a data field or a time field), or an image pattern including operation buttons of a device may be employed.
<(1-3-4-2) Second Step>
The image pattern read out in the first step is used as a base image (for example, the Nth reference region image INL), and the other image (for example, the Nth corresponding region image INR) is generated by shifting the position of each pixel of the base image in one direction according to the reference shift amount determined for each scene by the shift amount determination unit 464. In this way, the Nth reference region image INL and the Nth corresponding region image INR are acquired for each scene. That is, the Nth reference region image INL and the Nth corresponding region image INR are constant within one scene.
Alternatively, sets of image patterns corresponding to a plurality of shift amounts may be stored in advance in the storage unit 44 or the like, and the set of image patterns corresponding to the reference shift amount determined by the shift amount determination unit 464 may be read out from the storage unit 44 or the like, whereby the Nth reference region image INL and the Nth corresponding region image INR are acquired. That is, it suffices that the Nth reference region image INL and the Nth corresponding region image INR have a relationship in which a display object such as a predetermined design is shifted by the reference shift amount in one direction.
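A minimal sketch of the second step follows, assuming grayscale images held as NumPy arrays, a horizontal shift direction, and rounding of the reference shift amount to whole pixels.

```python
import numpy as np

def make_region_image_pair(pattern, reference_shift):
    """Generate the pair of region images from one stored pattern: the
    corresponding image is the pattern with every pixel displaced by the
    reference shift amount in one (horizontal) direction, so the pair
    displays the object with exactly the reference disparity."""
    shift = int(round(reference_shift))
    base = pattern.copy()                 # reference region image (IN_L)
    shifted = np.zeros_like(pattern)      # corresponding region image (IN_R)
    if shift >= 0:
        shifted[:, shift:] = pattern[:, :pattern.shape[1] - shift]
    else:
        shifted[:, :shift] = pattern[:, -shift:]
    return base, shifted
```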
FIGS. 24 and 25 illustrate variations of the Nth reference region image INL. In FIGS. 24 and 25, each variation is illustrated in the form of an Nth left-eye composite image GSNL in which the Nth reference region image INL is arranged at the Nth reference composite position PNL around the Nth left-eye image area TANL corresponding to the left-eye image GNL in FIG. 1. FIG. 24 schematically shows an image pattern having an information display field including a data field Pa1 and a time field Ca1 of a digital broadcast. FIG. 25 schematically shows an image pattern including an operation button group Ba1 and a time field Ta1.
<(1-3-5) Composite Position Designation Method>
<(1-3-5-1) When the Reference Composite Position and the Corresponding Composite Position Are Fixed>
In the position designation unit 468, for example, a predetermined Nth reference composite position PNL and a predetermined Nth corresponding composite position PNR may be designated, or the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated in accordance with the operation of the operation unit 41 by the user.
In this case, the Nth reference composite position PNL may be designated, for example, in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated, for example, in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR.
Specifically, as shown in FIGS. 2, 24, and 25, the Nth reference composite position PNL may be designated in a region surrounding the Nth left-eye image area TANL corresponding to the Nth left-eye image GNL. The Nth reference composite position PNL may also be designated so as to include a specific part of the region surrounding the Nth left-eye image area TANL. For example, as shown in FIG. 26, an Nth reference composite position PNL specifying the left and right regions sandwiching the Nth left-eye image area TANL may be designated. As shown in FIG. 27, an Nth reference composite position PNL specifying a region located below the Nth left-eye image area TANL may be designated. Furthermore, as shown in FIG. 28, an Nth reference composite position PNL specifying a region of non-uniform width surrounding the Nth left-eye image area TANL may be designated.
The Nth reference composite position PNL and the Nth corresponding composite position PNR may each be a position specifying all the pixels of a region of a predetermined shape, such as a ring, or a position from which such a region can be identified (also referred to as a specific position). The specific position may be, for example, the position of one or more corner pixels of the region of the predetermined shape.
<(1-3-5-2) When the Reference Composite Position and the Corresponding Composite Position Are Changed>
In the position designation unit 468, the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated according to one or more of the Nth left-eye image GNL and the Nth right-eye image GNR. Here, the Nth reference composite position PNL may be designated in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR.
The Nth reference composite position PNL may also be designated in one or more of a plurality of regions including a region near the outer edge of the Nth left-eye image GNL and a region corresponding to that near-edge region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated in one or more of a plurality of regions including a region near the outer edge of the Nth right-eye image GNR and a region corresponding to that near-edge region in an image separate from the Nth right-eye image GNR.
As the method of designating the Nth reference composite position PNL and the Nth corresponding composite position PNR in this manner, for example, one or more of the following first to third designation methods may be adopted.
<(1-3-5-2-1) First Designation Method>
The Nth reference composite position PNL and the Nth corresponding composite position PNR are designated according to the distribution of positional shift amounts (that is, parallaxes) between reference pixels and corresponding pixels indicating the same part of an object between the Nth left-eye image GNL and the Nth right-eye image GNR.
For example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated if the parallax between the Nth left-eye image GNL and the Nth right-eye image GNR is not less than a first threshold and not more than a second threshold. In this mode, for example, if the parallax between the Nth left-eye image GNL and the Nth right-eye image GNR is less than the first threshold or exceeds the second threshold, the Nth reference composite position PNL and the Nth corresponding composite position PNR are not designated. Furthermore, for example, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR may be increased as the difference between the maximum and minimum parallaxes between the Nth left-eye image GNL and the Nth right-eye image GNR becomes smaller.
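A minimal sketch of this thresholding rule, in which the thresholds, the use of the maximum parallax as the test statistic, and the formula for the number of positions are all illustrative assumptions:

```python
def designate_position_count(parallaxes, t1, t2, max_positions=10):
    """Return the number of composite positions to designate, or 0 when
    the parallax falls outside [t1, t2]: the narrower the parallax
    range, the more positions are used."""
    p_max, p_min = max(parallaxes), min(parallaxes)
    if not (t1 <= p_max <= t2):
        return 0                          # positions are not designated
    spread = p_max - p_min
    return max(1, int(max_positions / (1.0 + spread)))
```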
<(1-3-5-2-2) Second Designation Method>
The Nth reference composite position PNL and the Nth corresponding composite position PNR are designated according to the distribution of parallaxes of M sets (M being an integer of 2 or more) of reference attention areas and corresponding attention areas between the Nth left-eye image GNL and the Nth right-eye image GNR. As the M sets of reference attention areas and corresponding attention areas, those detected by the attention area detection unit 462 may be employed.
For example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated if the difference between the maximum and minimum parallaxes in the reference attention areas and the corresponding attention areas is not less than a first threshold and not more than a second threshold. In this mode, for example, if that difference is less than the first threshold or exceeds the second threshold, the Nth reference composite position PNL and the Nth corresponding composite position PNR are not designated. Furthermore, for example, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR may be increased as the difference between the maximum and minimum parallaxes in the reference attention areas and the corresponding attention areas becomes smaller.
<(1-3-5-2-3) Third Designation Method>
The Nth reference composite position PNL and the Nth corresponding composite position PNR are designated such that their arrangement changes according to one or more of the Nth left-eye image GNL and the Nth right-eye image GNR.
For example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated according to the positions of the reference attention area and the corresponding attention area detected by the attention area detection unit 462 from the Nth left-eye image GNL and the Nth right-eye image GNR.
Here, for example, assume that one set of a reference attention area and a corresponding attention area is detected by the attention area detection unit 462. In this case, the Nth reference composite position PNL may be designated according to the position the reference attention area occupies in the Nth left-eye image GNL, and the Nth corresponding composite position PNR according to the position the corresponding attention area occupies in the Nth right-eye image GNR.
FIG. 29 shows a case where the reference attention area is an object area O1L and the corresponding attention area is an object area O1R. In this case, of the region FL around the left-eye image area TANL corresponding to the Nth left-eye image GNL, a position PN1L indicating the area onto which the object area O1L is projected in a first predetermined direction (here, the -X direction) and a position PN2L indicating the area onto which the object area O1L is projected in a second predetermined direction (here, the -Y direction) may be designated as the Nth reference composite position PNL. Although not illustrated, the Nth corresponding composite position PNR may be designated for the Nth right-eye image GNR in the same manner as for the Nth left-eye image GNL.
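The projection of FIG. 29 can be sketched as follows, under the assumptions that the image area sits inside the composite canvas with a uniform margin, that the -X projection lands on the left strip of the surrounding region, and that the -Y projection lands on the top strip; the coordinate conventions are hypothetical.

```python
def project_attention_region(bbox, margin):
    """Project an attention region's bounding box (x0, y0, x1, y1),
    given in image-area coordinates, onto the surrounding margin:
    PN1 is the -X projection strip, PN2 the -Y projection strip, both
    returned as (x0, y0, x1, y1) in canvas coordinates where the image
    area is offset by `margin` on the left and top."""
    x0, y0, x1, y1 = bbox
    pn1 = (0, y0 + margin, margin, y1 + margin)   # left strip, vertical extent
    pn2 = (x0 + margin, 0, x1 + margin, margin)   # top strip, horizontal extent
    return pn1, pn2
```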
Thus, if the positions of the reference attention area and the corresponding attention area change, the arrangement of the Nth reference composite position PNL and the Nth corresponding composite position PNR is changed. If the sizes of the reference attention area and the corresponding attention area change, the sizes of the Nth reference composite position PNL and the Nth corresponding composite position PNR are changed. Furthermore, if the number of reference attention areas and corresponding attention areas changes, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR is changed.
<(1-3-6) Image Composition Method>
In the image composition unit 469, the Nth reference region image INL is arranged at the Nth reference composite position PNL designated by the position designation unit 468, and the Nth corresponding region image INR is arranged at the Nth corresponding composite position PNR designated by the position designation unit 468. In this way, stereoscopic image information is generated.
For example, in the image composition unit 469, the Nth reference region image INL may be arranged in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding region image INR may be arranged in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR.
In this way, an Nth left-eye composite image GSNL in which the Nth left-eye image GNL and the Nth reference region image INL are combined is generated, and an Nth right-eye composite image GSNR in which the Nth right-eye image GNR and the Nth corresponding region image INR are combined is generated. At this time, for all frames of one scene, each Nth left-eye image GNL is combined with an Nth reference region image INL acquired based on the same reference shift amount.
For example, as shown in FIG. 30, the Nth reference region image INL relating to one and the same reference shift amount is combined with each Nth left-eye image GNL of the first frame group GG1L of the first scene shown in FIG. 19, while the Nth reference region image INL relating to another common reference shift amount is combined with each Nth left-eye image GNL of the second frame group GG2L of the second scene shown in FIG. 19. Also, for example, as shown in FIG. 31, the nth reference region image InL is combined with the nth left-eye image GnL of the first scene shown in FIG. 21, and the (n+m)th reference region image I(n+m)L is combined with the (n+m)th left-eye image G(n+m)L of the second scene shown in FIG. 21.
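A minimal sketch of the per-eye composition, assuming the composite canvas is the eye image plus a uniform margin on every side and that the composite position is given as the top-left corner of the region image in canvas coordinates:

```python
import numpy as np

def compose(eye_image, region_image, position, margin):
    """Place a region image at the designated composite position around
    an eye image, producing one composite image (e.g. GSN_L from GN_L
    and IN_L); applied identically to the right-eye side."""
    h, w = eye_image.shape[:2]
    canvas = np.zeros((h + 2 * margin, w + 2 * margin) + eye_image.shape[2:],
                      dtype=eye_image.dtype)
    canvas[margin:margin + h, margin:margin + w] = eye_image
    y, x = position                       # top-left corner, assumed in bounds
    rh, rw = region_image.shape[:2]
    canvas[y:y + rh, x:x + rw] = region_image
    return canvas
```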
The stereoscopic image information generated as described above need only include information in at least one of a first format and a second format. Here, the first format includes a format in which the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR can be displayed at the same time in a superimposed manner on one screen. The second format includes a format in which one or more of the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR and the remaining one or more images can be displayed time-sequentially on one screen. Even when such various display modes are adopted, the sense of depth obtained by a user watching the 3D moving image can be improved.
<(1-4) Operation Flow of Information Processing Apparatus>
FIG. 32 is a flowchart showing an example of the operation flow of the image processing apparatus according to the embodiment. This operation flow is realized by the control unit 46 reading and executing the program PG1 in the storage unit 44. For example, execution of image processing relating to a 3D moving image in the information processing apparatus 4 is requested in accordance with the operation of the operation unit 41 by the user, and this operation flow starts.
In step S1 of FIG. 32, video information is acquired by the video acquisition unit 461.
In step S2, the mode setting unit 467 determines whether the attention area detection unit 462 is set to the mode for detecting attention areas. If the mode for detecting attention areas is set, the flow proceeds to step S3; otherwise, it proceeds to step S4.
In step S3, the attention area detection unit 462 detects the reference attention area and the corresponding attention area from the Nth left-eye image GNL and the Nth right-eye image GNR.
In step S4, the change detection unit 463 detects changes of scene.
In step S5, the shift amount determination unit 464 determines a reference shift amount for each scene.
In step S6, the region image acquisition unit 465 acquires, for each scene, the Nth region image information relating to the Nth reference region image INL and the Nth corresponding region image INR.
In step S7, the mode setting unit 467 determines whether a signal relating to the mode setting of the position designation unit 468 has been received by the signal reception unit 466. If such a signal has been received, the flow proceeds to step S8; otherwise, it proceeds to step S9.
In step S8, the mode setting unit 467 sets the position designation unit 468 to one of a plurality of modes including the first mode and the second mode, in accordance with the signal received by the signal reception unit 466. Before the mode setting in step S8, the position designation unit 468 is assumed to be initially set to a predetermined mode.
In step S9, the position designation unit 468 designates the Nth reference composite position PNL and the Nth corresponding composite position PNR for each scene.
In step S10, the image composition unit 469 combines the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR, whereby stereoscopic image information is generated, and this operation flow ends.
<(1-5) Summary of the Embodiment>
As described above, according to the image processing technique of the embodiment, images having a reference parallax suited to each scene are combined with a 3D moving image in accordance with changes of scene. Differences in parallax within the 3D moving image therefore become easier to recognize through comparison with the images having the reference parallax. As a result, the sense of depth obtained by a user watching the 3D moving image can be improved.
<(2) Modifications>
The present invention is not limited to the embodiment described above, and various changes, improvements, and the like are possible without departing from the gist of the present invention.
<(2-1) First Modification>
In the processing according to the above embodiment, the Nth reference region image INL is arranged in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL, and the Nth corresponding region image INR is arranged in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR. However, the arrangement is not limited to this.
For example, as shown in FIG. 33, the Nth reference region image INL may be arranged so as to be superimposed on a region inside the Nth left-eye image GNL (also referred to as an internal region), and the Nth corresponding region image INR so as to be superimposed on an internal region of the Nth right-eye image GNR. The Nth reference region image INL may also be arranged in a region corresponding to the internal region of the Nth left-eye image GNL in an image separate from the Nth left-eye image GNL, and the Nth corresponding region image INR in a region corresponding to the internal region of the Nth right-eye image GNR in an image separate from the Nth right-eye image GNR.
The left side of FIG. 33 shows an example of an nth left-eye composite image GSnL in which the nth reference region image InL is combined at the nth reference composite position PnL in the internal region of the nth left-eye image area TAnL corresponding to the Nth left-eye image GNL shown in FIG. 1. The right side of FIG. 33 shows an example of an nth right-eye composite image GSnR in which the nth corresponding region image InR is combined at the nth corresponding composite position PnR in the internal region of the nth right-eye image area TAnR corresponding to the Nth right-eye image GNR shown in FIG. 1.
With such a configuration, the sense of depth obtained by a user watching the 3D moving image can be improved while keeping the user's eyes on the original object of attention as much as possible. In particular, if a display mode is realized in which a display object based on the Nth reference region image INL and the Nth corresponding region image INR exists in the vicinity of the object the user is paying attention to, the sense of depth obtained by the user watching the 3D moving image can be further improved.
The Nth reference region image INL and the Nth corresponding region image INR may be, for example, specific markers. A specific marker need only have distinctive features and be easily distinguishable by the user from objects originally included in the Nth left-eye image GNL and the Nth right-eye image GNR. The distinctive features may be realized, for example, by shape, color, texture, and the like.
When the Nth left-eye image GNL and the Nth right-eye image GNR are images capturing real objects, the specific marker may be a marker drawn by CG or the like. In this case, conceivable specific markers include various simple shapes such as bars, triangles, and arrows, as well as various objects such as vases and butterflies. This makes it difficult for the user to confuse the specific marker with objects originally included in the Nth left-eye image GNL and the Nth right-eye image GNR.
Moreover, if the specific marker has a distinctive shape, a distinctive color, a distinctive texture, and the like, yet is translucent, the display of images based on the Nth left-eye image GNL and the Nth right-eye image GNR is less likely to be hindered.
In the position designation unit 468, for example, a predetermined Nth reference composite position PNL and a predetermined Nth corresponding composite position PNR may be designated, or the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated in accordance with the operation of the operation unit 41 by the user. In this case, the Nth reference composite position PNL may be designated, for example, in one or more of a plurality of regions including the internal region of the Nth left-eye image GNL and a region corresponding to that internal region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated, for example, in one or more of a plurality of regions including the internal region of the Nth right-eye image GNR and a region corresponding to that internal region in an image separate from the Nth right-eye image GNR.
Also, in the position designation unit 468, the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated according to one or more of the Nth left-eye image GNL and the Nth right-eye image GNR. As a specific designation method, for example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated in the vicinity of the reference attention area and the corresponding attention area. Here, the vicinity of an attention area includes, for example, positions to the lower left, upper left, lower right, upper right, above, below, to the left, and to the right of the attention area. A position surrounding the attention area is also conceivable as the vicinity of the reference attention area and the corresponding attention area.
For example, if the object areas O1L and O1R are the reference attention area and the corresponding attention area, a mode is conceivable in which, as shown in FIG. 33, the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated at the lower left of the object areas O1L and O1R. Also, for example, as shown in FIG. 34, a mode is conceivable in which a ring-shaped Nth reference composite position PNL and Nth corresponding composite position PNR surrounding the object areas O1L and O1R are designated.
For example, if the sizes of the reference attention area and the corresponding attention area change, the sizes of the regions specified by the Nth reference composite position PNL and the Nth corresponding composite position PNR may be changed. If the number of reference attention areas and corresponding attention areas changes, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR may be changed. The sizes and number of the regions specified by the Nth reference composite position PNL and the Nth corresponding composite position PNR may also be changed in accordance with the operation of the operation unit 41 by the user.
Also, for example, whether designation of the Nth reference composite position PNL and the Nth corresponding composite position PNR is necessary may be judged according to the distribution of parallaxes of M sets (M being an integer of 2 or more) of reference attention areas and corresponding attention areas between the Nth left-eye image GNL and the Nth right-eye image GNR, or according to the distribution of positional shift amounts (that is, parallaxes) between reference pixels and corresponding pixels indicating the same part of an object between the Nth left-eye image GNL and the Nth right-eye image GNR.
Furthermore, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated in areas predicted not to attract the user's attention (non-attention areas). Non-attention areas include areas different from the reference attention area and the corresponding attention area detected by the attention area detection unit 462. For example, non-attention areas in the Nth left-eye image GNL and the Nth right-eye image GNR include areas near the edges, areas showing objects with little motion obtained from motion vector analysis, and areas where color and texture are inconspicuous. This makes it less likely that the display of the area the user is paying attention to is hindered. As a result, suppression of visual discomfort and improvement of the sense of depth obtained by the user watching the 3D moving image can both be achieved.
<(2-2) Second Modification>
In the processing according to the above embodiment, a constant reference shift amount is determined for one scene, but the present invention is not limited to this. For example, the reference shift amount may be changed stepwise or gradually in the vicinity of a change of scene. This can reduce the discomfort the user experiences when the scene changes in the 3D moving image.
Specifically, in the shift amount determination unit 464, a reference shift amount may be determined for each frame such that, between frames in the vicinity of a change of scene in the video information, the difference in the reference shift amount is not more than a predetermined amount. In this case, the region image acquisition unit 465 may acquire, for each frame in the vicinity of the change of scene, the Nth region image information relating to an Nth reference region image INL and an Nth corresponding region image INR having a relationship in which the positions of pixels indicating the same part of the display object are shifted in one direction by the reference shift amount determined for that frame. The image composition unit 469 may then combine, for each frame in the vicinity of the change of scene, the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR of that frame.
For example, as shown in FIG. 35, a reference shift amount PR1 is first calculated for the first scene (times T0 to T1) and a reference shift amount PR2 for the second scene (times T1 to T2) by the same determination method as in the above embodiment. Then, for example, the reference shift amount PR1 is adopted as-is for the first scene, while for the second scene the reference shift amount is determined so as to increase or decrease by a predetermined amount per frame, starting from the first frame, until it reaches the reference shift amount PR2. In this case, the reference shift amount is changed stepwise from time T1 to time T1+α.
Here, the predetermined amount may be set in advance, or may be set by dividing the difference between the reference shift amounts PR1 and PR2 by a predetermined number of frames (for example, Nf). The predetermined number of frames Nf may be set to, for example, 30 frames. In this case, the reference shift amount changes linearly from time T1 to time T1+α.
Alternatively, the predetermined amount may be calculated by multiplying the reference shift amount PR1 of the first scene by a predetermined coefficient (for example, 0.01). In this case, the reference shift amount changes nonlinearly from time T1 to time T1+α.
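A minimal sketch of this boundary interpolation; the frame count, the coefficient, and the reading of the coefficient variant as a fixed step derived from PR1 are illustrative assumptions:

```python
def boundary_shifts(pr1, pr2, n_frames=None, coeff=0.01):
    """Per-frame reference shift amounts ramping from PR1 toward PR2
    across a scene change: equal steps over n_frames when n_frames is
    given, otherwise steps of PR1 times the coefficient."""
    if n_frames:
        step = (pr2 - pr1) / n_frames
        return [pr1 + step * i for i in range(1, n_frames + 1)]
    step = max(abs(pr1) * coeff, 1e-6)    # guard against a zero step
    direction = 1.0 if pr2 >= pr1 else -1.0
    shifts, cur = [], pr1
    while direction * (pr2 - cur) > 0:
        cur += direction * step
        if direction * (pr2 - cur) < 0:   # do not overshoot PR2
            cur = pr2
        shifts.append(cur)
    return shifts
```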
Such a configuration is particularly effective when applied to a moving image capturing a scene in which the background does not change while various objects, including people, move significantly and thereby appear and disappear.
<(2-3) Other Modifications>
For example, the method of designating the Nth reference composite position PNL and the Nth corresponding composite position PNR according to the above embodiment and the method according to the first modification may be executed selectively, depending on the mode of the position designation unit 468 set by the mode setting unit 467. In other words, for example, if the position designation unit 468 is set to the first mode, the designation method according to the above embodiment may be executed, and if it is set to the second mode, the designation method according to the first modification may be executed. In this way, suppression of visual discomfort and ensuring the visibility of the 3D moving image can be appropriately selected in line with the user's intention.
The designation method of the Nth reference composite position PNL and the Nth corresponding composite position PNR according to the above embodiment and that according to the first modification may also both be executed simultaneously.
In the above embodiment and the first and second modifications, video information originally containing a 3D moving image is acquired, but the present invention is not limited to this. For example, after video information containing an ordinary moving image is obtained, a 3D moving image may be generated from the ordinary moving image by various methods, whereby video information containing the 3D moving image is acquired.
Needless to say, all or part of the above embodiment and the various modifications may be combined as appropriate within a consistent range.
DESCRIPTION OF SYMBOLS
1 Information processing system
2 Stereo camera
4 Information processing apparatus
5 Line-of-sight detection sensor
41 Operation unit
42 Display unit
44 Storage unit
46 Control unit
461 Video acquisition unit
462 Attention area detection unit
463 Change detection unit
464 Shift amount determination unit
465 Region image acquisition unit
466 Signal reception unit
467 Mode setting unit
468 Position designation unit
469 Image composition unit

Claims (25)

  1.  An image processing apparatus comprising:
     a first acquisition unit that acquires video information including, in the information of each frame, a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same part of an object are shifted in one direction;
     a change detection unit that detects a scene change in the video information;
     a determination unit that determines a reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames after the scene change in the video information;
     a second acquisition unit that acquires region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same part of a display object are shifted in the one direction by the reference shift amount; and
     a composition unit that generates stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame after the scene change in the video information.
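Read procedurally, claim 1 amounts to a simple data flow. The Python sketch below is a non-authoritative illustration under simplifying assumptions: the scene-change test is a crude luminance difference, the reference shift is approximated by a single whole-image alignment search, and the region image stands in for content such as a subtitle; none of these specific choices is taken from the claims.

    import numpy as np

    def detect_scene_change(prev, cur, threshold=30.0):
        # Crude scene-cut test: mean absolute luminance difference.
        return np.mean(np.abs(cur.astype(float) - prev.astype(float))) > threshold

    def estimate_reference_shift(base, counterpart, search=32):
        # Whole-image proxy for the per-pixel shift amount of the claims:
        # pick the horizontal offset that best aligns the two images.
        errors = [np.mean(np.abs(base[:, d:].astype(float)
                                 - counterpart[:, :base.shape[1] - d].astype(float)))
                  for d in range(1, search)]
        return 1 + int(np.argmin(errors))

    def composite(image, region, x, y):
        # Paste `region` into `image` at (x, y), truncating at the borders.
        out = image.copy()
        h = min(region.shape[0], out.shape[0] - y)
        w = min(region.shape[1], out.shape[1] - x)
        if h <= 0 or w <= 0:
            return out
        out[y:y + h, x:x + w] = region[:h, :w]
        return out

    def process_video(frames, region, x=40, y=20):
        # frames: iterable of (base, counterpart) stereo pairs (H x W x 3).
        # Yields composited pairs; the reference shift is re-determined
        # only after a detected scene change, as in claim 1.
        shift, prev = 0, None
        for base, counterpart in frames:
            if prev is None or detect_scene_change(prev, base):
                shift = estimate_reference_shift(base, counterpart)
            yield (composite(base, region, x, y),
                   composite(counterpart, region, x + shift, y))
            prev = base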
  2.  The image processing apparatus according to claim 1, wherein
     the stereoscopic image information includes information in at least one of a first format, which allows the reference image, the corresponding image, the reference region image, and the corresponding region image to be displayed in the same period on one screen in a superimposed manner, and a second format, which allows one or more of the reference image, the corresponding image, the reference region image, and the corresponding region image and the remaining one or more images to be displayed time-sequentially on one screen.
  3.  The image processing apparatus according to claim 1 or 2, wherein
     the determination unit determines the reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames in one scene of the video information, the one scene lasting from when a first scene change is detected by the change detection unit until a second scene change is detected, and
     the composition unit generates stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for all frames in the one scene of the video information.
  4.  The image processing apparatus according to claim 3, wherein
     the determination unit determines the reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of the first frame in the one scene of the video information.
  5.  The image processing apparatus according to claim 3, wherein
     the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information.
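One conceivable reading of the distribution-based determination in claim 5 (a sketch under assumptions, not the method the application prescribes) pools the per-pixel shift amounts over a scene's frames and takes a robust representative value, here the mode of a histogram:

    import numpy as np

    def reference_shift_from_distribution(disparity_maps, num_bins=64):
        # disparity_maps: per-frame 2-D arrays of per-pixel shift amounts
        # for the frames of one scene.
        pooled = np.concatenate([d.ravel() for d in disparity_maps])
        hist, edges = np.histogram(pooled, bins=num_bins)
        peak = int(np.argmax(hist))
        # Centre of the modal bin as the representative shift amount.
        return 0.5 * (edges[peak] + edges[peak + 1])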
  6.  The image processing apparatus according to claim 3, further comprising
     a region detection unit that, in accordance with a preset detection rule, detects, from the pair of the reference image and the corresponding image of each frame of the video information, a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein
     the determination unit determines the reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference attention region and the corresponding attention region of the first frame of the one scene of the video information.
  7.  The image processing apparatus according to claim 3, further comprising
     a region detection unit that, in accordance with a preset detection rule, detects, from the pair of the reference image and the corresponding image of each frame of the video information, a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein
     the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information.
  8.  The image processing apparatus according to claim 3, wherein
     the determination unit calculates a representative value of the virtual distance from a virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the virtual reference plane at which the representative value of the virtual distance takes a predetermined value.
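The virtual distance of claim 8 can be connected to a shift amount through ordinary stereo geometry: under pinhole assumptions with focal length f (in pixels) and baseline B, distance and disparity are reciprocal, Z = f * B / d. The sketch below assumes those symbols (they are not defined in the application) and uses the median as the representative value:

    import numpy as np

    def shift_for_target_distance(disparities, focal_px, baseline, target_z):
        # disparities: pooled per-pixel shift amounts of one scene (pixels).
        d = np.asarray(disparities, dtype=float)
        d = d[d > 0]                                  # zero shift = infinite Z
        z_representative = np.median(focal_px * baseline / d)
        # Shift amount whose virtual reference plane would sit at target_z:
        reference_shift = focal_px * baseline / target_z
        return reference_shift, z_representative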
  9.  The image processing apparatus according to claim 3, wherein
     the determination unit determines the reference shift amount based on a distribution of first shift amounts of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames in the one scene of the video information, and a distribution of second shift amounts of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames in the preceding scene before the first scene change in the video information.
  10.  The image processing apparatus according to claim 9, wherein
     the determination unit determines the reference shift amount based on a first shift representative value relating to the distribution of the first shift amounts and a second shift representative value relating to the distribution of the second shift amounts.
  11.  The image processing apparatus according to claim 9, wherein
     the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
  12.  The image processing apparatus according to claim 9, further comprising
     a region detection unit that, in accordance with a preset detection rule, detects, from the pair of the reference image and the corresponding image of each frame of the video information, a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein
     the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference attention regions and the corresponding attention regions included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
  13.  The image processing apparatus according to claim 1 or 2, wherein
     the determination unit determines the reference shift amount for each frame in the vicinity of the scene change in the video information such that the difference in the reference shift amount between the frames is equal to or less than a predetermined amount,
     the second acquisition unit acquires, for each frame in the vicinity of the scene change in the video information, region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same part of a display object are shifted in the one direction by the reference shift amount determined for that frame by the determination unit, and
     the composition unit composites, for each frame in the vicinity of the scene change in the video information, the reference image, the corresponding image, the reference region image, and the corresponding region image relating to that frame.
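Claim 13 bounds how fast the reference shift may vary between neighbouring frames around a scene change, which keeps the composited region image from jumping in apparent depth. A minimal rate-limiter sketch (the names and step size are assumptions, not values from the application):

    def rate_limited_shifts(raw_shifts, max_step=1.0):
        # Clamp the frame-to-frame change of the reference shift to max_step.
        limited = [raw_shifts[0]]
        for target in raw_shifts[1:]:
            step = max(-max_step, min(max_step, target - limited[-1]))
            limited.append(limited[-1] + step)
        return limited

    # e.g. rate_limited_shifts([4, 4, 12, 12]) -> [4, 4, 5, 6]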
  14.  The image processing apparatus according to any one of claims 1 to 13, wherein
     the change detection unit detects the scene change according to a change of image between two or more frames included in the video information.
  15.  The image processing apparatus according to claim 14, wherein
     the change of image includes one or more of a change in luminance, a change in color, and a change in frequency components.
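By way of illustration for claims 14 and 15, a luminance/colour-based detector can flag a scene change when the colour histograms of consecutive frames diverge; the threshold below is an arbitrary assumption, not a value from the application:

    import numpy as np

    def histogram_cut_detector(prev_frame, cur_frame, bins=32, threshold=0.4):
        # Frames: H x W x 3 uint8 images. Returns True on a suspected cut.
        distance = 0.0
        for c in range(3):
            h_prev, _ = np.histogram(prev_frame[..., c], bins=bins, range=(0, 256))
            h_cur, _ = np.histogram(cur_frame[..., c], bins=bins, range=(0, 256))
            h_prev = h_prev / max(h_prev.sum(), 1)
            h_cur = h_cur / max(h_cur.sum(), 1)
            distance += np.abs(h_prev - h_cur).sum() / 3.0   # L1 in [0, 2]
        return distance > threshold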
  16.  The image processing apparatus according to claim 14, wherein
     the change detection unit identifies the faces of persons captured in a plurality of frames included in the video information, and detects the scene change according to a replacement of persons between the plurality of frames.
  17.  The image processing apparatus according to claim 14, wherein
     the change detection unit detects the scene change according to a change, in a plurality of frames included in the video information, of an attention region capturing an object predicted to attract the user's attention.
  18.  The image processing apparatus according to claim 14, wherein
     the change detection unit detects the scene change according to at least one of the appearance and disappearance, in a plurality of frames included in the video information, of an attention region capturing an object predicted to attract the user's attention.
  19.  The image processing apparatus according to any one of claims 1 to 18, wherein
     the change detection unit detects the scene change according to a change, in a plurality of frames included in the video information, of the in-focus region.
  20.  The image processing apparatus according to any one of claims 1 to 19, wherein
     the video information includes information relating to audio, and
     the change detection unit detects the scene change according to a change in the audio.
  21.  The image processing apparatus according to claim 20, wherein
     the change in the audio includes one or more of a change in volume, a change in frequency components, and a change in voiceprint.
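Analogously for claims 20 and 21, a scene boundary can be inferred from the accompanying audio; the sketch below compares the RMS volume and the dominant frequency of two adjacent analysis windows (the ratio thresholds are illustrative assumptions):

    import numpy as np

    def audio_scene_change(window_a, window_b, sample_rate=48000,
                           volume_ratio=2.0, freq_ratio=1.5):
        # window_a, window_b: 1-D arrays of consecutive audio samples.
        def rms(x):
            return float(np.sqrt(np.mean(np.square(x.astype(float))))) + 1e-9

        def dominant_freq(x):
            spectrum = np.abs(np.fft.rfft(x))
            spectrum[0] = 0.0                      # ignore the DC component
            freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
            return float(freqs[int(np.argmax(spectrum))]) + 1e-9

        vol_a, vol_b = rms(window_a), rms(window_b)
        f_a, f_b = dominant_freq(window_a), dominant_freq(window_b)
        return (max(vol_a, vol_b) / min(vol_a, vol_b) > volume_ratio
                or max(f_a, f_b) / min(f_a, f_b) > freq_ratio)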
  22.  The image processing apparatus according to any one of claims 1 to 21, wherein
     the video information includes metadata, and
     the change detection unit detects the scene change according to a change in the metadata.
  23.  The image processing apparatus according to claim 22, wherein
     the metadata includes one or more of subtitle information, chapter information, and shooting condition information, and
     the change in the metadata includes one or more of a change in subtitle information, a change in chapter information, and a change in shooting conditions.
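Claims 22 and 23 also allow a purely metadata-driven test; a trivial sketch (the field names are hypothetical, not taken from the application):

    def metadata_scene_change(prev_meta, cur_meta,
                              keys=("subtitle", "chapter", "shooting_conditions")):
        # prev_meta, cur_meta: per-frame metadata dictionaries.
        return any(prev_meta.get(k) != cur_meta.get(k) for k in keys)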
  24.  An image processing method comprising the steps of:
     (a) acquiring video information including, in the information of each frame, a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same part of an object are shifted in one direction;
     (b) detecting a scene change in the video information;
     (c) determining a reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames after the scene change in the video information;
     (d) acquiring region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same part of a display object are shifted in the one direction by the reference shift amount; and
     (e) generating stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame after the scene change in the video information.
  25.  A program that, when executed in a control unit included in an information processing apparatus, causes the information processing apparatus to function as the image processing apparatus according to any one of claims 1 to 23.
PCT/JP2011/079613 2011-01-17 2011-12-21 Image processing device, image processing method, and program WO2012098803A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011006836 2011-01-17
JP2011-006836 2011-01-17

Publications (1)

Publication Number Publication Date
WO2012098803A1

Family

ID=46515440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/079613 WO2012098803A1 (en) 2011-01-17 2011-12-21 Image processing device, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2012098803A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009239388A (en) * 2008-03-26 2009-10-15 Fujifilm Corp Method, apparatus, and program for processing stereoscopic video
WO2010092823A1 (en) * 2009-02-13 2010-08-19 パナソニック株式会社 Display control device
WO2010122775A1 (en) * 2009-04-21 2010-10-28 パナソニック株式会社 Video processing apparatus and video processing method
JP2010258723A (en) * 2009-04-24 2010-11-11 Sony Corp Image information processing device, imaging apparatus, image information processing method, and program
JP2012015771A (en) * 2010-06-30 2012-01-19 Toshiba Corp Image processing apparatus, image processing program, and image processing method


Similar Documents

Publication Publication Date Title
TWI545934B (en) Method and system for processing an input three dimensional video signal
JP5963422B2 (en) Imaging apparatus, display apparatus, computer program, and stereoscopic image display system
US9451242B2 (en) Apparatus for adjusting displayed picture, display apparatus and display method
TWI439120B (en) Display device
JP5287702B2 (en) Image processing apparatus and method, and program
JP5304714B2 (en) Pseudo stereoscopic image generation apparatus and camera
RU2015145510A (en) CRIMINAL DISPLAY DEVICE, METHOD FOR MANAGEMENT OF THE CRIMINAL DISPLAY DEVICE AND DISPLAY SYSTEM
WO2011122177A1 (en) 3d-image display device, 3d-image capturing device and 3d-image display method
JP2005167310A (en) Photographing apparatus
WO2011078065A1 (en) Device, method and program for image processing
KR101270025B1 (en) Stereo Camera Appratus and Vergence Control Method thereof
JP2010181826A (en) Three-dimensional image forming apparatus
US10404964B2 (en) Method for processing media content and technical equipment for the same
WO2011070774A1 (en) 3-d video processing device and 3-d video processing method
JP5464129B2 (en) Image processing apparatus and parallax information generating apparatus
JP5347987B2 (en) Video processing device
JP2006267767A (en) Image display device
WO2012098803A1 (en) Image processing device, image processing method, and program
WO2012093580A1 (en) Image processing device, image processing method, and program
JP4249187B2 (en) 3D image processing apparatus and program thereof
KR20170033293A (en) Stereoscopic video generation
KR101939243B1 (en) Stereoscopic depth adjustment and focus point adjustment
KR101907127B1 (en) Stereoscopic video zooming and foreground and background detection in a video
JP5601375B2 (en) Image processing apparatus, image processing method, and program
WO2012073823A1 (en) Image processing device, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11856580

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11856580

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP