WO2012098803A1 - Image processing device, image processing method, and program - Google Patents


Info

Publication number
WO2012098803A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
change
scene
processing apparatus
image processing
Application number
PCT/JP2011/079613
Other languages
French (fr)
Japanese (ja)
Inventor
允宣 中村
岳彦 指田
Original Assignee
コニカミノルタホールディングス株式会社
Application filed by コニカミノルタホールディングス株式会社
Publication of WO2012098803A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20: Image signal generators
    • H04N13/261: Image signal generators with monoscopic-to-stereoscopic image conversion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00: Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20: Indexing scheme for editing of 3D models
    • G06T2219/2004: Aligning objects, relative positioning of parts

Definitions

  • the present invention relates to an image processing technique.
  • In recent years, 3D televisions that use moving images capable of stereoscopic viewing (3D moving images, or stereoscopic movies) have been in the spotlight.
  • In a 3D television, two images obtained by viewing the same object from different viewpoints are used to display an image that can be viewed stereoscopically (also referred to as a 3D image or a stereoscopic image).
  • In a 3D image, the positions of the pixels indicating the same part of the object are shifted between the left-eye image and the right-eye image, and this shift, combined with the focus adjustment function of the human eye, gives the user a sense of depth in the image.
  • the shift amount of the pixel position indicating the same part of the object between the image for the left eye and the image for the right eye is also referred to as “parallax”.
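  • For orientation only, the parallax of a small block can be measured by a simple horizontal search; the following is a minimal sketch in Python with NumPy, under the assumption of a rectified pair in which the right-eye image is shifted to the left, with all names illustrative rather than taken from the patent.

```python
import numpy as np

def block_parallax(left, right, y, x, size=16, max_shift=64):
    """Estimate the parallax of the block whose top-left corner is (y, x)
    in the left-eye image by scanning the right-eye image horizontally."""
    block = left[y:y + size, x:x + size].astype(np.float32)
    best_shift, best_cost = 0, np.inf
    for d in range(min(max_shift, x) + 1):   # same part appears shifted left
        cand = right[y:y + size, x - d:x - d + size].astype(np.float32)
        cost = np.abs(block - cand).sum()    # sum of absolute differences
        if cost < best_cost:
            best_cost, best_shift = cost, d
    return best_shift  # parallax in pixels along the one direction
```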
  • Such 3D image technology has been adopted in various video fields.
  • an endoscope apparatus has been proposed that enables stereoscopic viewing of an image over a wide field of view by adjusting parallax detected from a stereo image to fall within the fusion range of human eyes (for example, Patent Document 1).
  • There has also been proposed a stereoscopic video processing apparatus that displays a stereoscopic reference image when a stereoscopic video is displayed, so that the sense of depth can be adjusted (for example, Patent Document 2).
  • However, when the parallax is somewhat small, it may be difficult for the user to obtain a sense of depth, depending on the size of the screen. That is, even if the object is the same, it may appear different to the user between when it is actually viewed and when it is viewed on the 3D image.
  • JP-A-8-313825; Japanese Patent Laid-Open No. 11-155155; International Publication No. 2003/093023
  • the present invention has been made in view of the above problems, and an object of the present invention is to provide a technique for improving a sense of depth obtained by a user watching a 3D moving image.
  • An image processing apparatus according to a first aspect includes: a first acquisition unit that acquires video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same portion of an object are shifted in one direction; a change detection unit that detects a change of scene in the video information; a determination unit that determines a reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames after the scene change in the video information; a second acquisition unit that acquires region image information related to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same portion of a display target are shifted in the one direction by the reference shift amount; and a synthesis unit that generates stereoscopic image information by combining, for each frame after the scene change in the video information, the reference image, the corresponding image, the reference region image, and the corresponding region image.
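  • To aid reading, the flow that these five units describe can be paraphrased as the following minimal Python sketch; every function name and heuristic here is a hypothetical placeholder, not the patent's prescribed method.

```python
import numpy as np

def scene_changed(prev, cur, threshold=30.0):
    """Stand-in for the change detection unit: mean luminance difference."""
    if prev is None:
        return False
    diff = np.abs(cur.astype(np.float32) - prev.astype(np.float32))
    return diff.mean() > threshold

def determine_reference_shift(ref, cor):
    """Stand-in for the determination unit (a constant placeholder here;
    a real rule derives it from the parallax between ref and cor)."""
    return 8  # pixels

def make_region_images(shift, shape=(32, 32)):
    """Stand-in for the second acquisition unit: one flat patch, drawn
    `shift` pixels apart in the two composite images."""
    patch = np.full(shape, 128, dtype=np.uint8)
    return patch, patch

def process(frames):
    """frames: list of (reference_image, corresponding_image) pairs."""
    out, shift, prev = [], None, None
    for ref, cor in frames:
        if shift is None or scene_changed(prev, ref):
            shift = determine_reference_shift(ref, cor)
        out.append((ref, cor, *make_region_images(shift)))  # synthesis input
        prev = ref
    return out
```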
  • An image processing device according to a second aspect is the image processing device according to the first aspect, wherein the stereoscopic image information includes at least one of: information in a first format that makes it possible to display the reference image, the corresponding image, the reference region image, and the corresponding region image superimposed on one screen at the same time; and information in a second format that makes it possible to display one or more of the reference image, the corresponding image, the reference region image, and the corresponding region image on one screen in time sequence with the one or more remaining images.
  • An image processing apparatus according to a third aspect is the image processing apparatus according to the first or second aspect, wherein the determination unit determines the reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames in one scene, the one scene lasting from when the change detection unit detects a first scene change in the video information until it detects a second scene change, and the synthesis unit generates the stereoscopic image information by combining the reference image, the corresponding image, the reference region image, and the corresponding region image for all frames in the one scene of the video information.
  • An image processing device according to a fourth aspect is the image processing device according to the third aspect, wherein the determination unit determines the reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of the first frame in the one scene of the video information.
  • An image processing device according to a fifth aspect is the image processing device according to the third aspect, wherein the determination unit determines the reference shift amount based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene in the video information.
  • An image processing apparatus according to a sixth aspect is the image processing apparatus according to the third aspect, further comprising a region detection unit that detects, from each set of the reference image and the corresponding image of each frame of the video information according to a preset detection rule, a set of a reference attention region and a corresponding attention region capturing the same object that is predicted to attract the user's eyes, wherein the determination unit determines the reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference attention region and the corresponding attention region of the first frame of the one scene.
  • An image processing device according to a seventh aspect is the image processing device according to the third aspect, further comprising a region detection unit that detects, from each set of the reference image and the corresponding image of each frame of the video information according to a preset detection rule, a set of a reference attention region and a corresponding attention region capturing the same object that is predicted to attract the user's eyes, wherein the determination unit determines the reference shift amount based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene.
  • An image processing device according to an eighth aspect is the image processing device according to the third aspect, wherein the determination unit calculates a representative value of the virtual distance from a virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene in the video information, and determines, as the reference shift amount, the shift amount corresponding to the virtual reference plane for which the representative value of the virtual distance takes a predetermined value.
  • An image processing apparatus according to a ninth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on the distribution of first shift amounts of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames in the one scene of the video information, and on the distribution of second shift amounts of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames in the previous scene before the first change in the video information.
  • An image processing apparatus according to a tenth aspect is the image processing apparatus according to the ninth aspect, wherein the determination unit determines the reference shift amount based on a first representative shift value relating to the distribution of the first shift amounts and a second representative shift value relating to the distribution of the second shift amounts.
  • An image processing device according to an eleventh aspect is the image processing device according to the ninth aspect, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the previous scene in the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene in the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane when the difference between the first representative value and the second representative value is within a predetermined allowable range.
  • An image processing apparatus according to a twelfth aspect is the image processing apparatus according to the ninth aspect, further comprising a region detection unit that detects, from each set of the reference image and the corresponding image of each frame in the video information according to a preset detection rule, a set of a reference attention region and a corresponding attention region capturing the same object that is predicted to attract the user's eyes, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the previous scene, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of the object based on the distribution of the shift amounts of pixel positions indicating the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information, and, when the difference between the first representative value and the second representative value is within a predetermined allowable range, determines the shift amount corresponding to the second virtual reference plane as the reference shift amount.
  • An image processing device according to a thirteenth aspect is the image processing device according to the first or second aspect, wherein the determination unit determines the reference shift amount for each frame in the vicinity of the scene change in the video information so that the difference in the reference shift amount between frames is equal to or less than a predetermined amount, and the second acquisition unit acquires the region image information for each of those frames in the vicinity of the scene change in the video information according to the reference shift amount determined for that frame.
  • An image processing device according to a fourteenth aspect is the image processing device according to any one of the first to thirteenth aspects, wherein the change detection unit detects the change of scene according to a change in the image between two or more frames included in the video information.
  • An image processing device according to a fifteenth aspect is the image processing device according to the fourteenth aspect, wherein the change in the image includes one or more of a change in luminance, a change in color, and a change in frequency components.
  • An image processing device according to a sixteenth aspect is the image processing device according to the fourteenth aspect, wherein the change detection unit identifies a person whose face is captured in a plurality of frames included in the video information, and detects the change of scene according to a change of the person between the plurality of frames.
  • An image processing device according to a seventeenth aspect is the image processing device according to the fourteenth aspect, wherein the change detection unit detects the change of scene according to a change, in a plurality of frames included in the video information, of an attention region capturing an object that is predicted to attract the user's eyes.
  • An image processing device according to an eighteenth aspect is the image processing device according to the fourteenth aspect, wherein the change detection unit detects the change of scene in response to at least one of the appearance and the disappearance of an attention region capturing an object that is predicted to attract the user's eyes in a plurality of frames included in the video information.
  • An image processing device according to a nineteenth aspect is the image processing device according to any one of the first to eighteenth aspects, wherein the change detection unit detects the change of scene according to a change of the in-focus area in a plurality of frames included in the video information.
  • An image processing device according to a twentieth aspect is the image processing device according to any one of the first to nineteenth aspects, wherein the video information includes information related to sound, and the change detection unit detects the change of scene according to a change in the sound.
  • An image processing device according to a twenty-first aspect is the image processing device according to the twentieth aspect, wherein the change in the sound includes one or more of a change in volume, a change in frequency components, and a change in voiceprint.
  • An image processing apparatus according to a twenty-second aspect is the image processing apparatus according to any one of the first to twenty-first aspects, wherein the video information includes metadata, and the change detection unit detects the change of scene according to a change in the metadata.
  • An image processing device according to a twenty-third aspect is the image processing device according to the twenty-second aspect, wherein the metadata includes one or more of caption information, chapter information, and shooting condition information, and the change in the metadata includes one or more of a change in the caption information, a change in the chapter information, and a change in the shooting conditions.
  • An image processing method according to a twenty-fourth aspect includes: (a) a step of acquiring video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same portion of an object are shifted in one direction; (b) a step of detecting a change of scene in the video information; (c) a step of determining a reference shift amount based on the shift amount of pixel positions indicating the same portion of the object between the reference image and the corresponding image of one or more frames after the scene change in the video information; and (d) a step of acquiring region image information related to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same portion of a display target are shifted in the one direction by the reference shift amount.
  • A program according to a twenty-fifth aspect causes, when executed by a control unit included in an information processing apparatus, the information processing apparatus to function as the image processing apparatus according to any one of the first to twenty-third aspects.
  • the image processing apparatus makes it easy to recognize the difference in parallax in the 3D moving image, so that the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • With the image processing apparatus, even if various display modes are adopted, the sense of depth that can be obtained by a user watching a 3D moving image can be improved.
  • With the image processing device, since a common image having the reference parallax is added throughout one scene of the 3D moving image, an excessive change in the image is suppressed, the burden on the user's eyes can be reduced, and the amount of computation can be reduced.
  • a process of adding a common image having a reference parallax to a scene of a 3D moving image can be performed in real time.
  • the image processing apparatus can reduce the user's uncomfortable feeling in one scene of the 3D moving image.
  • the image processing apparatus can reduce the user's uncomfortable feeling when the scene changes in the 3D moving image.
  • an image having a parallax serving as a reference suitable for a scene can be combined with a 3D moving image in accordance with a change in the image.
  • an image having a parallax serving as a reference suitable for a scene can be synthesized into a 3D moving image according to a change in sound.
  • an image having a parallax that is a reference suitable for a scene can be combined with a 3D moving image in accordance with changes in metadata.
  • FIG. 1 is a diagram illustrating an example of a left eye image and a right eye image included in a 3D moving image.
  • FIG. 2 is a diagram illustrating a 3D moving image to which a reference area image and a corresponding area image are added.
  • FIG. 3 is a diagram illustrating an example of a left-eye image and a right-eye image included in a 3D moving image.
  • FIG. 4 is a diagram illustrating a 3D image to which a reference area image and a corresponding area image are added.
  • FIG. 5 is a diagram illustrating a schematic configuration of an information processing system according to an embodiment.
  • FIG. 6 is a block diagram illustrating a functional configuration according to the image processing apparatus.
  • FIG. 7 is a diagram for explaining a first change detection method for detecting a change in a scene.
  • FIG. 8 is a diagram for explaining a first change detection method for detecting a change in a scene.
  • FIG. 9 is a diagram for explaining a second change detection method for detecting a change in a scene.
  • FIG. 10 is a diagram for explaining a second change detection method for detecting a change in a scene.
  • FIG. 11 is a diagram for explaining a third change detection method for detecting scene changes.
  • FIG. 12 is a diagram for explaining a third change detection method for detecting a change in a scene.
  • FIG. 13 is a diagram for explaining a fourth change detection method for detecting scene changes.
  • FIG. 14 is a diagram for explaining a fourth change detection method for detecting a change in a scene.
  • FIG. 15 is a diagram for explaining a fifth change detection method for detecting a change in a scene.
  • FIG. 16 is a diagram for explaining a fifth change detection method for detecting a change in a scene.
  • FIG. 17 is a diagram for explaining a sixth change detection method for detecting a change in a scene.
  • FIG. 18 is a diagram for explaining a sixth change detection method for detecting scene changes.
  • FIG. 19 is a diagram for explaining a first method of determining the reference shift amount.
  • FIG. 20 is a diagram for explaining the first method of determining the reference shift amount.
  • FIG. 21 is a diagram for explaining a third method of determining the reference shift amount.
  • FIG. 22 is a diagram for explaining the third method of determining the reference shift amount.
  • FIG. 23 is a diagram for explaining a fourth method of determining the reference shift amount.
  • FIG. 24 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 25 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 26 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 27 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 28 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 29 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 30 is a diagram illustrating an image for the Nth left eye to which the Nth reference region image is added.
  • FIG. 31 is a diagram illustrating a 3D image to which a reference area image and a corresponding area image are added.
  • FIG. 32 is a flowchart showing the operation of the image processing apparatus.
  • FIG. 33 is a diagram illustrating a 3D moving image to which a reference area image and a corresponding area image are added.
  • FIG. 34 is a diagram illustrating a 3D image to which a reference area image and a corresponding area image are added.
  • FIG. 35 is a diagram for explaining a method of determining the reference shift amount according to a modification.
  • In the following description, a stereoscopically viewable moving image is also referred to as a 3D moving image. With N being an arbitrary natural number, the N-th frame image for the left eye is also referred to as the N-th left-eye image, and the N-th frame image for the right eye is also referred to as the N-th right-eye image.
  • the one direction is a direction that can match the separation direction of the human left eye and right eye, and is set, for example, in the horizontal direction of the image.
  • the Nth left-eye image GN L and the Nth right-eye image GN R can be acquired, for example, by photographing using a stereo camera.
  • the stereo camera has two cameras corresponding to a human left eye and a right eye.
  • A state is shown in which the n-th left-eye image Gn_L includes regions (also referred to as object regions) O1_L, O2_L, and O3_L indicating three objects, and the n-th right-eye image Gn_R includes three object regions O1_R, O2_R, and O3_R. The object regions O1_L and O1_R indicate the same person, the object regions O2_L and O2_R indicate the same object (here, a pylon), and the object regions O3_L and O3_R indicate another same object (here, a ball).
  • The positions of the object regions O1_R, O2_R, and O3_R in the n-th right-eye image Gn_R are shifted to the left with respect to the positions of the object regions O1_L, O2_L, and O3_L in the n-th left-eye image Gn_L. To facilitate comparison, positions corresponding to the outer edges of the object regions O1_L, O2_L, and O3_L in the n-th left-eye image Gn_L are marked with thin broken lines.
  • The shift amount (also referred to as parallax) between the position of the object region O1_L in the n-th left-eye image Gn_L and the position of the object region O1_R in the n-th right-eye image Gn_R can be considered somewhat small, and the same applies to the parallax between the object regions O2_L and O2_R and to the parallax between the object regions O3_L and O3_R. In such a case, the user may have difficulty obtaining a sense of depth for the objects indicated by the object regions O1_L to O3_L and O1_R to O3_R. In other words, even when the object is the same, viewing it on the 3D image may give the user a sense of depth different from that obtained when actually viewing the object.
  • Therefore, in the present embodiment, an image having a reference parallax (also referred to as a reference parallax image) is added to the N-th left-eye image GN_L and the N-th right-eye image GN_R of each frame. The sense of depth obtained by the user viewing the 3D image can thereby be improved through comparison with the reference parallax image.
  • The reference parallax image added to the n-th frame includes an n-th reference region image In_L for the left eye and an n-th corresponding region image In_R for the right eye. The n-th reference region image In_L and the n-th corresponding region image In_R are generated according to a reference shift amount, and have a relationship in which the positions of pixels indicating the same portion of the display target are shifted in one direction. The reference shift amount can be determined based on, for example, the N-th left-eye image GN_L and the N-th right-eye image GN_R of one or more frames of the scene to which the n-th frame belongs.
  • The left side of FIG. 2 shows an example of an n-th left-eye composite image GSn_L in which the n-th reference region image In_L is combined at a position (an n-th reference combining position Pn_L described later) around the image area (also referred to as the n-th left-eye image area) TAn_L corresponding to the n-th left-eye image Gn_L, and the right side of FIG. 2 shows an example of an n-th right-eye composite image GSn_R in which the n-th corresponding region image In_R is combined in a corresponding manner. The 3D image can be displayed in a mode in which the n-th left-eye composite image GSn_L and the n-th right-eye composite image GSn_R are displayed as a combined image, or in a mode in which they are displayed sequentially within a short time. For example, interlaced display can be employed in which the n-th left-eye composite image GSn_L is displayed as the image of the first field and the n-th right-eye composite image GSn_R is displayed as the image of the second field. A compositing sketch follows.
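  • How the combining positions and the reference shift amount interact can be pictured with this sketch, which pastes a patch into a margin around each image, horizontally offset by the reference shift amount between the two eyes; the sizes, positions, and blank frames are illustrative assumptions.

```python
import numpy as np

def compose_with_region(image, region, x, y, margin=40):
    """Place `image` on a larger canvas and paste `region` at (x, y)
    in the surrounding margin (the combining position Pn)."""
    h, w = image.shape
    canvas = np.zeros((h + 2 * margin, w + 2 * margin), dtype=image.dtype)
    canvas[margin:margin + h, margin:margin + w] = image
    rh, rw = region.shape
    canvas[y:y + rh, x:x + rw] = region
    return canvas

gn_l = np.zeros((240, 320), np.uint8)        # n-th left-eye image (blank here)
gn_r = np.zeros((240, 320), np.uint8)        # n-th right-eye image
in_patch = np.full((24, 24), 200, np.uint8)  # reference/corresponding region image
reference_shift = 6                          # reference shift amount in pixels
gsn_l = compose_with_region(gn_l, in_patch, x=10, y=8)
gsn_r = compose_with_region(gn_r, in_patch, x=10 - reference_shift, y=8)
```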
  • the reference shift amount is changed according to the scene change in the 3D video. That is, the reference parallax related to the reference parallax image added to the 3D moving image can be changed. For this reason, the difference in parallax in the 3D moving image is easily recognized, and the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • FIG. 3 shows an example of the (n+m)-th left-eye image G(n+m)_L and the (n+m)-th right-eye image G(n+m)_R included in the (n+m)-th frame after the scene has changed from the n-th frame shown in FIG. 1. Here, the (n+m)-th left-eye image G(n+m)_L includes an object region O4_L related to one person, and the (n+m)-th right-eye image G(n+m)_R includes an object region O4_R related to the same person. FIG. 4 shows an example of an (n+m)-th left-eye composite image GS(n+m)_L in which the (n+m)-th reference region image I(n+m)_L is combined in accordance with the (n+m)-th reference combining position P(n+m)_L in the (n+m)-th left-eye image area TA(n+m)_L corresponding to the (n+m)-th left-eye image G(n+m)_L. The (n+m)-th reference region image I(n+m)_L and the (n+m)-th corresponding region image I(n+m)_R can be generated based on the reference shift amount changed due to the scene change. That is, the (n+m)-th reference region image I(n+m)_L and the (n+m)-th corresponding region image I(n+m)_R have a relationship in which the positions of pixels indicating the same portion of the display target are shifted in one direction according to the changed reference shift amount.
  • FIG. 5 is a diagram illustrating a schematic configuration of the information processing system 1 according to the embodiment.
  • the information processing system 1 includes a stereo camera 2, an information processing device 4, and a line-of-sight detection sensor 5.
  • the information processing device 4 is connected to the stereo camera 2 and the line-of-sight detection sensor 5 so as to be able to transmit and receive data.
  • the stereo camera 2 has a camera 21 and a camera 22.
  • Each of the cameras 21 and 22 has a function of a digital camera having an image sensor such as a CCD.
  • Each of the cameras 21 and 22 performs a photographing operation of receiving light from the subject and acquiring information indicating the distribution regarding the luminance of the subject as image data by photoelectric conversion.
  • the optical axis of the camera 21 and the optical axis of the camera 22 are separated by a predetermined distance in the horizontal direction.
  • the predetermined distance is also referred to as a base line length, and is set to a distance equivalent to the distance between the center of the average human left eye and the center of the right eye, for example.
  • a stereo image is acquired by performing a photographing operation with the camera 21 and the camera 22 at substantially the same timing.
  • the stereo image includes a set of an image for the left eye and an image for the right eye, and can be displayed so as to enable stereoscopic viewing by the user.
  • Ns sets (Ns is an integer of 2 or more) of stereo images can be acquired by performing continuous shooting a plurality of times with the camera 21 and the camera 22 at predetermined timings.
  • the Ns sets of stereo images correspond to images of Ns frames included in the 3D moving image.
  • the line-of-sight detection sensor 5 detects a portion of the screen of the display unit 42 included in the information processing device 4 that is noticed by the user (also referred to as a portion of interest).
  • the display unit 42 and the line-of-sight detection sensor 5 are fixed to each other with a predetermined arrangement relationship.
  • In the line-of-sight detection sensor 5, for example, an image of the user is first obtained by photographing, the direction of the user's line of sight is then detected by analyzing the image, and the portion of interest on the screen of the display unit 42 is then detected.
  • the analysis of the image can be realized, for example, by detecting the orientation of the face using pattern matching, and identifying the white-eye portion and the black-eye portion in both eyes using a color difference.
  • information relating to one or more stereo images obtained by the stereo camera 2 can be transmitted to the information processing device 4 via the communication line 3a.
  • Information relating to shooting conditions in the stereo camera 2 can also be transmitted to the information processing apparatus 4 via the communication line 3a.
  • information related to the target portion obtained by the line-of-sight detection sensor 5 can be transmitted to the information processing apparatus 4 via the communication line 3b.
  • the communication lines 3a and 3b may be wired lines or wireless lines.
  • The information processing apparatus 4 has the functions of, for example, a personal computer.
  • the information processing apparatus 4 includes an operation unit 41, a display unit 42, an interface (I / F) unit 43, a storage unit 44, an input / output unit 45, and a control unit 46.
  • the operation unit 41 includes, for example, a mouse and a keyboard.
  • the display unit 42 includes, for example, a liquid crystal display.
  • the I / F unit 43 receives information from the stereo camera 2 and the line-of-sight detection sensor 5.
  • the storage unit 44 includes, for example, a hard disk and stores each image obtained by the stereo camera 2. Further, the storage unit 44 stores a program PG1 and the like for realizing various operations in the information processing apparatus 4.
  • the input / output unit 45 includes, for example, a disk drive, can receive the storage medium 6 such as an optical disk, and can exchange data with the control unit 46.
  • The control unit 46 includes a CPU 46a that functions as a processor and a memory 46b that can temporarily store information, and controls each unit of the information processing apparatus 4. In the control unit 46, various functions and various information processing are realized by reading and executing the program PG1 stored in the storage unit 44; data temporarily generated in this processing is appropriately stored in the memory 46b. Thereby, the information processing apparatus 4 functions as an image processing apparatus that generates stereoscopic image information from the video information.
  • FIG. 6 is a block diagram illustrating a functional configuration of the image processing apparatus realized by the control unit 46.
  • The functional configuration of the image processing apparatus includes a video acquisition unit 461 as the first acquisition unit, an attention area detection unit 462, a change detection unit 463, a shift amount determination unit 464, a region image acquisition unit 465 as the second acquisition unit, a signal receiving unit 466, a mode setting unit 467, a position specifying unit 468, and an image composition unit 469.
  • the video acquisition unit 461 acquires video information including information related to the 3D moving image.
  • the 3D moving image includes Ns sets (Ns is an integer of 2 or more) of stereo images, that is, image information of Ns frames.
  • The stereo image of the N-th frame (N is an arbitrary natural number) of the Ns frames includes an N-th left-eye image GN_L and an N-th right-eye image GN_R having a relationship in which the positions of pixels indicating the same portion of the object are shifted in one direction (here, the horizontal direction).
  • the video information may include information related to audio matched to the 3D video and metadata related to the 3D video.
  • the metadata may include one or more information of caption information, chapter information, and shooting condition information.
  • In the present embodiment, the N-th left-eye image GN_L is used as the reference image, and the N-th right-eye image GN_R is used as the image corresponding to the reference image (also referred to as the corresponding image). Conversely, the N-th right-eye image GN_R may be the reference image and the N-th left-eye image GN_L may be the corresponding image.
  • The attention area detection unit 462 detects, from the set of the N-th left-eye image GN_L and the N-th right-eye image GN_R according to a preset detection rule, a set of attention areas capturing the same object that is predicted to attract the user's eyes. A set of attention areas includes a pair of an attention area detected from the N-th left-eye image GN_L (also referred to as a reference attention area) and an attention area detected from the N-th right-eye image GN_R (also referred to as a corresponding attention area).
  • the change detector 463 detects a scene change in the video information.
  • a change in the scene can be detected based on one or more information of 3D moving image information, audio information, and metadata included in the video information.
  • Changes in the scene can include changes in location and time, replacement of main display objects, changes in the state of display objects, and the like.
  • the change in the state of the display object can include movement of the display object, change of the focused display object, and the like.
  • The shift amount determination unit 464 determines the reference shift amount for each scene by following a predetermined determination rule. Specifically, the reference shift amount can be determined based on the N-th left-eye image GN_L and the N-th right-eye image GN_R of one or more frames belonging to one scene after a scene change. More specifically, the reference shift amount can be determined based on the shift amount (parallax) of pixel positions indicating the same portion of the object between the N-th left-eye image GN_L and the N-th right-eye image GN_R. Further, the shift amount determination unit 464 can reduce the amount of calculation by using various calculation results from the change detection unit 463. One concrete reading is sketched below.
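  • As one concrete reading of such a determination rule, the sketch below takes the median of block-wise parallaxes over one frame of the scene, reusing `block_parallax` from the earlier sketch; the median as the representative value is an assumption, not the patent's prescribed rule.

```python
import numpy as np

def determine_reference_shift(left, right, size=16, max_shift=64):
    """Reference shift amount as the median block-wise parallax between the
    reference image and the corresponding image of one frame in the scene."""
    h, w = left.shape
    shifts = [block_parallax(left, right, y, x, size, max_shift)
              for y in range(0, h - size, size)
              for x in range(max_shift, w - size, size)]
    return int(np.median(shifts)) if shifts else 0
```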
  • The region image acquisition unit 465 acquires information (also referred to as N-th region image information) related to the N-th reference region image IN_L and the N-th corresponding region image IN_R. The N-th reference region image IN_L and the N-th corresponding region image IN_R have a relationship in which the positions of pixels indicating the same portion of the display target are shifted in one direction (here, the horizontal direction) by the reference shift amount determined by the shift amount determination unit 464. That is, the N-th region image information can be acquired based on the reference shift amount.
  • the signal receiving unit 466 receives a signal input to the control unit 46 according to the operation of the operation unit 41 by the user.
  • the mode setting unit 467 sets the position specifying unit 468 to any one of a plurality of modes including the first mode and the second mode.
  • the mode setting unit 467 can set the mode of the position specifying unit 468 in accordance with the signal received by the signal receiving unit 466.
  • Further, the mode setting unit 467 may set the attention area detection unit 462 to a mode in which it detects an attention area (also referred to as a detection permission mode) or a mode in which it does not detect an attention area (also referred to as a detection prohibition mode). The mode setting unit 467 may also set the mode of the position specifying unit 468, and likewise the mode of the attention area detection unit 462, based on the N-th left-eye image GN_L and the N-th right-eye image GN_R acquired by the video acquisition unit 461.
  • The position specifying unit 468 specifies the position at which the N-th reference region image IN_L is combined (also referred to as the N-th reference combining position) PN_L and the position at which the N-th corresponding region image IN_R is combined (also referred to as the N-th corresponding combining position) PN_R. As the first mode, a mode in which the position specifying unit 468 specifies the N-th reference combining position PN_L and the N-th corresponding combining position PN_R so as to surround the image can be considered; as the second mode, a mode in which the position specifying unit 468 specifies the N-th reference combining position PN_L and the N-th corresponding combining position PN_R in accordance with the attention area can be considered. The N-th reference combining position PN_L may be specified in a region around the N-th left-eye image GN_L, or in a region corresponding to the periphery of another image different from the N-th left-eye image GN_L; as this other image, for example, another field image in an interlaced moving image can be considered. Similarly, the N-th corresponding combining position PN_R may be specified in a region around the N-th right-eye image GN_R, or in a region corresponding to the periphery of another image different from the N-th right-eye image GN_R.
  • The image composition unit 469 generates information of a stereoscopically viewable image (also referred to as stereoscopic image information) by combining the N-th left-eye image GN_L, the N-th right-eye image GN_R, the N-th reference region image IN_L, and the N-th corresponding region image IN_R. Here, the N-th reference region image IN_L can be combined in alignment with the N-th reference combining position PN_L specified by the position specifying unit 468, and the N-th corresponding region image IN_R can be combined in alignment with the N-th corresponding combining position PN_R specified by the position specifying unit 468.
  • the stereoscopic image information can be visually output on the display unit 42 under the control of the control unit 46.
  • For example, on the display unit 42, the N-th left-eye image GN_L, the N-th right-eye image GN_R, the N-th reference region image IN_L, and the N-th corresponding region image IN_R can be superimposed and displayed at the same time, or one or more of these images can be displayed in time sequence with the remaining images. Thereby, the sense of distance to the object given to the user by the N-th left-eye image area TAN_L and the N-th right-eye image area TAN_R can be enhanced by comparison with the sense of distance to the display target given to the user by the N-th reference region image IN_L and the N-th corresponding region image IN_R. Further, since the difference in parallax in the 3D moving image can be easily recognized by changing the reference shift amount according to the scene change, the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • Attention area detection methods: as the method of detecting the reference attention area and the corresponding attention area in the attention area detection unit 462, for example, one or more of the following first to sixth area detection methods may be employed.
  • As one area detection method, an image area located in the vicinity of the center of the N-th left-eye image GN_L and the N-th right-eye image GN_R is detected as the attention area. For example, the contour of an object can be extracted from the N-th left-eye image GN_L by image processing such as the Hough transform, and an object located near the center of the N-th left-eye image GN_L can be detected as the reference attention area. Then, the image area corresponding to the reference attention area in the N-th right-eye image GN_R can be detected as the corresponding attention area by searching for corresponding points using, for example, the phase-only correlation (POC) method, a minimal sketch of which follows.
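  • As a rough illustration of this corresponding-point search, here is a minimal sketch of phase-only correlation in Python with NumPy; the patch extraction around the attention area, windowing, and sub-pixel peak refinement used in practice are omitted.

```python
import numpy as np

def phase_only_correlation(f, g):
    """POC surface between two equally sized grayscale patches; the peak
    location gives the translational shift between them."""
    spectrum = np.fft.fft2(f) * np.conj(np.fft.fft2(g))
    spectrum /= np.abs(spectrum) + 1e-12          # keep phase only
    return np.fft.fftshift(np.fft.ifft2(spectrum).real)

def estimate_shift(f, g):
    poc = phase_only_correlation(f, g)
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    dy, dx = np.array(peak) - np.array(poc.shape) // 2
    return dx, dy   # the horizontal component dx approximates the parallax
```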
  • As another method, the N-th left-eye image GN_L and the N-th right-eye image GN_R are targeted, and an image region indicating a specific type of object, such as a person, is detected as the attention area by template matching or the like, as sketched below. Specifically, the reference attention area can be detected by template matching targeting the N-th left-eye image GN_L, and the image area corresponding to the reference attention area in the N-th right-eye image GN_R can then be detected as the corresponding attention area by searching for corresponding points using the POC method or the like.
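  • For the template matching step, a minimal OpenCV sketch follows; the file names, the matching score method, and the 0.7 threshold are assumptions for illustration.

```python
import cv2

# Detect a person-shaped reference attention area in the left-eye image.
frame_l = cv2.imread("gn_left.png", cv2.IMREAD_GRAYSCALE)           # N-th left-eye image
template = cv2.imread("person_template.png", cv2.IMREAD_GRAYSCALE)  # person template

scores = cv2.matchTemplate(frame_l, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(scores)
if max_val > 0.7:                     # arbitrary acceptance threshold
    x, y = max_loc
    h, w = template.shape
    reference_attention_area = (x, y, w, h)  # rectangle in the left-eye image
```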
  • As another method, an attention area is detected by analyzing motion vectors over a plurality of stereo images included in the 3D moving image. For example, a region indicating an object whose motion exceeds a certain threshold over a predetermined number of frames can be detected as the reference attention area and the corresponding attention area. Specifically, the reference attention area can be detected by analyzing motion vectors over a plurality of left-eye images GN_L, and the image area corresponding to the reference attention area in the N-th right-eye image GN_R can then be detected as the corresponding attention area by searching for corresponding points using the POC method or the like.
  • As another method, an area having at least one of a specific color and a specific texture different from its surroundings is detected as the reference attention area and the corresponding attention area, targeting the N-th left-eye image GN_L and the N-th right-eye image GN_R. For example, an area of a specific color, such as the skin color characteristic of human beings, can be detected as the reference attention area in the N-th left-eye image GN_L, and so can an area having a specific texture, such as the color arrangement of the parts of the human head (eyes, hair, eyebrows, mouth, etc.). Then, the image area corresponding to the reference attention area in the N-th right-eye image GN_R can be detected as the corresponding attention area by searching for corresponding points using the POC method or the like.
  • Here, a case where a plurality of subjects of the same type (for example, humans) partially overlap in one image can be considered. In such a case, information indicating the distance from the viewpoint to each subject, obtainable from the N-th left-eye image GN_L and the N-th right-eye image GN_R, can be used to distinguish the plurality of subjects. The information indicating the distance can be obtained from the parallax between the N-th left-eye image GN_L and the N-th right-eye image GN_R and the baseline length of the stereo camera 2 using the principle of triangulation, as in the sketch below; the parallax itself can be obtained by searching for corresponding points using the POC method or the like.
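  • The triangulation reduces to a one-line relation; the sketch below assumes a rectified pair, a focal length f in pixels, and the baseline length B of the stereo camera 2.

```python
def depth_from_parallax(parallax_px, focal_px, baseline_m):
    """Distance to the subject by triangulation, Z = f * B / d,
    for a rectified stereo pair with parallax d > 0."""
    if parallax_px <= 0:
        raise ValueError("parallax must be positive for a finite distance")
    return focal_px * baseline_m / parallax_px

# e.g. f = 800 px and B = 0.065 m (roughly the average interocular
# distance): a parallax of 13 px gives Z = 800 * 0.065 / 13 = 4.0 m
distance = depth_from_parallax(13, 800, 0.065)
```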
  • As another method, the reference attention area and the corresponding attention area are detected from the N-th left-eye image GN_L and the N-th right-eye image GN_R based on the portion of interest, detected for example by the line-of-sight detection sensor 5. Specifically, the N-th left-eye image GN_L and the N-th right-eye image GN_R are targeted, and one or more sets of image areas indicating a specific type of object, such as a person, are detected in advance by template matching or the like; the set of image areas corresponding to the portion of interest can then be detected as the reference attention area and the corresponding attention area.
  • As yet another method, the N-th left-eye image GN_L and the N-th right-eye image GN_R are targeted, and the contour of each object is extracted by image processing such as the Hough transform. Then, regions related to objects surrounded by a contour in which the parallax between the N-th left-eye image GN_L and the N-th right-eye image GN_R does not change abruptly with the passage of time, for example over a predetermined number of frames corresponding to a predetermined time, are detected as the reference attention area and the corresponding attention area where a person is captured; the parallax can be obtained, for example, by searching for corresponding points using the POC method or the like. Further, among such regions, an area where the parallax between the N-th left-eye image GN_L and the N-th right-eye image GN_R is equal to or greater than a predetermined value can be detected as the reference attention area and the corresponding attention area where a person is captured, and an area whose parallax is less than the predetermined value can be detected as an area where a distant view is captured.
  • A scene change in the change detection unit 463 can be detected based on one or more of the 3D moving image information, the sound information, and the metadata that may be included in the video information.
  • For example, the change detection unit 463 can detect a change of scene according to an image change between two or more frames included in the video information; thereby, an image having a parallax serving as a reference suitable for the scene can be combined with the 3D moving image in accordance with the change of the image.
  • For example, a scene change can be detected by comparing the (n+m)-th frame (m is a natural number) with the n-th frame m frames earlier. Either the comparison of the n-th left-eye image Gn_L with the (n+m)-th left-eye image G(n+m)_L, the comparison of the n-th right-eye image Gn_R with the (n+m)-th right-eye image G(n+m)_R, or both comparisons may be performed. Here, m may be 1 or 2 or more.
  • one or more change detection methods among the following first to sixth change detection methods can be employed as the change detection method of the scene based on the information of the 3D moving image in the change detection unit 463.
  • a scene change may be detected in response to one or more of a luminance change, a color change, and a frequency component change between two or more frames included in the video information.
  • For example, if the value obtained by integrating the difference of the luminance signal at each pixel between the n-th frame and the (n+m)-th frame (also referred to as the luminance difference integrated value) exceeds a predetermined threshold value, it can be detected that the scene has changed. Alternatively, a scene change may be detected according to a change in the distribution of the integrated luminance values of the vertical lines in the image (also referred to as integrated luminance values). In this case, for example, if the sum over the vertical lines of the differences in the integrated luminance values between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, it can be detected that the scene has changed. A sketch of both variants follows the figure references below.
  • FIG. 7 schematically shows an example of the distribution of integrated luminance values of the n-th left-eye image Gn_L shown in FIG. 1, and FIG. 8 schematically shows an example of the distribution of integrated luminance values of the (n+m)-th left-eye image G(n+m)_L shown in FIG. 3.
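  • A minimal sketch of this first change detection method follows, covering both the per-pixel and the per-vertical-line variants; the thresholds are arbitrary assumptions that would be tuned per application.

```python
import numpy as np

def luminance_scene_change(frame_n, frame_nm, pixel_thresh=2.0e6,
                           line_thresh=1.0e5):
    """Detect a scene change between the n-th and (n+m)-th frames from
    luminance differences (both variants described above)."""
    a = frame_n.astype(np.float32)
    b = frame_nm.astype(np.float32)
    if np.abs(a - b).sum() > pixel_thresh:  # luminance difference integrated value
        return True
    lines_a = a.sum(axis=0)                 # integrated luminance per vertical line
    lines_b = b.sum(axis=0)
    return np.abs(lines_a - lines_b).sum() > line_thresh
```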
  • Also, a scene change may be detected according to a change in the distribution of the integrated pixel values of a specific color of the vertical lines in the image (also referred to as color component integrated values). In this case, for example, if the sum over the vertical lines of the differences in the color component integrated values between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, it can be detected that the scene has changed.
  • Further, a scene change can be detected by comparing, between the n-th frame and the (n+m)-th frame, the intensity distributions of the spatial frequency components obtained by the Fourier transform or the like of the distribution of pixel values. Specifically, for example, if the spatial frequency range is divided into a predetermined number of bands and the sum of the differences in the intensities of the frequency components in each band exceeds a predetermined threshold, it can be detected that the scene has changed, as sketched below.
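  • The frequency-component variant can be sketched as follows; the number of bands and the threshold are assumptions.

```python
import numpy as np

def frequency_scene_change(frame_n, frame_nm, bands=8, thresh=1.0e7):
    """Compare band-wise spatial-frequency intensity between two frames."""
    def band_energies(img):
        spec = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(np.float32))))
        h, w = spec.shape
        yy, xx = np.mgrid[0:h, 0:w]
        r = np.hypot(yy - h / 2, xx - w / 2)    # radial spatial frequency
        r_max = r.max() + 1e-9
        return np.array([
            spec[(r >= r_max * i / bands) & (r < r_max * (i + 1) / bands)].sum()
            for i in range(bands)
        ])
    diff = np.abs(band_energies(frame_n) - band_energies(frame_nm)).sum()
    return diff > thresh
```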
  • a change in the scene can be detected in accordance with a change in the set of the reference attention area and the corresponding attention area in a plurality of frames included in the video information.
  • a set of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
  • If the set of the reference attention area and the corresponding attention area changes, it can be detected that the scene has changed. Here, a change in the reference attention area alone may be detected as a scene change, or a change in the corresponding attention area alone may be detected as a scene change.
  • a change in at least one of the reference attention area and the corresponding attention area may be recognized between the nth frame and the n + m frame.
  • For example, consider a case where an image in which three object regions O1_L to O3_L are captured as reference attention areas changes, between the n-th left-eye image Gn_L and the (n+m)-th left-eye image G(n+m)_L, to an image in which one object region O4_L is captured as the reference attention area. In this case, the change in the reference attention area can be recognized from the mismatch of the objects between the object region O1_L and the object region O4_L; this mismatch of display objects can be recognized by searching for corresponding points by template matching, the POC method, or the like.
  • m may be 1 or more, and may be 20 to 50.
  • a scene change can be detected in accordance with the movement of a set of a reference attention area and a corresponding attention area.
  • a scene change may be detected according to the movement of the reference attention area, or a scene change may be detected according to the movement of the corresponding attention area. In this case, it is only necessary to recognize whether at least one of the reference attention area and the corresponding attention area has moved beyond a predetermined amount between the nth frame and the (n + m) th frame.
  • a scene change can be detected in accordance with at least one of the appearance and disappearance of a set of a reference attention area and a corresponding attention area in a plurality of frames included in video information.
  • a set of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
  • If a set of the reference attention area and the corresponding attention area appears or disappears, it can be detected that the scene has changed. Here, the appearance or disappearance of the reference attention area alone may be detected as a scene change, or the appearance or disappearance of the corresponding attention area alone may be detected as a scene change. In this case, it is only necessary to recognize the appearance or disappearance of at least one of the reference attention area and the corresponding attention area between the n-th frame and the (n+m)-th frame.
  • the appearance and disappearance of the reference attention area and the appearance and disappearance of the corresponding attention area can be recognized by, for example, detecting a change in the number of the reference attention area and the corresponding attention area, and detecting the movement amount and the moving direction of the object.
  • the detection of the moving amount and the moving direction of the object can be realized by dividing each image into a plurality of blocks and comparing them between the nth frame and the n + m frame.
  • m may be 1 or more, and may be 20 to 50.
  • For example, a state can be considered in which an object region O7_L as a reference attention area appears from the left side, that is, a state in which the object enters the frame. In this case, the appearance of the object region O7_L may be recognized according to the change in the number of object regions, or according to the detection of the movement amount and movement direction of the object region O7_L.
  • the first frame of the scene after the change may be a frame when the reference attention area starts to appear or disappear, or may be a frame when the appearance or disappearance of the reference attention area progresses by a predetermined rate.
  • the frame may be a frame at the time when the appearance or disappearance of the reference attention area is completed.
  • the first frame of the changed scene may be a frame at the time when the corresponding attention area starts to appear or disappear, or may be a frame at the time when the appearance or disappearance of the corresponding attention area has progressed by a predetermined rate. It may be a frame when the appearance or disappearance of the corresponding attention area is completed.
  • a change in the scene can be detected in accordance with a change in the size of the set of the reference attention area and the corresponding attention area in a plurality of frames included in the video information.
  • a set of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
  • When the size of the set of the reference attention area and the corresponding attention area changes, it can be detected that the scene has changed. If the area of the reference attention region changes beyond a predetermined amount, this may be detected as a scene change; likewise if the area of the corresponding attention region changes beyond a predetermined amount.
  • In this case, a set of reference attention areas indicating the same object may be detected between the nth frame and the (n+m)th frame by searching for corresponding points using template matching, the POC method, or the like; similarly, a set of corresponding attention areas indicating the same object may be detected.
  • The predetermined amount may be an absolute amount of area, or an amount calculated by multiplying the area by a predetermined coefficient (for example, 0.5), as in the sketch below.
  • m may be 1 or more, and may be 20 to 50.
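A minimal sketch of this fourth change detection method, assuming the attention-area sizes (pixel counts) have already been measured; the "predetermined amount" is derived here by the coefficient rule mentioned above:

```python
def area_changed(area_n, area_nm, coeff=0.5):
    """Fourth method (sketch): scene change if the attention-area size
    changes by more than coeff * area_n, e.g. roughly halves or doubles."""
    return abs(area_nm - area_n) > coeff * area_n

# e.g. area_changed(12000, 5000) -> True (the area shrank by more than 50%)
```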
  • a change in the scene can be detected in accordance with a change in the in-focus area in the plurality of frames included in the video information.
  • the recognition of the in-focus area can be executed by targeting at least one of the N-th left eye image GN L and the N-th right eye image GN R , for example.
  • The in-focus area may be recognized by edge extraction processing using the Hough transform or the like, or by dividing the image into a plurality of areas and analyzing, for each area, the presence or absence of high-frequency components in a Fourier transform of the distribution of pixel values.
  • When object regions have been detected, the presence or absence of high-frequency components may instead be analyzed for each object region by a Fourier transform of the distribution of pixel values, to recognize whether that region is in focus.
  • When the in-focus area changes to an area in which a different object is captured, it can be detected that the scene has changed.
  • Whether or not the region is a region where a different object is captured can be determined, for example, by searching for corresponding points by template matching, the POC method, or the like.
  • m may be 1 or more, and may be 20 to 50.
  • In the illustrated example, the change detection unit 463 recognizes that the in-focus area changes from the object region O6 L to the object region O7 L, in which a different object is captured, and can thus detect that the scene has changed. A sketch of the high-frequency analysis follows.
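The Fourier-based recognition of in-focus areas could be sketched as follows: each block's spectrum is computed with a 2-D FFT, and the block is marked in focus when the fraction of spectral energy above a cutoff radius is large. The cutoff, block size, and threshold are illustrative assumptions:

```python
import numpy as np

def high_freq_ratio(patch, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` (1.0 = Nyquist radius)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    fy = np.fft.fftshift(np.fft.fftfreq(patch.shape[0])) * 2  # range -1..1
    fx = np.fft.fftshift(np.fft.fftfreq(patch.shape[1])) * 2
    r = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    return spec[r > cutoff].sum() / (spec.sum() + 1e-12)

def in_focus_map(image, block=64, thresh=0.05):
    """True where a block's high-frequency content suggests it is in focus."""
    H, W = image.shape
    rows = []
    for by in range(0, H - block + 1, block):
        rows.append([high_freq_ratio(image[by:by + block, bx:bx + block].astype(float)) > thresh
                     for bx in range(0, W - block + 1, block)])
    return np.array(rows)
```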
  • a face of a person captured in a plurality of frames included in the video information is recognized, and a scene change can be detected in accordance with the change of the person between the plurality of frames.
  • Recognition of a person's face can be performed, for example, by targeting at least one of the N-th left eye image GN L and the N-th right eye image GN R.
  • For example, a region in which a human face is captured (also referred to as a face region) is first recognized in the image by template matching or the like; the parts of the face (eyes, eyebrows, nose, mouth, etc.) are then located within the face region by template matching or the like, and the face is recognized from the positional relationship of a plurality of feature points that specify the edge portions of those parts.
  • When the person captured in the frames is replaced by a different person, it can be detected that the scene has changed.
  • Whether or not a person is replaced can be determined according to, for example, a change in the positional relationship between a plurality of feature points.
  • m may be 1 or more, and may be 20 to 50.
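Assuming the facial feature points have already been extracted as described (the same parts, in the same order, for both frames), deciding whether the person has been replaced from the change in their positional relationship could be sketched as follows; the scale-normalized signature and the threshold are assumptions for illustration:

```python
import numpy as np

def face_signature(points):
    """Scale-invariant signature of facial feature points: all pairwise
    distances, normalized by their mean (points: sequence of (y, x))."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    v = d[np.triu_indices(len(pts), k=1)]
    return v / (v.mean() + 1e-9)

def person_replaced(points_n, points_nm, thresh=0.15):
    """Different person if the normalized feature-point geometry differs
    beyond `thresh` (mean absolute deviation between the signatures)."""
    s1, s2 = face_signature(points_n), face_signature(points_nm)
    return float(np.abs(s1 - s2).mean()) > thresh
```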
  • In FIGS. 17 and 18, the area in which a person is captured changes from the object area O9 L in the nth left-eye image Gn L to the object area O10 L in the (n+m)th left-eye image G(n+m) L.
  • Since the captured person is replaced, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in audio included in the video information.
  • an image having a parallax serving as a reference suitable for a scene can be combined with a 3D moving image according to a change in sound.
  • a change in the scene can be detected by comparing the sound corresponding to the n + m-th frame (m is a natural number) with the sound corresponding to the n-th frame before m frames.
  • m may be 1 or 2 or more.
  • As the scene change detection method based on audio information in the change detection unit 463, for example, one or more of the following seventh to ninth change detection methods can be adopted.
  • the change detection unit 463 can detect a change in scene according to a change in volume obtained from audio information included in the video information. For example, the volume corresponding to the (n + m) th frame (where m is a natural number) is compared with the volume corresponding to the nth frame before m frames, and when the amount of change in volume exceeds a predetermined amount, a scene change has occurred. Scene changes can be detected as if.
  • m may be 1 or 2 or more.
  • the predetermined amount may be, for example, a fixed absolute amount related to the volume, or may be a variable amount calculated from the volume according to a predetermined rule.
  • As the predetermined rule, for example, a calculation rule of multiplying the volume corresponding to the nth frame by a predetermined coefficient (for example, 0.5) may be employed. In that case, when the volume increases or decreases by 50% or more, it can be detected that the scene has changed (see the sketch below).
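A minimal sketch of this seventh change detection method, using the RMS level of the audio samples attached to each frame as the volume; the 50% rule follows the coefficient example above:

```python
import numpy as np

def volume_changed(samples_n, samples_nm, coeff=0.5):
    """Seventh method (sketch): scene change if the RMS volume rises or
    falls by more than coeff (e.g. 50%) relative to the volume at frame n."""
    rms_n = float(np.sqrt(np.mean(np.asarray(samples_n, dtype=float) ** 2)))
    rms_nm = float(np.sqrt(np.mean(np.asarray(samples_nm, dtype=float) ** 2)))
    return abs(rms_nm - rms_n) > coeff * rms_n
```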
  • the change detection unit 463 can detect a change in the scene according to the change in the frequency component of the audio based on the audio information included in the video information.
  • For example, the frequency components of the sound corresponding to the (n+m)th frame (m is a natural number) are compared with the frequency components of the sound corresponding to the nth frame, m frames earlier, and when an evaluation value related to the frequency components has changed beyond a predetermined threshold range, it can be detected that the scene has changed.
  • m may be 1 or 2 or more.
  • As the evaluation value related to the frequency components, the sum of the intensities of the audio frequency components in each frequency band can be adopted.
  • As the frequency bands, for example, a low-frequency band, a high-frequency band, and a band between them can be adopted.
  • Such a change in the evaluation value relating to the frequency component can correspond to, for example, a change in BGM.
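A sketch of this eighth change detection method: the audio around each frame is split into a low band, a high band, and the band between them, the per-band intensity sums form the evaluation value, and a large relative change (such as a BGM change would cause) is flagged. The band edges and threshold are illustrative assumptions:

```python
import numpy as np

def band_intensities(samples, rate, edges=(0.0, 300.0, 3000.0, None)):
    """Sum of spectral intensity in low, middle, and high bands
    (edges in Hz; None means the Nyquist frequency)."""
    spec = np.abs(np.fft.rfft(np.asarray(samples, dtype=float)))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    hi = edges[3] if edges[3] is not None else rate / 2.0
    bands = [(edges[0], edges[1]), (edges[1], edges[2]), (edges[2], hi)]
    return np.array([spec[(freqs >= lo) & (freqs < up)].sum() for lo, up in bands])

def frequency_changed(samples_n, samples_nm, rate, thresh=0.5):
    """Eighth method (sketch): scene change if the band-intensity vector
    moves by more than `thresh` in relative terms between the two frames."""
    b1 = band_intensities(samples_n, rate)
    b2 = band_intensities(samples_nm, rate)
    return np.linalg.norm(b2 - b1) > thresh * (np.linalg.norm(b1) + 1e-9)
```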
  • the change detection unit 463 can detect a change in a scene according to a change in a voiceprint that can be recognized from sound, based on sound information included in the video information.
  • For example, a voiceprint is recognized by analyzing the voice corresponding to the (n+m)th frame (m is a natural number), and a voiceprint is recognized by analyzing the voice corresponding to the nth frame, m frames earlier. The voiceprint of the (n+m)th frame is then compared with that of the nth frame, and if the speaking person differs, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in metadata included in the video information. Thereby, according to the change of metadata, an image having a parallax that is a reference suitable for a scene can be combined with the 3D moving image. For example, a change in a scene can be detected by comparing metadata related to an (n + m) th frame (m is a natural number) with metadata related to an nth frame before m frames. m may be 1 or 2 or more.
  • As the scene change detection method based on metadata in the change detection unit 463, for example, one or more of the following tenth to twelfth change detection methods can be adopted.
  • the change detection unit 463 can detect a change in scene according to a change in subtitle information specified by metadata included in the video information.
  • For example, the caption information corresponding to the (n+m)th frame (m is a natural number) is compared with the caption information corresponding to the nth frame, m frames earlier, and if a predetermined change occurs in the caption information, it can be detected that the scene has changed.
  • The predetermined change may include, for example, a change from one characteristic of the subtitle information to a different characteristic, or a change between a state without subtitle information and a state with it.
  • Alternatively, characteristic terms suggesting a change of scene, relating to time, place, era, topic, and the like, may be detected from the subtitle information, and when such a characteristic term changes, appears, or disappears, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in chapter information specified by metadata included in video information.
  • For example, the chapter information corresponding to the (n+m)th frame (m is a natural number) is compared with the chapter information corresponding to the nth frame, m frames earlier, and if the chapter information has changed, it can be detected that the scene has changed.
  • the change detection unit 463 can detect a change in scene according to a change in shooting conditions specified by the metadata included in the video information.
  • For example, the shooting conditions of the (n+m)th frame (m is a natural number) are compared with the shooting conditions of the nth frame, m frames earlier, and if the shooting conditions have changed, it can be detected that the scene has changed.
  • As the shooting conditions, for example, parameters such as focal length, aperture size, and imaging magnification (also referred to as imaging parameters) can be adopted.
  • When such an imaging parameter changes beyond a predetermined amount, it can be detected that the scene has changed.
  • Shift amount determination method: In the shift amount determination unit 464, the reference shift amount for one scene, spanning from when a first change of scene is detected by the change detection unit 463 in the video information until the next (second) change of scene is detected, can be determined based on the shift amounts of the pixel positions indicating the same portion of the object (also referred to as pixel shift amounts) between the Nth left-eye image GN L and the Nth right-eye image GN R of one or more Nth frames of that scene. At this time, one reference shift amount common to all frames in the scene can be determined.
  • The pixel shift amount between the Nth left-eye image GN L and the Nth right-eye image GN R can be obtained, for example, by searching for corresponding points with the POC method and detecting combinations of a reference pixel and a corresponding pixel in which the same part of the object is captured.
  • As the method of determining the reference shift amount in the shift amount determination unit 464, for example, one or more of the following first to fourth determination methods can be adopted.
  • First determination method: The reference shift amount is determined based on the pixel shift amounts obtained from the frame group of one scene. The frame group of one scene may be all the frames included in the scene, or a part of them; the partial frames may be temporally continuous or discretely sampled. Thereby, the uncomfortable feeling that a user receives within one scene of a 3D moving image can be reduced.
  • Specifically, a distribution of the pixel shift amounts over the frame group is obtained, and the reference shift amount can be determined based on a representative value of the pixel shift amount obtained from that distribution.
  • As the representative value of the pixel shift amount, at least one of the average value, maximum value, minimum value, mode, and median of the pixel shift amounts can be adopted.
  • The representative value of the pixel shift amount may be used directly as the reference shift amount, or a value shifted from the representative value by the calculation of a predetermined rule may be used as the reference shift amount. As the predetermined rule, for example, multiplication of the representative value by a predetermined coefficient (for example, 0.8) can be employed, as in the sketch below.
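A minimal sketch of the first determination method under these rules, pooling the per-frame pixel shift (disparity) maps of the scene's frame group and scaling a chosen representative value by the predetermined coefficient; the function name and the integer rounding used for the mode are assumptions:

```python
import numpy as np

def reference_shift(frame_disparities, stat="median", coeff=0.8):
    """First determination method (sketch): pool the pixel shift amounts of
    a scene's frame group, take a representative value, and multiply it by
    a predetermined coefficient to obtain the reference shift amount."""
    d = np.concatenate([np.ravel(f) for f in frame_disparities]).astype(float)
    if stat == "mode":  # mode over integer-rounded shift values
        vals, counts = np.unique(np.round(d).astype(int), return_counts=True)
        rep = float(vals[counts.argmax()])
    else:
        rep = float({"mean": np.mean, "max": np.max, "min": np.min,
                     "median": np.median}[stat](d))
    return coeff * rep
```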
  • For example, the frame group from the (n-b)th frame (b is a natural number) through the (n-a)th frame (a is a natural number) to the nth frame corresponds to the first scene, and the frame group from the (n+a)th frame through the (n+c)th frame, including the (n+a)th frame, corresponds to the next, second scene.
  • Here, from the viewpoint of avoiding complexity in the drawing, only the Nth left-eye images GN L are illustrated and the Nth right-eye images GN R are omitted.
  • The first frame group GG1 L, consisting of the (n-b)th left-eye image G(n-b) L through the (n-a)th left-eye image G(n-a) L to the nth left-eye image Gn L included in the first scene, is shown, together with the second frame group GG2 L consisting of the left-eye images included in the second scene.
  • FIG. 20 shows the change in the pixel shift amount over time as a curve. If times T0 to T1 correspond to the first scene and times T1 to T2 to the second scene, the representative value RV1 of the pixel shift amount in the first scene and the representative value RV2 of the pixel shift amount in the second scene can be calculated.
  • an image having a parallax corresponding to the reference shift amount (reference parallax image) is added to the 3D moving image.
  • the parallax related to the reference parallax image is used as a reference, and the spread before and after the 3D moving image is easily recognized. As a result, the sense of depth that can be obtained by the user watching the 3D video can be improved.
  • Second determination method: The reference shift amount is determined based on the pixel shift amounts between the Nth left-eye image GN L and the Nth right-eye image GN R of the first frame included in the frame group of one scene in the video information.
  • The frame group of one scene may be, for example, all the frames included in the scene. Using the first frame makes it easy to combine a common image having the reference parallax with the whole scene.
  • Specifically, a distribution of the pixel shift amounts in the first frame is obtained, and the reference shift amount can be determined based on a representative value of the pixel shift amount.
  • As the representative value of the pixel shift amount, at least one of the average value, maximum value, minimum value, mode, and median of the pixel shift amounts can be adopted.
  • The representative value of the pixel shift amount may be used directly as the reference shift amount, or a value shifted from the representative value by the calculation of a predetermined rule may be used as the reference shift amount. As the predetermined rule, for example, multiplication of the representative value by a predetermined coefficient (for example, 0.8) can be employed.
  • Third determination method: For the frame group of one scene, a distribution of the pixel shift amounts between the Nth left-eye image GN L and the Nth right-eye image GN R is obtained, a representative value of the virtual distance from a virtually set reference plane to the surface of the object is calculated from that distribution, and the reference shift amount is determined so that the representative value becomes a predetermined value. The frame group of one scene may be all the frames included in the scene, or a part of them; the partial frames may be temporally continuous or discretely sampled.
  • the virtual reference plane may be a plane parallel to the screen, for example, and may be a plane orthogonal to the user's line of sight facing the screen.
  • the virtual distance may be, for example, a distance where the surface of the object that appears three-dimensionally in the 3D moving image is separated from the virtual reference plane.
  • a representative value of the virtual distance at least one value of an average value, a maximum value, a minimum value, a mode value, and a median value of the virtual distance can be adopted.
  • the predetermined value may be set to an arbitrary value according to the operation of the operation unit 41 by the user, or may be set to a fixed value prepared in advance.
  • For example, if the representative value of the virtual distance is the maximum value of the virtual distance and the predetermined value is 0, the reference shift amount is set according to the maximum value of the virtual distance; if the representative value is the minimum value of the virtual distance and the predetermined value is 0, the reference shift amount is set according to the minimum value of the virtual distance.
  • FIG. 21 shows a scene change occurring between the nth frame and the (n + m) th frame.
  • the change in the average value of the virtual distance of each frame over time when a predetermined virtual reference plane is set is indicated by a curve.
  • Times T0 to T1 correspond to the first scene, to which the nth frame belongs, and times T1 to T2 correspond to the second scene, to which the (n+m)th frame belongs.
  • The average value Mr1 can be calculated as the representative value of the virtual distance in the first scene, and the average value Mr2 as the representative value of the virtual distance in the second scene.
  • So that the representative value of the virtual distance in the first scene and the representative value of the virtual distance in the second scene each become the predetermined value, a first virtual reference plane protruding by the distance MR1 from the predetermined virtual reference plane can be set for the first scene, and a second virtual reference plane protruding by the distance MR2 from the predetermined virtual reference plane can be set for the second scene.
  • The shift amount corresponding to the first virtual reference plane is then determined as the reference shift amount for the first scene, and the shift amount corresponding to the second virtual reference plane as the reference shift amount for the second scene.
  • a reference shift amount related to one scene is determined according to an image of a scene before the one scene (previous scene). Specifically, the object is identical between the N-th left-eye image GN L and the N-th right-eye image GN R of one or more frames in a certain scene (also referred to as a second scene) of the video information. A distribution of the shift amount (also referred to as a first pixel shift amount) of the pixel position indicating the portion is obtained. Further, the same part of the object is shown between the N-th left-eye image GN L and the N-th right-eye image GN R of one or more frames in another scene (also referred to as the first scene) before the second scene. A distribution of pixel position shift amounts (also referred to as second pixel shift amounts) is obtained. Then, a reference shift amount in the second scene is determined based on the distribution of the first pixel shift amount and the distribution of the second pixel shift amount.
  • the one or more frames in the second scene may be any one of one frame, two or more consecutive frames, two or more frames sampled, and all the frames in the second scene.
  • the one or more frames in the first scene may be any one of one frame, two or more consecutive frames, two or more frames sampled, and all the frames in the first scene.
  • Based on a representative value related to the distribution of the first pixel shift amounts (also referred to as the first shift representative value) and a representative value related to the distribution of the second pixel shift amounts (also referred to as the second shift representative value), the reference shift amount can be determined.
  • As the first shift representative value, at least one of the average value, maximum value, minimum value, mode, and median of the distribution of the first pixel shift amounts can be adopted; as the second shift representative value, at least one of the average value, maximum value, minimum value, mode, and median of the distribution of the second pixel shift amounts can be adopted.
  • The reference shift amount of the second scene can then be determined by a predetermined calculation using the first shift representative value and the second shift representative value. As the predetermined calculation, for example, taking the average of the first and second shift representative values may be employed; this average can be determined as the reference shift amount of the second scene.
  • FIG. 23 is a diagram for explaining a fourth method for determining the reference deviation amount.
  • FIG. 23 shows the first deviation representative value MP1 related to the first scene (time T0 to T1), the second deviation representative value MP2 related to the second scene (time T1 to T2) next to the first scene, A third deviation representative value MP3 relating to the third scene (time T2 to T3) next to the two scenes is shown.
  • the first deviation representative value MP1 can be set as the reference deviation amount RP1 of the first scene.
  • the average value of the first deviation representative value MP1 and the second deviation representative value MP2 can be set as the reference deviation amount RP2 of the second scene.
  • an average value of the second deviation representative value MP2 and the third deviation representative value MP3 can be set as the reference deviation amount RP3 of the third scene.
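The averaging rule just illustrated (MP1 becomes RP1, mean(MP1, MP2) becomes RP2, mean(MP2, MP3) becomes RP3) can be written as a one-pass sketch over the per-scene shift representative values:

```python
def blended_reference_shifts(scene_reps):
    """Fourth determination method (sketch): the first scene uses its own
    shift representative value; each later scene uses the average of its
    own representative value and the previous scene's."""
    refs = []
    for i, mp in enumerate(scene_reps):
        refs.append(mp if i == 0 else 0.5 * (scene_reps[i - 1] + mp))
    return refs

# e.g. blended_reference_shifts([MP1, MP2, MP3]) -> [RP1, RP2, RP3]
```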
  • The reference shift amount may also be determined by the following method. First, a distribution of the shift amounts of the pixel positions indicating the same part of the object (pixel shift amounts) between the Nth left-eye image GN L and the Nth right-eye image GN R included in the frame group of the first scene in the video information is obtained, and, based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (also referred to as a first virtual reference plane) to the surface of the object is calculated. Likewise, a distribution of the pixel shift amounts between the Nth left-eye image GN L and the Nth right-eye image GN R included in the frame group of the second scene next to the first scene is obtained, and, based on it, a second representative value of the virtual distance from a virtually set reference plane (also referred to as a second virtual reference plane) to the surface of the object is calculated. Then, the change of the second virtual reference plane and the recalculation of the second representative value are performed one or more times, and the shift amount corresponding to the second virtual reference plane for which the difference between the first representative value and the second representative value falls within a predetermined allowable range is determined as the reference shift amount.
  • first and second virtual reference planes may be planes parallel to the screen, for example, and may be planes orthogonal to the user's line of sight facing the screen.
  • the virtual distance may be, for example, a distance where the surface of the object that appears three-dimensionally in the 3D moving image is separated from the first or second virtual reference plane.
  • the reference deviation amount may be determined according to the virtual distance related to the attention area.
  • Specifically, a distribution of the shift amounts of the pixel positions indicating the same part of the object (pixel shift amounts) between the reference attention region and the corresponding attention region included in the frame group of the first scene in the video information is obtained, and, based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (also referred to as a first virtual reference plane) to the surface of the object is calculated.
  • Likewise, a distribution of the pixel shift amounts between the reference attention region and the corresponding attention region included in the frame group of the second scene next to the first scene is obtained, and, based on it, a second representative value of the virtual distance from a virtually set reference plane (also referred to as a second virtual reference plane) to the surface of the object is calculated.
  • Then, the change of the second virtual reference plane and the recalculation of the second representative value are performed one or more times, and the shift amount corresponding to the second virtual reference plane for which the difference between the first representative value and the second representative value falls within a predetermined allowable range is determined as the reference shift amount.
  • Region image information acquisition method: The acquisition of the Nth region image information relating to the Nth reference region image IN L and the Nth corresponding region image IN R in the region image acquisition unit 465 can be realized, for example, by sequentially performing the following first and second steps.
  • <First step> An image pattern prepared in advance is read out. As this image pattern, for example, an image pattern showing a specific pattern in which relatively large dots are randomly arranged, an image pattern including an information display column of digital broadcasting (for example, a data column or a time column), or an image pattern including device operation buttons or the like can be employed.
  • <Second step> The image pattern read out in the first step is used as a base image (for example, the Nth reference region image IN L), and the other image (for example, the Nth corresponding region image IN R) is generated by shifting the display object of the base image in one direction by the reference shift amount determined for each scene by the shift amount determination unit 464.
  • In this way, the Nth reference region image IN L and the Nth corresponding region image IN R are acquired for each scene; that is, the Nth reference region image IN L and the Nth corresponding region image IN R for one scene are constant.
  • Alternatively, sets of image patterns corresponding to a plurality of shift amounts may be stored in advance in the storage unit 44 or the like, and the set corresponding to the reference shift amount determined by the shift amount determination unit 464 may be read out to acquire the Nth reference region image IN L and the Nth corresponding region image IN R. That is, it is only necessary that the Nth reference region image IN L and the Nth corresponding region image IN R have a relationship in which a display object such as a predetermined pattern is shifted by the reference shift amount in one direction, as sketched below.
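The second step could be sketched as follows: the read-out pattern serves as the base region image, and its counterpart is produced by shifting the pattern horizontally by the scene's reference shift amount. The zero-filled padding of the vacated columns is an assumption for illustration:

```python
import numpy as np

def make_region_image_pair(pattern, ref_shift):
    """Generate (reference region image, corresponding region image):
    the corresponding image is the base pattern shifted by `ref_shift`
    pixels in one (horizontal) direction."""
    base = np.asarray(pattern)
    shifted = np.zeros_like(base)
    s = int(round(ref_shift))
    if s >= 0:
        shifted[:, s:] = base[:, :base.shape[1] - s]
    else:
        shifted[:, :s] = base[:, -s:]
    return base, shifted  # constant for every frame of one scene
```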
  • FIGS. 24 and 25 illustrate variations of the Nth reference area image IN L.
  • Each variation is illustrated in the form of an Nth left-eye composite image GSN L in which the Nth reference region image IN L is arranged, aligned with the Nth reference composite position PN L, around the Nth left-eye image area TAN L corresponding to the Nth left-eye image GN L.
  • FIG. 24 schematically shows an image pattern having an information display column including a digital broadcast data column Pa1 and a time column Ca1.
  • FIG. 25 schematically shows an image pattern including the operation button group Ba1 and the time column Ta1.
  • In the position specifying unit 468, for example, a predetermined Nth reference composite position PN L and Nth corresponding composite position PN R may be specified, or the Nth reference composite position PN L and the Nth corresponding composite position PN R may be specified in accordance with the operation of the operation unit 41 by the user.
  • The Nth reference composite position PN L can be specified in one or more of a plurality of regions including, for example, the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region.
  • The Nth corresponding composite position PN R can be specified in one or more of a plurality of regions including, for example, the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region.
  • For example, the Nth reference composite position PN L can be specified in the region surrounding the Nth left-eye image area TAN L corresponding to the Nth left-eye image GN L.
  • The Nth reference composite position PN L may be specified so as to include a specific part of the region surrounding the Nth left-eye image area TAN L: a position specifying the left and right areas sandwiching the Nth left-eye image area TAN L may be designated, a position specifying the area located below the Nth left-eye image area TAN L may be designated, or a position specifying a region whose width around the Nth left-eye image area TAN L is non-uniform may be designated.
  • The Nth reference composite position PN L and the Nth corresponding composite position PN R may each be a position identifying all pixels in a region of a predetermined shape, such as an annular region, or a position from which such a region of a predetermined shape can be identified (also referred to as a specific position).
  • The specific position may be, for example, the position of a pixel at one or more corners of the region having the predetermined shape.
  • The Nth reference composite position PN L and the Nth corresponding composite position PN R can also be specified in accordance with one or more of the Nth left-eye image GN L and the Nth right-eye image GN R.
  • In this case as well, the Nth reference composite position PN L can be specified in one or more of a plurality of regions including the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region, and the Nth corresponding composite position PN R can be specified in one or more of a plurality of regions including the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region.
  • The Nth reference composite position PN L may also be specified in one or more of a plurality of regions including the region near the outer edge of the Nth left-eye image GN L and the region in another image corresponding to that region, and the Nth corresponding composite position PN R may likewise be specified in one or more of a plurality of regions including the region near the outer edge of the Nth right-eye image GN R and the region in another image corresponding to that region.
  • A mode is also conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified only if the parallax between the Nth left-eye image GN L and the Nth right-eye image GN R is equal to or greater than a first threshold and equal to or less than a second threshold. In this mode, if the parallax is less than the first threshold or exceeds the second threshold, the Nth reference composite position PN L and the Nth corresponding composite position PN R are not specified.
  • The smaller the parallax, the larger the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R may be made.
  • Similarly, a mode is conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified only if the difference between the maximum value and the minimum value of the parallax in the reference attention region and the corresponding attention region is equal to or greater than a first threshold and equal to or less than a second threshold; otherwise, the Nth reference composite position PN L and the Nth corresponding composite position PN R are not specified.
  • The smaller the difference between the maximum value and the minimum value of the parallax in the reference attention region and the corresponding attention region, the larger the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R may be made.
  • A mode is also conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified in accordance with the positions of the reference attention region and the corresponding attention region detected by the attention area detection unit 462 from the Nth left-eye image GN L and the Nth right-eye image GN R.
  • For example, when the attention area detection unit 462 detects a set of the reference attention region and the corresponding attention region, the Nth reference composite position PN L can be designated according to the position that the reference attention region occupies in the Nth left-eye image GN L, and the Nth corresponding composite position PN R can be designated according to the position that the corresponding attention region occupies in the Nth right-eye image GN R.
  • FIG. 29 shows a case where the reference attention area is the object area O1 L and the corresponding attention area is the object area O1 R.
  • In this case, within the region F L surrounding the Nth left-eye image area TAN L corresponding to the Nth left-eye image GN L, the position PN1 L indicating the area onto which the object region O1 L is projected in a first predetermined direction (here, the -X direction) and the position PN2 L indicating the area onto which the object region O1 L is projected in a second predetermined direction (here, the -Y direction) can be specified as the Nth reference composite position PN L.
  • The Nth corresponding composite position PN R may be designated in the same manner.
  • In this mode, if the positions of the reference attention region and the corresponding attention region change, the arrangement of the Nth reference composite position PN L and the Nth corresponding composite position PN R changes; if the sizes of the reference attention region and the corresponding attention region change, the sizes of the Nth reference composite position PN L and the Nth corresponding composite position PN R change; and if the number of reference attention regions and corresponding attention regions changes, the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R changes.
  • In the image combining unit 469, the Nth reference region image IN L can be arranged in one or more of a plurality of regions including the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region; likewise, the Nth corresponding region image IN R can be arranged in one or more of a plurality of regions including the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region.
  • Thereby, the Nth left-eye composite image GSN L, in which the Nth left-eye image GN L and the Nth reference region image IN L are combined, and the Nth right-eye composite image GSN R, in which the Nth right-eye image GN R and the Nth corresponding region image IN R are combined, can be generated.
  • For all frames of one scene, the Nth left-eye image GN L is combined with the Nth reference region image IN L acquired based on the same reference shift amount. For example, the Nth reference region image IN L related to one common reference shift amount is combined with each Nth left-eye image GN L of the first frame group GG1 L of the first scene, and the Nth reference region image IN L related to another common reference shift amount is combined with each Nth left-eye image GN L of the second frame group GG2 L of the second scene.
  • For example, the nth reference region image In L is combined with the nth left-eye image Gn L in the first scene, and the (n+m)th reference region image I(n+m) L is combined with the (n+m)th left-eye image G(n+m) L in the second scene.
  • the stereoscopic image information generated as described above only needs to include information of at least one of the first format and the second format.
  • The first format includes a format in which the Nth left-eye image GN L, the Nth right-eye image GN R, the Nth reference region image IN L, and the Nth corresponding region image IN R can be displayed at the same time on one screen in a superimposed manner.
  • the second format is one or more of the Nth left eye image GN L , the Nth right eye image GN R , the Nth reference area image IN L , and the Nth corresponding area image IN R in one screen. It includes a format in which an image and one or more remaining images can be displayed in time sequence. Thus, even if various display modes are employed, a sense of depth that can be obtained by a user watching a 3D moving image can be improved.
  • FIG. 32 is a flowchart illustrating an example of an operation flow of the image processing apparatus according to the embodiment. This operation flow is realized by reading and executing the program PG1 in the storage unit 44 by the control unit 46. For example, execution of image processing related to a 3D moving image in the information processing apparatus 4 is requested in accordance with the operation of the operation unit 41 by the user, and this operation flow is started.
  • In step S1, the video information is acquired by the video acquisition unit 461.
  • In step S2, the mode setting unit 467 determines whether the attention area detection unit 462 is set to a mode for detecting attention areas. If that mode is set, the process proceeds to step S3; if not, to step S4.
  • In step S3, the attention area detection unit 462 detects the reference attention area and the corresponding attention area in the Nth left-eye image GN L and the Nth right-eye image GN R.
  • In step S4, the change detection unit 463 detects changes of scene.
  • In step S5, the shift amount determination unit 464 determines a reference shift amount for each scene.
  • In step S6, the region image acquisition unit 465 acquires, for each scene, the Nth region image information relating to the Nth reference region image IN L and the Nth corresponding region image IN R.
  • In step S7, the mode setting unit 467 determines whether a signal related to mode setting of the position specifying unit 468 has been received by the signal receiving unit 466. If such a signal has been received, the process proceeds to step S8; if not, to step S9.
  • In step S8, the mode setting unit 467 sets the position specifying unit 468 to one of a plurality of modes, including the first mode and the second mode, according to the signal received by the signal receiving unit 466. The position specifying unit 468 is assumed to be initially set to a predetermined mode before the mode is set in step S8.
  • In step S9, the position specifying unit 468 specifies the Nth reference composite position PN L and the Nth corresponding composite position PN R for each scene.
  • In step S10, the image combining unit 469 combines the Nth left-eye image GN L, the Nth right-eye image GN R, the Nth reference region image IN L, and the Nth corresponding region image IN R. As a result, stereoscopic image information is generated, and this operation flow ends. A sketch of the whole flow follows.
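The flow of steps S1 to S10 could be sketched as the following driver, with the units of the control unit 46 abstracted behind a hypothetical `ops` object; every method name on `ops` is an assumption made for illustration, not an API defined by the patent:

```python
def process_3d_video(video, ops):
    """Sketch of the operation flow S1-S10 of the image processing apparatus."""
    frames = ops.acquire_video(video)                       # S1: unit 461
    if ops.attention_mode_enabled():                        # S2: mode check
        ops.detect_attention_areas(frames)                  # S3: unit 462
    scenes = ops.detect_scene_changes(frames)               # S4: unit 463
    shifts = {s: ops.reference_shift(s) for s in scenes}    # S5: unit 464
    region_images = {s: ops.region_images(shifts[s]) for s in scenes}  # S6: unit 465
    if (signal := ops.pending_mode_signal()) is not None:   # S7: unit 466
        ops.set_position_mode(signal)                       # S8: unit 467
    positions = {s: ops.composite_positions(s) for s in scenes}        # S9: unit 468
    return ops.combine(frames, region_images, positions)    # S10: unit 469
```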
  • an image having a reference parallax suitable for a scene is synthesized with a 3D moving image according to a change in the scene. For this reason, the difference with the parallax in a 3D moving image becomes easy to be recognized by the comparison with the image which has a standard parallax. As a result, the sense of depth that can be obtained by the user watching the 3D moving image can be improved.
  • In the above embodiment, the Nth reference region image IN L is arranged in one or more of a plurality of regions including the region surrounding the Nth left-eye image GN L and the region in another image corresponding to that surrounding region, and the Nth corresponding region image IN R is arranged in one or more of a plurality of regions including the region surrounding the Nth right-eye image GN R and the region in another image corresponding to that surrounding region; however, the present invention is not limited to this.
  • For example, the Nth reference region image IN L may be arranged so as to overlap an area inside the Nth left-eye image GN L (also referred to as an internal area), and the Nth corresponding region image IN R may be arranged so as to overlap the internal area of the Nth right-eye image GN R.
  • The Nth reference region image IN L may also be arranged in a region of another image corresponding to the internal area of the Nth left-eye image GN L, and the Nth corresponding region image IN R in a region of another image corresponding to the internal area of the Nth right-eye image GN R.
  • The left side of FIG. 33 shows an example of the nth left-eye composite image GSn L in which the nth reference region image In L is combined, aligned with the nth reference composite position Pn L, in the internal area of the nth left-eye image area TAn L corresponding to the nth left-eye image Gn L; the right side shows an example of the nth right-eye composite image GSn R in which the nth corresponding region image In R is combined, aligned with the nth corresponding composite position Pn R, in the internal area of the nth right-eye image area TAn R corresponding to the nth right-eye image Gn R.
  • the Nth reference area image IN L and the Nth corresponding area image IN R may be, for example, specific markers.
  • the specific marker may be, for example, a unique marker that can be easily distinguished from an object originally included in the Nth left eye image GN L and the Nth right eye image GN R by the user.
  • Intrinsic features can be realized, for example, by shape, color, texture, and the like.
  • the specific marker may be a marker drawn with CG or the like.
  • examples of the specific marker include various simple shapes such as a stick, a triangle, and an arrow, and various objects such as a vase and a butterfly. This makes it difficult for the user to confuse the object originally included in the N-th left eye image GN L and the N-th right eye image GN R with the specific marker.
  • If the specific marker has a specific shape, a specific color, a specific texture, and the like but is translucent, the display of the image based on the Nth left-eye image GN L and the Nth right-eye image GN R is unlikely to be hindered.
  • In the position specifying unit 468, a predetermined Nth reference composite position PN L and Nth corresponding composite position PN R may be specified, or the Nth reference composite position PN L and the Nth corresponding composite position PN R may be specified in accordance with the operation of the operation unit 41 by the user.
  • In this modification, the Nth reference composite position PN L can be specified in one or more of a plurality of regions including, for example, the internal area of the Nth left-eye image GN L and the region of another image corresponding to that internal area.
  • Likewise, the Nth corresponding composite position PN R can be specified in one or more of a plurality of regions including, for example, the internal area of the Nth right-eye image GN R and the region of another image corresponding to that internal area.
  • The Nth reference composite position PN L and the Nth corresponding composite position PN R may also be specified in accordance with one or more of the Nth left-eye image GN L and the Nth right-eye image GN R.
  • For example, a mode is conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified in the vicinity of the reference attention region and the corresponding attention region.
  • As the vicinity of an attention region, for example, positions at the lower left, upper left, lower right, upper right, top, bottom, left, and right of the attention region are conceivable, as is a position surrounding the attention region.
  • For example, a mode is conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified at the lower left of the object regions O1 L and O1 R serving as the reference attention region and the corresponding attention region.
  • If the sizes of the reference attention region and the corresponding attention region change, the sizes of the regions identified by the Nth reference composite position PN L and the Nth corresponding composite position PN R may be changed; if the number of reference attention regions and corresponding attention regions changes, the number of Nth reference composite positions PN L and Nth corresponding composite positions PN R may be changed. The sizes and the number of the identified regions may also be changed in accordance with the operation of the operation unit 41 by the user.
  • A mode is also conceivable in which the Nth reference composite position PN L and the Nth corresponding composite position PN R are specified in an area that is predicted not to be noticed by the user (a non-attention area).
  • Examples of the non-attention area include areas different from the reference attention region and the corresponding attention region detected by the attention area detection unit 462.
  • As the non-attention area, an area near the edges of the Nth left-eye image GN L and the Nth right-eye image GN R, an area showing an object with little motion obtained from a motion vector analysis, or an area whose color and texture are inconspicuous can also be adopted. This makes it difficult for the display of the area that the user is watching to be hindered; as a result, suppression of visual discomfort and improvement of the sense of depth obtained by the user watching the 3D moving image can both be achieved.
  • In the above embodiment, one constant reference shift amount is determined for one scene; however, the present invention is not limited to this.
  • For example, the reference shift amount may be changed stepwise or gradually in the vicinity of a scene change. Thereby, the uncomfortable feeling that the user receives when the scene changes in the 3D moving image can be reduced.
  • Specifically, the reference shift amount may be determined for each frame in the vicinity of the scene change in the video information so that the difference in the reference shift amount between frames is equal to or less than a predetermined amount.
  • In this case, for each frame in the vicinity of the scene change, the region image acquisition unit 465 can acquire the Nth region image information relating to the Nth reference region image IN L and the Nth corresponding region image IN R having the relationship in which the positions of the pixels indicating the same part of the display object are shifted in one direction by the reference shift amount for that frame.
  • Then, for each frame in the vicinity of the scene change, the image combining unit 469 can combine the Nth left-eye image GN L, the Nth right-eye image GN R, the Nth reference region image IN L, and the Nth corresponding region image IN R relating to that frame.
  • For example, the reference shift amount PR1 is calculated for the first scene (times T0 to T1) and the reference shift amount PR2 for the second scene (times T1 to T2) by the same determination method as in the above embodiment.
  • For the first scene, the reference shift amount PR1 is adopted as it is; for the second scene, the reference shift amount is increased or decreased by a predetermined amount per frame, in order from the first frame, until it reaches the reference shift amount PR2.
  • In this case, the reference shift amount changes stepwise from time T1 to time T1+Δ (Δ being the length of the transition period).
  • The predetermined amount may be set in advance, or may be set by dividing the difference between the reference shift amount PR1 and the reference shift amount PR2 by a predetermined number of frames Nf; the predetermined number of frames Nf can be set to 30 frames, for example. In this case, the reference shift amount changes linearly from time T1 to T1+Δ.
  • Alternatively, the predetermined amount may be calculated by multiplying the reference shift amount PR1 of the first scene by a predetermined coefficient (for example, 0.01). In this case, the reference shift amount changes nonlinearly from time T1 to T1+Δ. A sketch of both variants follows.
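Both transition variants could be sketched as follows: the linear variant steps by (PR2 - PR1)/Nf per frame, and the nonlinear variant by a fixed fraction of PR1 per frame, until PR2 is reached. The function name and edge handling are assumptions:

```python
def transition_shifts(pr1, pr2, n_frames=30, mode="linear", coeff=0.01):
    """Per-frame reference shift amounts at the start of the second scene,
    stepping from PR1 toward PR2 so that the frame-to-frame difference
    stays at or below a predetermined amount."""
    step = (pr2 - pr1) / n_frames if mode == "linear" else \
           (coeff * pr1 if pr2 >= pr1 else -coeff * pr1)
    shifts, cur = [], pr1
    while (step > 0 and cur + step < pr2) or (step < 0 and cur + step > pr2):
        cur += step
        shifts.append(cur)
    shifts.append(pr2)  # land exactly on the second scene's reference amount
    return shifts
```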
  • As the modes of the position specifying unit 468 set by the mode setting unit 467, the method of specifying the Nth reference composite position PN L and the Nth corresponding composite position PN R according to the above embodiment and the method of specifying them according to the first modification may be performed selectively.
  • For example, if the position specifying unit 468 is set to the first mode, the specification of the Nth reference composite position PN L and the Nth corresponding composite position PN R according to the embodiment is executed; if the position specifying unit 468 is set to the second mode, the specification according to the first modification is executed.
  • Alternatively, the specification method according to the embodiment and the specification method according to the first modification may both be performed simultaneously.
  • In the above embodiment, the first modification, and the second modification, video information that originally includes a 3D moving image is acquired; however, the present invention is not limited to this.
  • For example, after video information including a normal (2D) moving image is obtained, video information including a 3D moving image may be acquired by generating the 3D moving image from the normal moving image by various methods.

Abstract

An objective of the present invention is to provide a technology whereby the depth sensation which is obtained by a user who sees a 3-D motion picture is improved. To achieve the objective, video information is obtained, which includes, in information of each frame, a reference image and a correspondence image which have a relationship wherein the location of pixels which denote the same portion of an object is offset in one direction. Next, a change in the scene in the video information is detected. Thereafter, a reference offset degree is determined on the basis of the offset degree of the location of the pixels which denote the same portion of the object between the reference image and the correspondence image in one or more frames of the video information after the change of the scene. In such a circumstance, region image information is acquired which concerns a reference region image and a correspondence region image which have a relation wherein the location of pixels which denote the same portion of an object to be displayed is offset by a reference degree in one direction. Then, stereoscopic image information is generated by compositing the reference image, the correspondence image, the reference region image, and the correspondence region image for each frame of the video information after the change of scene.

Description

画像処理装置、画像処理方法、およびプログラムImage processing apparatus, image processing method, and program
 本発明は、画像処理技術に関する。 The present invention relates to an image processing technique.
 昨今、立体視が可能な動画(3D動画とも立体視動画とも言う)を利用したテレビ(3Dテレビとも言う)が脚光を浴びている。3Dテレビでは、同一物体を異なる視点から見た2つの画像が利用されて立体視が可能な画像(3D画像とも立体視画像とも言う)が表示される。 Recently, televisions (also referred to as 3D televisions) that use moving images capable of stereoscopic viewing (also referred to as 3D moving images or stereoscopic viewing movies) are in the spotlight. In 3D television, two images obtained by viewing the same object from different viewpoints are used to display an image that can be stereoscopically viewed (also referred to as a 3D image or a stereoscopic image).
 3D画像の技術では、左眼用の画像と右眼用の画像との間で物体の同一部分を示す画素の位置がずらされており、人間の眼の焦点調節機能が利用されて画像の奥行き感がユーザーに与えられる。なお、左眼用の画像と右眼用の画像との間における物体の同一部分を示す画素の位置のズレ量は「視差」とも称される。 In the 3D image technology, the position of the pixel indicating the same part of the object is shifted between the image for the left eye and the image for the right eye, and the depth adjustment of the image is performed using the focus adjustment function of the human eye. A feeling is given to the user. Note that the shift amount of the pixel position indicating the same part of the object between the image for the left eye and the image for the right eye is also referred to as “parallax”.
 このような3D画像の技術は、種々の映像分野で採用されている。例えば、ステレオ画像から検出された視差が人間の眼の融合範囲に入るように調節されることで広い視野の範囲について画像の立体視が可能となる内視鏡装置が提案されている(例えば、特許文献1)。また、立体映像を表示し、その奥行き感を調整する場合に、基準となる参照用立体画像が表示される立体映像処理装置が提案されている(例えば、特許文献2)。 Such 3D image technology has been adopted in various video fields. For example, an endoscope apparatus has been proposed that enables stereoscopic viewing of an image over a wide field of view by adjusting parallax detected from a stereo image to fall within the fusion range of human eyes (for example, Patent Document 1). In addition, there has been proposed a stereoscopic video processing apparatus that displays a stereoscopic reference image when a stereoscopic video is displayed and the sense of depth is adjusted (for example, Patent Document 2).
 ところで、視差がある程度小さければ、画面の大きさ等によってはユーザーが奥行き感を得難い場合がある。つまり、同一物体であっても、実際に目視する場合と3D画像上で見る場合との間で、ユーザーには異なる様に見えることがある。 By the way, if the parallax is small to some extent, it may be difficult for the user to obtain a sense of depth depending on the size of the screen. That is, even if the object is the same, it may appear different to the user between when it is actually viewed and when it is viewed on the 3D image.
 このような3D画像の技術の状況において、画面に変化を付けたり、面白味を付加するため、あるいは立体視を容易とするために、3次元画像の周囲に枠画像が表示される技術が提案されている(例えば、特許文献3)。この技術では、複数の枠画像から使用する枠画像を選択することが可能である。 In such 3D image technology, a technique has been proposed in which a frame image is displayed around a 3D image in order to change the screen, add interest, or facilitate stereoscopic viewing. (For example, Patent Document 3). With this technique, it is possible to select a frame image to be used from a plurality of frame images.
Patent Document 1: JP-A-8-313825
Patent Document 2: JP-A-11-155155
Patent Document 3: International Publication No. WO 2003/092303
 However, even with the technique of Patent Document 3, the user may not obtain a sufficient sense of depth from a 3D moving image.
 The present invention has been made in view of the above problem, and an object thereof is to provide a technique for improving the sense of depth obtained by a user viewing a 3D moving image.
 In order to solve the above problem, an image processing apparatus according to a first aspect includes: a first acquisition unit that acquires video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels representing the same portion of an object are shifted in one direction; a change detection unit that detects a scene change in the video information; a determination unit that determines a reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the video information after the scene change; a second acquisition unit that acquires region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels representing the same portion of a display object are shifted in the one direction by the reference shift amount; and a compositing unit that generates stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame of the video information after the scene change.
 An image processing apparatus according to a second aspect is the image processing apparatus according to the first aspect, wherein the stereoscopic image information includes information in at least one of a first format, which allows the reference image, the corresponding image, the reference region image, and the corresponding region image to be displayed superimposed at the same time on one screen, and a second format, which allows one or more of the reference image, the corresponding image, the reference region image, and the corresponding region image and the remaining one or more of those images to be displayed time-sequentially on one screen.
 An image processing apparatus according to a third aspect is the image processing apparatus according to the first or second aspect, wherein the determination unit determines the reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of one scene of the video information, the one scene lasting from the detection of a first scene change by the change detection unit until the detection of a second scene change, and the compositing unit generates the stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for all frames of the one scene of the video information.
 An image processing apparatus according to a fourth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image of the first frame of the one scene of the video information.
 An image processing apparatus according to a fifth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels representing the same portion of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information.
 An image processing apparatus according to a sixth aspect is the image processing apparatus according to the third aspect, further including a region detection unit that, in accordance with a preset detection rule, detects from the pair of the reference image and the corresponding image of each frame of the video information a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein the determination unit determines the reference shift amount based on the shift amount of the positions of pixels representing the same portion of the object between the reference attention region and the corresponding attention region of the first frame of the one scene of the video information.
 An image processing apparatus according to a seventh aspect is the image processing apparatus according to the third aspect, further including a region detection unit that, in accordance with a preset detection rule, detects from the pair of the reference image and the corresponding image of each frame of the video information a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information.
 An image processing apparatus according to an eighth aspect is the image processing apparatus according to the third aspect, wherein the determination unit calculates a representative value of the virtual distance from a virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the virtual reference plane at which the representative value of the virtual distance equals a predetermined value.
 An image processing apparatus according to a ninth aspect is the image processing apparatus according to the third aspect, wherein the determination unit determines the reference shift amount based on a distribution of first shift amounts of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the one scene of the video information, and a distribution of second shift amounts of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the preceding scene before the first scene change in the video information.
 An image processing apparatus according to a tenth aspect is the image processing apparatus according to the ninth aspect, wherein the determination unit determines the reference shift amount based on a first representative shift value relating to the distribution of the first shift amounts and a second representative shift value relating to the distribution of the second shift amounts.
 An image processing apparatus according to an eleventh aspect is the image processing apparatus according to the ninth aspect, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference images and the corresponding images included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
 An image processing apparatus according to a twelfth aspect is the image processing apparatus according to the ninth aspect, further including a region detection unit that, in accordance with a preset detection rule, detects from the pair of the reference image and the corresponding image of each frame of the video information a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of the object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of the object based on the distribution of shift amounts of the positions of pixels representing the same portion of the object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
 An image processing apparatus according to a thirteenth aspect is the image processing apparatus according to the first or second aspect, wherein the determination unit determines the reference shift amount for each frame in the vicinity of the scene change in the video information such that the difference in the reference shift amount between frames is equal to or less than a predetermined amount; the second acquisition unit acquires, for each frame in the vicinity of the scene change in the video information, region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels representing the same portion of a display object are shifted in the one direction by the reference shift amount determined by the determination unit for that frame; and the compositing unit composites, for each frame in the vicinity of the scene change in the video information, the reference image, the corresponding image, the reference region image, and the corresponding region image relating to that frame.
 An image processing apparatus according to a fourteenth aspect is the image processing apparatus according to any one of the first to thirteenth aspects, wherein the change detection unit detects the scene change in accordance with a change in the image between two or more frames included in the video information.
 An image processing apparatus according to a fifteenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change in the image includes one or more of a change in luminance, a change in color, and a change in frequency components.
 An image processing apparatus according to a sixteenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change detection unit identifies the faces of persons captured in a plurality of frames included in the video information and detects the scene change in accordance with a change of persons between the plurality of frames.
 An image processing apparatus according to a seventeenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change detection unit detects the scene change in accordance with a change in an attention region capturing an object predicted to attract the user's attention in a plurality of frames included in the video information.
 An image processing apparatus according to an eighteenth aspect is the image processing apparatus according to the fourteenth aspect, wherein the change detection unit detects the scene change in accordance with at least one of the appearance and the disappearance of an attention region capturing an object predicted to attract the user's attention in a plurality of frames included in the video information.
 An image processing apparatus according to a nineteenth aspect is the image processing apparatus according to any one of the first to eighteenth aspects, wherein the change detection unit detects the scene change in accordance with a change of the in-focus region in a plurality of frames included in the video information.
 An image processing apparatus according to a twentieth aspect is the image processing apparatus according to any one of the first to nineteenth aspects, wherein the video information includes information relating to audio, and the change detection unit detects the scene change in accordance with a change in the audio.
 An image processing apparatus according to a twenty-first aspect is the image processing apparatus according to the twentieth aspect, wherein the change in the audio includes one or more of a change in volume, a change in frequency components, and a change in voiceprint.
 An image processing apparatus according to a twenty-second aspect is the image processing apparatus according to any one of the first to twenty-first aspects, wherein the video information includes metadata, and the change detection unit detects the scene change in accordance with a change in the metadata.
 An image processing apparatus according to a twenty-third aspect is the image processing apparatus according to the twenty-second aspect, wherein the metadata includes one or more of caption information, chapter information, and shooting condition information, and the change in the metadata includes one or more of a change in caption information, a change in chapter information, and a change in shooting conditions.
 An image processing method according to a twenty-fourth aspect includes the steps of: (a) acquiring video information in which the information of each frame includes a reference image and a corresponding image having a relationship in which the positions of pixels representing the same portion of an object are shifted in one direction; (b) detecting a scene change in the video information; (c) determining a reference shift amount based on the shift amount of the positions of pixels representing the same portion of an object between the reference image and the corresponding image in one or more frames of the video information after the scene change; (d) acquiring region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels representing the same portion of a display object are shifted in the one direction by the reference shift amount; and (e) generating stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame of the video information after the scene change.
 A program according to a twenty-fifth aspect, when executed by a control unit included in an information processing apparatus, causes the information processing apparatus to function as the image processing apparatus according to any one of the first to twenty-third aspects.
 With the image processing apparatus according to any one of the first to twenty-third aspects, differences in parallax in a 3D moving image become easier to recognize, so the sense of depth obtained by a user viewing the 3D moving image can be improved.
 With the image processing apparatus according to the second aspect, the sense of depth obtained by a user viewing a 3D moving image can be improved regardless of which of various display modes is adopted.
 With the image processing apparatus according to the third aspect, a common image having a reference parallax is added to one scene of the 3D moving image, so excessive changes in the image are suppressed, the burden on the user's eyes can be reduced, and the amount of computation can also be reduced.
 With the image processing apparatus according to either of the fourth and sixth aspects, the process of adding a common image having a reference parallax to one scene of a 3D moving image can be performed in real time.
 With the image processing apparatus according to either of the fifth and seventh aspects, the sense of incongruity felt by the user within one scene of a 3D moving image can be reduced.
 With the image processing apparatus according to any one of the eighth to thirteenth aspects, the sense of incongruity felt by the user when the scene changes in a 3D moving image can be reduced.
 With the image processing apparatus according to any one of the fourteenth to nineteenth aspects, an image having a reference parallax suited to the scene can be composited into the 3D moving image in accordance with changes in the image.
 With the image processing apparatus according to either of the twentieth and twenty-first aspects, an image having a reference parallax suited to the scene can be composited into the 3D moving image in accordance with changes in the audio.
 With the image processing apparatus according to either of the twenty-second and twenty-third aspects, an image having a reference parallax suited to the scene can be composited into the 3D moving image in accordance with changes in the metadata.
 With both the image processing method according to the twenty-fourth aspect and the program according to the twenty-fifth aspect, effects similar to those of the image processing apparatus according to the first aspect are obtained.
FIG. 1 is a diagram illustrating an example of a left-eye image and a right-eye image included in a 3D moving image.
FIG. 2 is a diagram illustrating a 3D moving image to which a reference region image and a corresponding region image have been added.
FIG. 3 is a diagram illustrating an example of a left-eye image and a right-eye image included in a 3D moving image.
FIG. 4 is a diagram illustrating a 3D image to which a reference region image and a corresponding region image have been added.
FIG. 5 is a diagram illustrating the schematic configuration of an information processing system according to an embodiment.
FIG. 6 is a block diagram illustrating the functional configuration of the image processing apparatus.
FIG. 7 is a diagram for explaining a first change detection method for detecting a scene change.
FIG. 8 is a diagram for explaining the first change detection method for detecting a scene change.
FIG. 9 is a diagram for explaining a second change detection method for detecting a scene change.
FIG. 10 is a diagram for explaining the second change detection method for detecting a scene change.
FIG. 11 is a diagram for explaining a third change detection method for detecting a scene change.
FIG. 12 is a diagram for explaining the third change detection method for detecting a scene change.
FIG. 13 is a diagram for explaining a fourth change detection method for detecting a scene change.
FIG. 14 is a diagram for explaining the fourth change detection method for detecting a scene change.
FIG. 15 is a diagram for explaining a fifth change detection method for detecting a scene change.
FIG. 16 is a diagram for explaining the fifth change detection method for detecting a scene change.
FIG. 17 is a diagram for explaining a sixth change detection method for detecting a scene change.
FIG. 18 is a diagram for explaining the sixth change detection method for detecting a scene change.
FIG. 19 is a diagram for explaining a first method of determining the reference shift amount.
FIG. 20 is a diagram for explaining the first method of determining the reference shift amount.
FIG. 21 is a diagram for explaining a third method of determining the reference shift amount.
FIG. 22 is a diagram for explaining the third method of determining the reference shift amount.
FIG. 23 is a diagram for explaining a fourth method of determining the reference shift amount.
FIG. 24 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 25 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 26 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 27 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 28 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 29 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 30 is a diagram illustrating an Nth left-eye image to which an Nth reference region image has been added.
FIG. 31 is a diagram illustrating a 3D image to which a reference region image and a corresponding region image have been added.
FIG. 32 is a flowchart showing the operation of the image processing apparatus.
FIG. 33 is a diagram illustrating a 3D moving image to which a reference region image and a corresponding region image have been added.
FIG. 34 is a diagram illustrating a 3D image to which a reference region image and a corresponding region image have been added.
FIG. 35 is a diagram for explaining a method of determining the reference shift amount according to a modification.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, parts having similar configurations and functions are denoted by the same reference numerals, and redundant description is omitted. The drawings are schematic; for example, the sizes and positional relationships of display objects in the images of each drawing are not depicted exactly. Image data and the image displayed based on that image data are collectively referred to as an "image" where appropriate. Furthermore, in each drawing, the upper-left pixel of an image is taken as the origin, the direction along the long side of the image (here, the horizontal direction) as the X-axis direction, and the direction along the short side of the image (here, the vertical direction) as the Y-axis direction. The rightward direction of each image is the +X direction, and the downward direction is the +Y direction. Two orthogonal X and Y axes are shown in FIGS. 1 to 4, 9 to 18, 21, 24 to 29, 31, 33, and 34.
<(1) One Embodiment>
<(1-1) Outline of Processing According to One Embodiment>
 In order to display a moving image capable of being viewed stereoscopically (also referred to as a 3D moving image), for example, for the Nth frame (N is an arbitrary natural number), an Nth image for the left eye (also referred to as the Nth left-eye image) GN_L and an Nth image for the right eye (also referred to as the Nth right-eye image) GN_R are prepared, which have a relationship in which the positions of pixels representing the same portion of a display object are shifted in one direction.
 Here, the one direction is a direction that can coincide with the direction separating the human left eye and right eye, and is set, for example, to the horizontal direction of the image. The Nth left-eye image GN_L and the Nth right-eye image GN_R can be acquired, for example, by shooting with a stereo camera. A stereo camera has two cameras corresponding to the human left eye and right eye.
 FIG. 1 shows the nth left-eye image Gn_L and the nth right-eye image Gn_R of the nth frame (N = n). Specifically, the nth left-eye image Gn_L includes regions representing three objects (also referred to as object regions) O1_L, O2_L, and O3_L, and the nth right-eye image Gn_R includes three object regions O1_R, O2_R, and O3_R. Object regions O1_L and O1_R represent the same person, object regions O2_L and O2_R represent the same object (here, a pylon), and object regions O3_L and O3_R represent another common object (here, a ball). Here, the positions of the object regions O1_R, O2_R, and O3_R in the nth right-eye image Gn_R are shifted to the left relative to the positions of the object regions O1_L, O2_L, and O3_L in the nth left-eye image Gn_L. In FIG. 1, thin broken lines are drawn in the nth right-eye image Gn_R at positions corresponding to the outer edges of the object regions O1_L, O2_L, and O3_L in the nth left-eye image Gn_L, to facilitate comparison.
 Now, consider a case in which, for example, the amount of shift (also referred to as parallax) between the position of the object region O1_L in the nth left-eye image Gn_L and the position of the object region O1_R in the nth right-eye image Gn_R is somewhat small. Similarly, the parallax between the positions of the object regions O2_L and O2_R may be somewhat small, and the parallax between the positions of the object regions O3_L and O3_R may also be somewhat small.
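 Parallax of this kind is typically measured by locating, for each block of the left-eye image, the horizontally shifted position in the right-eye image that best matches it. The following is a minimal sketch of such a measurement using block matching with a sum-of-absolute-differences cost; the function name and the parameters (block size, search range) are illustrative assumptions, not part of the embodiment itself.

```python
import numpy as np

def block_parallax(left, right, y, x, block=8, search=32):
    """Estimate the horizontal parallax (in pixels) of the block whose
    top-left corner is (y, x) in the left-eye image, by finding the
    best-matching block in the right-eye image shifted to the left."""
    ref = left[y:y + block, x:x + block].astype(np.int32)
    best_d, best_cost = 0, np.inf
    for d in range(0, search + 1):        # right-eye content is shifted in -X
        if x - d < 0:
            break
        cand = right[y:y + block, x - d:x - d + block].astype(np.int32)
        cost = np.abs(ref - cand).sum()   # sum of absolute differences
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```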
 In such cases, depending on various display conditions, including the size of the screen on which the 3D image is displayed, it may be difficult for the user to obtain a sense of depth for the objects represented by the object regions O1_L, O2_L, O3_L, O1_R, O2_R, and O3_R. In other words, viewing an object in a 3D image may give the user a sense of depth different from that of the actual object, compared with viewing the object directly.
 To address this problem, in the present embodiment, as shown in FIG. 2, an image relating to a display object having a reference parallax (also referred to as a reference parallax image) is added to the Nth left-eye image GN_L and the Nth right-eye image GN_R of each frame. Comparison with this reference parallax image can then improve the sense of depth obtained by the user viewing the 3D image.
 For example, the reference parallax image added to the nth frame includes an nth reference region image In_L for the left eye and an nth corresponding region image In_R for the right eye. The nth reference region image In_L and the nth corresponding region image In_R are generated according to a reference shift amount and have a relationship in which the positions of pixels representing the same portion of a display object are shifted in one direction. The reference shift amount can be determined, for example, based on the Nth left-eye images GN_L and the Nth right-eye images GN_R of one or more frames of the scene to which the nth frame belongs.
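 One simple way to realize such a pair is to render a single display object (for example, a bar or frame pattern) twice, with the right-eye copy translated horizontally by the reference shift amount. The sketch below, in which the function and its arguments are illustrative assumptions rather than the embodiment's actual implementation, produces such a pair as image patches:

```python
import numpy as np

def make_region_pair(pattern, shift):
    """Given a 2D patch `pattern` for the display object, return a
    (reference, corresponding) pair in which the corresponding patch
    shows the same object shifted `shift` pixels in the -X direction."""
    h, w = pattern.shape
    ref = np.zeros((h, w + shift), dtype=pattern.dtype)
    cor = np.zeros((h, w + shift), dtype=pattern.dtype)
    ref[:, shift:] = pattern          # reference region image
    cor[:, :w] = pattern              # corresponding region image (shifted left)
    return ref, cor
```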
 The left side of FIG. 2 shows an example of an nth left-eye composite image GSn_L in which the nth reference region image In_L is composited at a position around the image area corresponding to the nth left-eye image Gn_L (also referred to as the nth left-eye image area TAn_L), namely the nth reference compositing position Pn_L described later. The right side of FIG. 2 shows an example of an nth right-eye composite image GSn_R in which the nth corresponding region image In_R is composited at a position around the image area corresponding to the nth right-eye image Gn_R (also referred to as the nth right-eye image area TAn_R), namely the nth corresponding compositing position Pn_R described later.
 Then, 3D image display can be realized, for example, in a mode in which the nth left-eye composite image GSn_L and the nth right-eye composite image GSn_R are displayed as one combined image, or in a mode in which they are displayed sequentially in rapid succession. For example, an interlaced scheme can be adopted in which the nth left-eye composite image GSn_L is displayed as the image of the first field and the nth right-eye composite image GSn_R as the image of the second field.
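 As one illustration of such an interlaced mode, the two composite images can be woven into a single frame by taking the even rows from the left-eye image and the odd rows from the right-eye image. This is a generic sketch of row interleaving under that assumption, not a prescription of the embodiment's display format:

```python
import numpy as np

def interleave_fields(left, right):
    """Build one interlaced frame: even rows (first field) come from the
    left-eye composite image, odd rows (second field) from the right-eye
    composite image. Both inputs must have the same shape."""
    assert left.shape == right.shape
    frame = np.empty_like(left)
    frame[0::2] = left[0::2]
    frame[1::2] = right[1::2]
    return frame
```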
 In the present embodiment, the reference shift amount is changed in accordance with scene changes in the 3D moving image. That is, the reference parallax of the reference parallax image added to the 3D moving image can be changed. This makes differences in parallax in the 3D moving image easier to recognize, so the sense of depth obtained by the user viewing the 3D moving image can be improved.
 For example, FIG. 3 shows an example of the (n+m)th left-eye image G(n+m)_L and the (n+m)th right-eye image G(n+m)_R included in the (n+m)th frame, after the scene has changed from the nth frame shown in FIG. 1. The (n+m)th left-eye image G(n+m)_L includes an object region O4_L relating to one person, and the (n+m)th right-eye image G(n+m)_R includes an object region O4_R relating to that same person.
 The left side of FIG. 4 shows an example of an (n+m)th left-eye composite image GS(n+m)_L in which the (n+m)th reference region image I(n+m)_L is composited at the (n+m)th reference compositing position P(n+m)_L around the (n+m)th left-eye image area TA(n+m)_L corresponding to the (n+m)th left-eye image G(n+m)_L. The right side of FIG. 4 shows an example of an (n+m)th right-eye composite image GS(n+m)_R in which the (n+m)th corresponding region image I(n+m)_R is composited at the (n+m)th corresponding compositing position P(n+m)_R around the (n+m)th right-eye image area TA(n+m)_R corresponding to the (n+m)th right-eye image G(n+m)_R.
 The (n+m)th reference region image I(n+m)_L and the (n+m)th corresponding region image I(n+m)_R can be generated based on the reference shift amount changed in response to the scene change. That is, the (n+m)th reference region image I(n+m)_L and the (n+m)th corresponding region image I(n+m)_R have a relationship in which the positions of pixels representing the same portion of a display object are shifted in one direction according to the changed reference shift amount.
<(1-2) Schematic Configuration of the Information Processing System>
 FIG. 5 is a diagram illustrating the schematic configuration of an information processing system 1 according to one embodiment.
 The information processing system 1 includes a stereo camera 2, an information processing apparatus 4, and a line-of-sight detection sensor 5. The information processing apparatus 4 is connected to the stereo camera 2 and the line-of-sight detection sensor 5 so that data can be transmitted and received.
 The stereo camera 2 has a camera 21 and a camera 22. Each of the cameras 21 and 22 has the functions of a digital camera with an image sensor such as a CCD. Each of the cameras 21 and 22 performs a shooting operation in which light from a subject is received and information representing the luminance distribution of the subject is acquired as image data by photoelectric conversion. The optical axes of the cameras 21 and 22 are separated by a predetermined distance, for example in the horizontal direction. This predetermined distance, also called the baseline length, is set, for example, to a distance equivalent to the average separation between the centers of the human left and right eyes.
 In the stereo camera 2, a stereo image is acquired by the cameras 21 and 22 performing shooting operations at substantially the same timing. A stereo image includes a pair consisting of a left-eye image and a right-eye image, and can be displayed so as to enable stereoscopic viewing by the user. The left-eye image and the right-eye image each have a matrix-like array of pixels with A pixels in the horizontal direction (A is a natural number, for example A = 1280) and B pixels in the vertical direction (B is a natural number, for example B = 960).
 In the stereo camera 2, for example, Ns sets of stereo images (Ns is an integer of 2 or more) can be acquired by the cameras 21 and 22 shooting multiple times in temporal succession at predetermined timings. These Ns sets of stereo images correspond to the images of the Ns frames included in a 3D moving image.
 The line-of-sight detection sensor 5 detects the portion of the screen of the display unit 42 included in the information processing apparatus 4 to which the user is paying attention (also referred to as the attention portion). The display unit 42 and the line-of-sight detection sensor 5 are fixed relative to each other in a predetermined arrangement. In the line-of-sight detection sensor 5, for example, an image of the user is first obtained by shooting; the direction of the user's line of sight is then detected by analyzing that image, and the portion of the screen of the display unit 42 to which the user is paying attention is thereby detected. The image analysis can be realized, for example, by detecting the orientation of the face using pattern matching and by distinguishing the white and dark parts of both eyes using color differences.
 Information relating to one or more stereo images obtained by the stereo camera 2 can be transmitted to the information processing apparatus 4 via a communication line 3a. Information relating to the shooting conditions of the stereo camera 2 can also be transmitted to the information processing apparatus 4 via the communication line 3a. Information relating to the attention portion obtained by the line-of-sight detection sensor 5 can be transmitted to the information processing apparatus 4 via a communication line 3b. The communication lines 3a and 3b may be wired or wireless.
 The information processing apparatus 4 has the functions of, for example, a personal computer. The information processing apparatus 4 includes an operation unit 41, a display unit 42, an interface (I/F) unit 43, a storage unit 44, an input/output unit 45, and a control unit 46.
 The operation unit 41 includes, for example, a mouse and a keyboard. The display unit 42 includes, for example, a liquid crystal display. The I/F unit 43 receives information from the stereo camera 2 and the line-of-sight detection sensor 5. The storage unit 44 includes, for example, a hard disk and stores the images obtained by the stereo camera 2. The storage unit 44 also stores a program PG1 and the like for realizing the various operations of the information processing apparatus 4. The input/output unit 45 includes, for example, a disk drive, accepts a storage medium 6 such as an optical disc, and can exchange data with the control unit 46.
 The control unit 46 includes a CPU 46a serving as a processor and a memory 46b capable of temporarily storing information, and controls each unit of the information processing apparatus 4. In the control unit 46, various functions and various kinds of information processing are realized by reading and executing the program PG1 in the storage unit 44. Data generated temporarily during this information processing is stored in the memory 46b as appropriate. Under the control of the control unit 46, the information processing apparatus 4 functions as an image processing apparatus that generates 3D moving images, and also functions as an image display system that displays 3D moving images on the display unit 42. The control unit 46 can store a program stored on the storage medium 6 in the storage unit 44 or the like via the input/output unit 45.
<(1-3) Functional Configuration of the Image Processing Apparatus>
 FIG. 6 is a block diagram illustrating the functional configuration of the image processing apparatus realized by the control unit 46. The functional configuration of the image processing apparatus includes a video acquisition unit 461 serving as a first acquisition unit, an attention region detection unit 462, a change detection unit 463, a shift amount determination unit 464, a region image acquisition unit 465 serving as a second acquisition unit, a signal reception unit 466, a mode setting unit 467, a position designation unit 468, and an image compositing unit 469.
 The video acquisition unit 461 acquires video information including information relating to a 3D moving image. The 3D moving image includes Ns sets of stereo images (Ns is an integer of 2 or more), that is, image information for Ns frames. The stereo image of the Nth frame (N is an arbitrary natural number) of the Ns frames includes an Nth left-eye image GN_L and an Nth right-eye image GN_R having a relationship in which the positions of pixels representing the same portion of an object are shifted in one direction (here, the horizontal direction). The video information may also include information relating to audio matched to the 3D moving image and metadata relating to the 3D moving image. The metadata may include one or more of caption information, chapter information, and shooting condition information.
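 For concreteness, video information of this form might be held in a structure like the following; the type names and fields are illustrative assumptions only, not part of the embodiment:

```python
from dataclasses import dataclass, field
from typing import Optional
import numpy as np

@dataclass
class Frame:
    left: np.ndarray    # Nth left-eye image GN_L (reference image)
    right: np.ndarray   # Nth right-eye image GN_R (corresponding image)

@dataclass
class VideoInfo:
    frames: list                       # Ns Frame objects (stereo images)
    audio: Optional[np.ndarray] = None # optional audio matched to the video
    metadata: dict = field(default_factory=dict)  # captions, chapters, shooting conditions
```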
 In the present embodiment, the Nth left-eye image GN_L is taken as the reference image, and the Nth right-eye image GN_R is taken as the image corresponding to the reference image (the corresponding image). Alternatively, the Nth right-eye image GN_R may be taken as the reference image and the Nth left-eye image GN_L as the corresponding image.
 The attention region detection unit 462 detects, in accordance with a preset detection rule, a pair of attention regions capturing the same object predicted to attract the user's attention from the pair consisting of the Nth left-eye image GN_L and the Nth right-eye image GN_R. The pair of attention regions includes an attention region detected from the Nth left-eye image GN_L (also referred to as the reference attention region) and an attention region detected from the Nth right-eye image GN_R (also referred to as the corresponding attention region).
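 The embodiment leaves the detection rule open. Purely as an illustration, one simple rule is to take the bounding box of the pixels that deviate most from the frame's mean luminance, on the assumption that a conspicuous foreground object attracts the eye; the function below is such a hypothetical rule, not the embodiment's actual detector:

```python
import numpy as np

def attention_bbox(image, quantile=0.99):
    """Return (y0, y1, x0, x1): the bounding box of the pixels whose
    absolute deviation from the mean luminance lies above the given
    quantile -- a crude stand-in for a saliency-based detection rule."""
    dev = np.abs(image.astype(np.float32) - image.mean())
    mask = dev >= np.quantile(dev, quantile)
    ys, xs = np.nonzero(mask)
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
```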
 The change detection unit 463 detects scene changes in the video information. The change detection unit 463 can detect a scene change based on, for example, one or more of the 3D moving image information, the audio information, and the metadata included in the video information. Scene changes can include changes of place, time, and the like, replacement of the main display object, and changes in the state of the display object. Changes in the state of the display object can include movement of the display object, a change of which display object is in focus, and the like.
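 As one concrete illustration of image-based detection, consecutive reference images can be compared by their luminance histograms, with a large histogram distance flagged as a scene change. In the sketch below, the bin count and threshold are illustrative assumptions:

```python
import numpy as np

def is_scene_change(prev_img, cur_img, bins=64, threshold=0.4):
    """Flag a scene change when the total variation distance between the
    normalized luminance histograms of two consecutive frames exceeds
    a threshold (images assumed to be 8-bit grayscale)."""
    h1, _ = np.histogram(prev_img, bins=bins, range=(0, 256))
    h2, _ = np.histogram(cur_img, bins=bins, range=(0, 256))
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return 0.5 * np.abs(h1 - h2).sum() > threshold
```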
 The shift amount determination unit 464 determines a reference shift amount for each scene by following a predetermined determination rule. The shift amount determination unit 464 can determine the reference shift amount, for example, in response to the detection of a scene change by the change detection unit 463. At that time, the reference shift amount can be determined based on the Nth left-eye images GN_L and the Nth right-eye images GN_R of one or more frames belonging to the scene after the scene change. Specifically, the reference shift amount can be determined based on the shift amounts (parallax) of the positions of pixels representing the same portion of an object between the Nth left-eye image GN_L and the Nth right-eye image GN_R. By reusing the various computation results of the change detection unit 463, the shift amount determination unit 464 can reduce the amount of computation.
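 For example, given a per-pixel parallax map for a frame of the new scene (obtained by block matching as sketched earlier, or by any other stereo method), a representative value of its distribution can serve as the reference shift amount. The choice of the median as the representative value in the sketch below is an illustrative assumption:

```python
import numpy as np

def reference_shift(parallax_map, valid=None):
    """Determine the reference shift amount as the median parallax of the
    valid pixels of a frame of the scene. `parallax_map` is a 2D array of
    per-pixel shift amounts; `valid` is an optional boolean mask."""
    values = parallax_map[valid] if valid is not None else parallax_map.ravel()
    return int(round(float(np.median(values))))
```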
 領域画像取得部465は、第N基準領域画像INLと第N対応領域画像INRとに係る情報(第N領域画像情報とも言う)を取得する。第N基準領域画像INLと第N対応領域画像INRとは、表示対象物の同一部分を示す画素の位置が一方向(ここでは横方向)に、ズレ量決定部464で決定された基準ズレ量ずれている関係を有する。領域画像取得部465では、例えば、場面毎に、基準ズレ量に基づいて、第N基準領域画像INLと第N対応領域画像INRとに係る第N領域画像情報が取得され得る。 The area image acquisition unit 465 acquires information (also referred to as Nth area image information) related to the Nth reference area image IN L and the Nth corresponding area image IN R. The N-th reference area image IN L and the N-th corresponding area image IN R are the reference in which the position of the pixel indicating the same portion of the display object is determined in one direction (here, the horizontal direction) by the shift amount determination unit 464. There is a relationship of deviation. In the area image acquisition unit 465, for example, for each scene, the Nth area image information related to the Nth reference area image IN L and the Nth corresponding area image IN R can be acquired based on the reference deviation amount.
 信号受付部466は、ユーザーによる操作部41の操作に応じて制御部46に入力される信号を受け付ける。 The signal receiving unit 466 receives a signal input to the control unit 46 according to the operation of the operation unit 41 by the user.
 モード設定部467は、第1モードと第2モードとを含む複数のモードのうちの何れか1つのモードに位置指定部468を設定する。モード設定部467は、例えば、信号受付部466で受け付けられた信号に応じて、位置指定部468のモードが設定され得る。また、例えば、信号受付部466で受け付けられた信号に応じて、注目領域検出部462が、注目領域を検出するモード(検出許可モードとも言う)または注目領域を検出しないモード(検出禁止モードとも言う)に設定され得る。なお、モード設定部467では、映像取得部461で取得された第N左眼用画像GNLと第N右眼用画像GNRとに基づいて、位置指定部468のモードが設定されても良いし、注目領域検出部462のモードが設定されても良い。 The mode setting unit 467 sets the position specifying unit 468 to any one of a plurality of modes including the first mode and the second mode. For example, the mode setting unit 467 can set the mode of the position specifying unit 468 in accordance with the signal received by the signal receiving unit 466. Further, for example, in accordance with a signal received by the signal receiving unit 466, the attention area detection unit 462 detects a attention area (also referred to as a detection permission mode) or does not detect a attention area (also referred to as a detection prohibit mode). ). The mode setting unit 467 may set the mode of the position specifying unit 468 based on the Nth left eye image GN L and the Nth right eye image GN R acquired by the video acquisition unit 461. The mode of the attention area detection unit 462 may be set.
 位置指定部468は、第N基準領域画像INLが合成される位置(第N基準被合成位置とも言う)PNLと、第N対応領域画像INRが合成される位置(第N対応被合成位置とも言う)PNRと、を指定する。また、例えば、位置指定部468の第1モードが、画像を囲むように第N基準被合成位置PNLおよび第N対応被合成位置PNRを指定するモードであり、位置指定部468の第2モードが、注目領域に応じて第N基準被合成位置PNLおよび第N対応被合成位置PNRを指定するモードである場合が考えられる。 Position specifying unit 468, the N reference region image IN L is (also referred to as N-th reference the synthetic position) by the position synthesized and PN L, position where the N corresponding region image IN R is synthesized (the N corresponding object synthesizing also referred to) to specify and PN R, the a position. Further, for example, a first mode position specifying unit 468, a mode for specifying the first N reference the combined position PN L and the N corresponding the combined position PN R so as to surround the image, the position specifying unit 468 2 mode, when a mode for designating a first N reference the combined position PN L and the N corresponding object synthesizing position PN R in accordance with the region of interest can be considered.
 Note that the N-th reference combining position PNL may be specified in an area surrounding the N-th left-eye image GNL, or in a corresponding surrounding area of another image different from the N-th left-eye image GNL; this other image may be, for example, another field image of an interlaced moving image. Similarly, the N-th corresponding combining position PNR may be specified in an area surrounding the N-th right-eye image GNR, or in a corresponding surrounding area of another image different from the N-th right-eye image GNR; this other image may likewise be, for example, another field image of an interlaced moving image.
 The image composition unit 469 combines the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR to generate information of an image that allows stereoscopic viewing (also referred to as stereoscopic image information). For example, for each scene delimited by detected scene changes, the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR may be combined for every frame. Here, the N-th reference area image INL is arranged at the N-th reference combining position PNL specified by the position specifying unit 468, and the N-th corresponding area image INR is arranged at the N-th corresponding combining position PNR specified by the position specifying unit 468.
 The stereoscopic image information can be visibly output on the display unit 42 under the control of the control unit 46. For example, based on the stereoscopic image information, the display unit 42 may superimpose the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR and display them at the same time. Alternatively, the display unit 42 may display one or more of the N-th left-eye image GNL, the N-th right-eye image GNR, the N-th reference area image INL, and the N-th corresponding area image INR and the remaining one or more images in time sequence.
 At this time, the sense of distance to the object that the N-th left-eye image area TANL and the N-th right-eye image area TANR give the user can be enhanced by comparison with the sense of distance to the display object that the N-th reference area image INL and the N-th corresponding area image INR give the user. Furthermore, since changing the reference deviation amount in accordance with scene changes in the 3D moving image makes differences in parallax within the 3D moving image easier to recognize, the sense of depth obtained by the user watching the 3D moving image can be improved.
<(1-3-1) Attention Area Detection Method>
 As the method of detecting the reference attention area and the corresponding attention area in the attention area detection unit 462, for example, one or more of the following first to sixth area detection methods may be employed.
<(1-3-1-1) First Area Detection Method>
 An image area located near the center of the N-th left-eye image GNL and the N-th right-eye image GNR is detected as an attention area. For example, the contours of objects may be extracted from the N-th left-eye image GNL by image processing such as the Hough transform, and an image area capturing an object located near the center of the N-th left-eye image GNL may be detected as the reference attention area. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area. Here, the corresponding attention area can be detected by a corresponding-point search such as the phase-only correlation (POC) method.
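 As a rough illustration of the corresponding-point search mentioned above, the following sketch estimates the displacement between two same-sized image patches with the phase-only correlation (POC) method. It assumes NumPy; the function name and patch handling are illustrative and not taken from this disclosure.

```python
import numpy as np

def poc_shift(reference_patch, target_patch):
    """Estimate the translation between two same-sized grayscale patches by
    phase-only correlation: the peak of the inverse FFT of the normalized
    cross-power spectrum marks the relative displacement (the sign depends
    on which patch is treated as the reference)."""
    F = np.fft.fft2(reference_patch)
    G = np.fft.fft2(target_patch)
    cross_power = F * np.conj(G)
    cross_power /= np.abs(cross_power) + 1e-12  # keep only the phase
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # indices beyond half the patch size wrap around to negative shifts
    dy, dx = [p if p <= s // 2 else p - s
              for p, s in zip(peak, correlation.shape)]
    return dx, dy  # horizontal and vertical pixel displacement
```

 Sliding the reference attention area over candidate positions in the right-eye image and keeping the position with the sharpest correlation peak is one plausible way to realize the corresponding-area search.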
<(1-3-1-2) Second Area Detection Method>
 Taking the N-th left-eye image GNL and the N-th right-eye image GNR as targets, an image area showing a specific type of object, such as a person, is detected as an attention area by template matching or the like. For example, the reference attention area may be detected by template matching or the like performed on the N-th left-eye image GNL. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area, for example by a corresponding-point search such as the POC method.
<(1-3-1-3) Third Area Detection Method>
 An attention area is detected by analyzing motion vectors over a plurality of stereo images included in the 3D moving image. For example, an area showing an object whose motion exceeds a certain threshold over a predetermined number of frames may be detected as the reference attention area and the corresponding attention area. Specifically, the reference attention area may be detected by analyzing motion vectors over a plurality of N-th left-eye images GNL. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area, for example by a corresponding-point search such as the POC method.
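 For illustration only, the following sketch marks pixels whose estimated motion exceeds a threshold; connected regions of the mask would be candidate attention areas. It assumes OpenCV's dense optical flow rather than the block-based motion vector analysis a real implementation might use, and the threshold is an assumed tuning value.

```python
import cv2
import numpy as np

def moving_object_mask(prev_gray, cur_gray, motion_threshold=2.0):
    """Return a boolean mask of pixels whose motion magnitude between two
    consecutive grayscale frames exceeds motion_threshold (pixels/frame)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion in pixels
    return magnitude > motion_threshold
```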
<(1-3-1-4) Fourth Area Detection Method>
 Taking the N-th left-eye image GNL and the N-th right-eye image GNR as targets, an area in which at least one of a specific color and a specific texture differing from the surroundings is detected is taken as the reference attention area and the corresponding attention area. For example, an area of the N-th left-eye image GNL in which a specific color, such as the skin color characteristic of humans, is detected may be taken as the reference attention area. Alternatively, an area of the N-th left-eye image GNL in which a specific texture, such as the color arrangement of the parts of a human head (eyes, hair, eyebrows, mouth, and so on), is detected may be taken as the reference attention area. Then, the image area in the N-th right-eye image GNR corresponding to the reference attention area can be detected as the corresponding attention area, for example by a corresponding-point search such as the POC method.
 Note that a single image may capture a state in which a plurality of subjects of the same type (for example, people) partially overlap one another. In this case, the plurality of subjects can be distinguished using, for example, information indicating the distance from the viewpoint to each subject, which can be obtained from the N-th left-eye image GNL and the N-th right-eye image GNR. The distance information can be obtained, using the principle of triangulation, from the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR and the baseline length of the stereo camera 2; the parallax itself can be obtained by a corresponding-point search such as the POC method.
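 For a rectified stereo pair, the triangulation relation mentioned here reduces to Z = f·B/d. A minimal sketch, with illustrative parameter names not taken from this disclosure:

```python
def depth_from_disparity(disparity_px, baseline, focal_length_px):
    """Pinhole-stereo triangulation for a rectified pair: Z = f * B / d.
    The result is in the units of the baseline (e.g., meters)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline / disparity_px
```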
<(1-3-1-5) Fifth Area Detection Method>
 The reference attention area and the corresponding attention area are detected from the N-th left-eye image GNL and the N-th right-eye image GNR based on information about the watched portion obtained from the line-of-sight detection sensor 5. For example, one or more pairs of image areas showing a specific type of object, such as a person, may be detected in advance from the N-th left-eye image GNL and the N-th right-eye image GNR by template matching or the like. Then, among these one or more pairs of image areas, the pair corresponding to the watched portion can be detected as the reference attention area and the corresponding attention area.
<(1-3-1-6) Sixth Area Detection Method>
 Taking the N-th left-eye image GNL and the N-th right-eye image GNR as targets, the contour of each object is extracted by image processing such as the Hough transform. Then, among the areas of the objects enclosed by contours, an area in which the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR does not change abruptly over time is detected as the reference attention area and the corresponding attention area capturing a person. Note that the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR can be obtained, for example, by a corresponding-point search using the POC method or the like.
 For example, when a marathon runner is filmed from the front or behind and the camera follows the runner, the distance between the runner and the camera (viewpoint) may not change abruptly over time. In this case, among the areas of the subjects enclosed by contours, there may exist areas in which the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR changes only within a predetermined range over a predetermined number of frames corresponding to a predetermined time (also referred to as slightly-varying-parallax areas). Accordingly, among these slightly-varying-parallax areas, an area in which the parallax between the N-th left-eye image GNL and the N-th right-eye image GNR is equal to or larger than a predetermined value may be detected as the reference attention area and the corresponding attention area capturing a person, while an area in which the parallax is smaller than the predetermined value may be detected as an area capturing a distant view.
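 As a sketch of the classification just described, the following labels contour-enclosed areas from the parallax they show over a window of frames; the tolerance, the person/distant-view split, and the dictionary-based bookkeeping are assumptions for illustration only.

```python
def classify_regions(parallax_history, range_tolerance=1.0,
                     person_min_parallax=8.0):
    """parallax_history maps a region id to its per-frame parallax values
    over the window. A region whose parallax stays within range_tolerance
    is a slightly-varying-parallax area; among those, large parallax is
    labeled a person candidate and small parallax a distant view."""
    labels = {}
    for region_id, history in parallax_history.items():
        if max(history) - min(history) <= range_tolerance:
            labels[region_id] = ("person" if history[-1] >= person_min_parallax
                                 else "distant view")
        else:
            labels[region_id] = "other"
    return labels
```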
<(1-3-2) Scene Change Detection Method>
 The change detection unit 463 can detect a scene change based on one or more of the 3D moving image information, the audio information, and the metadata that can be included in the video information.
<(1-3-2-1) Detection Based on 3D Moving Image Information>
 The change detection unit 463 can detect a scene change in accordance with a change in the image between two or more frames included in the video information. In this way, an image having a reference parallax suited to each scene can be combined with the 3D moving image in accordance with changes in the image.
 For example, a scene change can be detected by comparing the (n+m)-th frame (m is a natural number) with the n-th frame, m frames earlier. In this comparison, the n-th left-eye image GnL may be compared with the (n+m)-th left-eye image G(n+m)L, the n-th right-eye image GnR may be compared with the (n+m)-th right-eye image G(n+m)R, or both comparisons may be performed. m may be 1, or a value of 2 or more.
 As the method of detecting a scene change based on the 3D moving image information in the change detection unit 463, for example, one or more of the following first to sixth change detection methods may be employed.
<(1-3-2-1-1) First Change Detection Method>
 A scene change can be detected in accordance with one or more of a change in luminance, a change in color, and a change in frequency components between two or more frames included in the video information.
 For example, if the value obtained by integrating the differences of the luminance signal at each pixel between the n-th frame and the (n+m)-th frame (also referred to as the integrated luminance difference) exceeds a predetermined threshold, a scene change is detected as having occurred. Alternatively, a scene change may be detected in accordance with a change in the distribution of the integrated luminance values of the vertical lines of the image (also referred to as integrated luminance values). In that case, for example, if the sum of the differences of the integrated luminance values of the vertical lines between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, a scene change is detected as having occurred.
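 A minimal sketch of the two luminance criteria above, assuming NumPy and grayscale frames of identical size; both thresholds are assumed tuning values.

```python
import numpy as np

def luminance_scene_change(frame_n, frame_nm, pixel_threshold, line_threshold):
    """Flags a scene change when either (1) the integrated per-pixel
    luminance difference or (2) the change of the per-vertical-line
    integrated luminance distribution exceeds its threshold."""
    a = frame_n.astype(np.int64)
    b = frame_nm.astype(np.int64)
    pixel_diff = np.abs(a - b).sum()                         # criterion (1)
    line_diff = np.abs(a.sum(axis=0) - b.sum(axis=0)).sum()  # criterion (2)
    return pixel_diff > pixel_threshold or line_diff > line_threshold
```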
 FIG. 7 schematically shows an example of the distribution of integrated luminance values of the n-th left-eye image GnL shown in FIG. 1, and FIG. 8 schematically shows an example of the distribution of integrated luminance values of the (n+m)-th left-eye image G(n+m)L shown in FIG. 1. When a large change occurs in the distribution of integrated luminance values in this way, the difference in the distributions between the n-th frame and the (n+m)-th frame exceeds the predetermined threshold, and a scene change is detected as having occurred.
 Also, for example, if the value obtained by integrating the differences of the pixel values of a specific color at each pixel between the n-th frame and the (n+m)-th frame (also referred to as the integrated color component difference) exceeds a predetermined threshold, a scene change is detected as having occurred. Alternatively, a scene change may be detected in accordance with a change in the distribution of the integrated pixel values of a specific color over the vertical lines of the image (also referred to as integrated color component values). In that case, for example, if the sum of the differences of the integrated color component values of the vertical lines between the n-th frame and the (n+m)-th frame exceeds a predetermined threshold, a scene change is detected as having occurred.
 Also, for example, a scene change can be detected by comparing, between the n-th frame and the (n+m)-th frame, the intensity distributions of the spatial frequency components obtained by applying a Fourier transform or the like to the distribution of pixel values. Specifically, for example, the spatial frequency range may be divided into a predetermined number of bands, and if the value obtained by integrating the differences of the frequency component intensities in each band exceeds a predetermined threshold, a scene change is detected as having occurred.
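 The band-by-band spatial frequency comparison could look like the following sketch; the radial band layout and band count are assumptions, not specified here.

```python
import numpy as np

def band_spectrum(frame, n_bands=8):
    """Integrate the 2-D FFT magnitude of a grayscale frame into n_bands
    radial spatial-frequency bands."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(frame)))
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bands + 1)
    return np.array([spectrum[(radius >= lo) & (radius < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def spectrum_scene_change(frame_n, frame_nm, threshold):
    # scene change when the summed band-intensity difference is too large
    diff = np.abs(band_spectrum(frame_n) - band_spectrum(frame_nm)).sum()
    return diff > threshold
```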
<(1-3-2-1-2) Second Change Detection Method>
 A scene change can be detected in accordance with a change in the pair of the reference attention area and the corresponding attention area over a plurality of frames included in the video information. The pair of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
 For example, if the pair of the reference attention area and the corresponding attention area is replaced, a scene change is detected as having occurred. Here, a replacement of the reference attention area may be detected as a scene change, or a replacement of the corresponding attention area may be detected as a scene change. In this case, for example, it suffices to recognize, between the n-th frame and the (n+m)-th frame, a replacement of at least one of the reference attention area and the corresponding attention area.
 FIGS. 9 and 10 show a state in which, between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L, an image capturing three object areas O1L to O3L as reference attention areas is replaced by an image capturing one object area O4L as a reference attention area. In this case, the replacement of the reference attention area can be recognized from the mismatch of the objects between the object area O1L and the object area O4L. The mismatch of the display objects between the object area O1L and the object area O4L can be recognized by template matching or by a corresponding-point search such as the POC method. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 Also, for example, a scene change can be detected in accordance with a movement of the pair of the reference attention area and the corresponding attention area. Here, a scene change may be detected in accordance with a movement of the reference attention area, or in accordance with a movement of the corresponding attention area. In this case, it suffices to recognize whether at least one of the reference attention area and the corresponding attention area has moved by more than a predetermined amount between the n-th frame and the (n+m)-th frame.
<(1-3-2-1-3) Third Change Detection Method>
 A scene change can be detected in accordance with at least one of the appearance and the disappearance of a pair of a reference attention area and a corresponding attention area over a plurality of frames included in the video information. The pair of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
 For example, if a pair of a reference attention area and a corresponding attention area appears or disappears, a scene change is detected as having occurred. Here, the appearance or disappearance of the reference attention area may be detected as a scene change, or the appearance or disappearance of the corresponding attention area may be detected as a scene change. In this case, for example, it suffices to recognize, between the n-th frame and the (n+m)-th frame, the appearance or disappearance of at least one of the reference attention area and the corresponding attention area. The appearance and disappearance of the reference attention area and of the corresponding attention area can be recognized, for example, by detecting a change in the number of reference attention areas and corresponding attention areas, or by detecting the movement amount and movement direction of an object. The movement amount and movement direction of an object can be detected by dividing each image into a plurality of blocks and comparing the blocks between the n-th frame and the (n+m)-th frame. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 FIGS. 11 and 12 show a state in which, between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L, an object area O7L as a reference attention area appears from the left in an image capturing two object areas O5L and O6L as reference attention areas. That is, an object is shown entering the frame. In this case, for example, the appearance of the object area O7L may be recognized from the change in the number of object areas, or from the detection of the movement amount and movement direction of the object area O7L.
 Note that the first frame of the changed scene may be the frame at which the reference attention area begins to appear or disappear, the frame at which the appearance or disappearance of the reference attention area has progressed by a predetermined proportion, or the frame at which the appearance or disappearance of the reference attention area is complete. Similarly, the first frame of the changed scene may be the frame at which the corresponding attention area begins to appear or disappear, the frame at which the appearance or disappearance of the corresponding attention area has progressed by a predetermined proportion, or the frame at which the appearance or disappearance of the corresponding attention area is complete.
<(1-3-2-1-4) Fourth Change Detection Method>
 A scene change can be detected in accordance with a change in the size of the pair of the reference attention area and the corresponding attention area over a plurality of frames included in the video information. The pair of the reference attention area and the corresponding attention area can be detected by the attention area detection unit 462.
 For example, if the areas of the reference attention area and the corresponding attention area change by more than a predetermined amount, a scene change is detected as having occurred. Here, a change of the area of the reference attention area by more than the predetermined amount may be detected as a scene change, or a change of the area of the corresponding attention area by more than the predetermined amount may be detected as a scene change. In this case, for example, a pair of reference attention areas showing the same object, or a pair of corresponding attention areas showing the same object, may be detected between the n-th frame and the (n+m)-th frame by template matching or by a corresponding-point search such as the POC method. Then, for at least one of the pair of reference attention areas and the pair of corresponding attention areas, it can be recognized whether the change in area exceeds the predetermined amount. The predetermined amount may be, for example, an absolute amount of area, or an amount calculated by multiplying the area by a predetermined coefficient (for example, 0.5). Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 FIGS. 13 and 14 show a state in which one object area O8L as a reference attention area is enlarged between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L. In this case, for example, if the change in the area of the object area O8L between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L exceeds the predetermined amount, a scene change is detected as having occurred.
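 The relative-threshold variant described above (with the 0.5 coefficient example) amounts to a one-line test; a sketch with illustrative names:

```python
def attention_area_size_changed(area_frame_n, area_frame_nm, coefficient=0.5):
    """Scene change when the attention area's size changes by more than
    coefficient times its size in the n-th frame."""
    return abs(area_frame_nm - area_frame_n) > coefficient * area_frame_n
```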
<(1-3-2-1-5) Fifth Change Detection Method>
 A scene change can be detected in accordance with a change of the in-focus area over a plurality of frames included in the video information. The in-focus area can be recognized, for example, in at least one of the N-th left-eye image GNL and the N-th right-eye image GNR. For example, the in-focus area may be recognized by edge extraction processing using the Hough transform or the like, or the image may be divided into a plurality of areas and the in-focus area recognized by analyzing, for each area, the presence or absence of high-frequency components with a Fourier transform of the pixel value distribution. Alternatively, after areas capturing objects (object areas) are detected by template matching or the like, whether each object area is in focus may be recognized by analyzing the presence or absence of high-frequency components with a Fourier transform of the pixel value distribution of each object area.
 Then, for example, if the in-focus area has changed, between the n-th frame and the (n+m)-th frame, to an area capturing a different object, a scene change is detected as having occurred. Whether an area captures a different object can be determined, for example, by template matching or by a corresponding-point search such as the POC method. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
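 One way to realize the high-frequency focus analysis mentioned above is the following sketch: the fraction of spectral energy above a radial cutoff serves as a focus measure, and the candidate object area scoring highest would be treated as the in-focus area. The cutoff ratio is an assumed tuning value.

```python
import numpy as np

def high_frequency_energy(patch, cutoff_ratio=0.25):
    """Focus measure for a grayscale patch: fraction of FFT energy above a
    radial cutoff; in-focus patches score higher than defocused ones."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    high = spectrum[radius > cutoff_ratio * min(h, w)].sum()
    return high / (spectrum.sum() + 1e-12)
```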
 FIGS. 15 and 16 show a state in which the in-focus area changes from the object area O6L to the object area O7L between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L. In this case, for example, the change detection unit 463 recognizes that the in-focus area has changed, between the n-th frame and the (n+m)-th frame, from the object area O6L to the object area O7L, which captures a different object, and a scene change is detected as having occurred.
<(1-3-2-1-6) Sixth Change Detection Method>
 The faces of people captured in a plurality of frames included in the video information are recognized, and a scene change can be detected in accordance with a replacement of the people between the plurality of frames. The recognition of a person's face can be performed, for example, on at least one of the N-th left-eye image GNL and the N-th right-eye image GNR. Specifically, for example, an area in which a person's face is captured (also referred to as a face area) may first be recognized in the image by template matching or the like, then the parts of the face (eyes, eyebrows, nose, mouth, and so on) may be recognized in the face area by template matching or the like, and the person's face may be recognized from the positional relationship of a plurality of feature points specifying the edge portions of those parts.
 For example, if the person being captured is replaced between the n-th frame and the (n+m)-th frame, a scene change is detected as having occurred. Whether a person has been replaced can be determined, for example, from a change in the positional relationship of the plurality of feature points. Here, m may be any value of 1 or more, and may be, for example, 20 to 50.
 FIGS. 17 and 18 show a state in which the area capturing a person changes from the object area O9L to the object area O10L between the n-th left-eye image GnL and the (n+m)-th left-eye image G(n+m)L. In this case, since the person being captured is replaced between the n-th frame and the (n+m)-th frame, a scene change is detected as having occurred.
<(1-3-2-2) Detection Based on Audio Information>
 The change detection unit 463 can detect a scene change in accordance with a change in the audio included in the video information. In this way, an image having a reference parallax suited to each scene can be combined with the 3D moving image in accordance with changes in the audio. For example, a scene change can be detected by comparing the audio corresponding to the (n+m)-th frame (m is a natural number) with the audio corresponding to the n-th frame, m frames earlier. m may be 1, or a value of 2 or more.
 As the method of detecting a scene change based on the audio information in the change detection unit 463, for example, one or more of the following seventh to ninth change detection methods may be employed.
<(1-3-2-2-1) Seventh Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the volume obtained from the audio information included in the video information. For example, the volume corresponding to the (n+m)-th frame (m is a natural number) is compared with the volume corresponding to the n-th frame, m frames earlier, and when the amount of change in volume exceeds a predetermined amount, a scene change is detected as having occurred. Here, m may be 1, or a value of 2 or more. The predetermined amount may be, for example, a fixed absolute amount of volume, or a variable amount calculated from the volume according to a predetermined rule. As the predetermined rule, for example, a calculation rule of multiplying the volume corresponding to the n-th frame by a predetermined coefficient (for example, 0.5) may be employed. In that case, a scene change is detected as having occurred at the point in time when the volume has increased or decreased by 50% or more.
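 A minimal sketch of the seventh method under the 50% example above, assuming the audio samples aligned with each frame are available as NumPy arrays; RMS as the loudness measure is an assumption.

```python
import numpy as np

def volume_scene_change(samples_n, samples_nm, coefficient=0.5):
    """Scene change when the RMS loudness of the audio for the (n+m)-th
    frame differs from that of the n-th frame by coefficient (50%) or more."""
    rms_n = np.sqrt(np.mean(np.square(samples_n, dtype=np.float64)))
    rms_nm = np.sqrt(np.mean(np.square(samples_nm, dtype=np.float64)))
    return abs(rms_nm - rms_n) >= coefficient * rms_n
```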
<(1-3-2-2-2) Eighth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the frequency components of the audio, based on the audio information included in the video information.
 For example, the frequency components of the audio corresponding to the (n+m)-th frame (m is a natural number) are compared with the frequency components of the audio corresponding to the n-th frame, m frames earlier, and if an evaluation value relating to the frequency components has changed beyond a predetermined threshold, a scene change is detected as having occurred. Here, m may be 1, or a value of 2 or more. As the evaluation value relating to the frequency components, the sum of the intensities of the audio frequency components in each frequency band, or the like, may be employed. As the frequency bands of the audio, for example, a low-frequency band, a high-frequency band, and the band between them may be employed. Such a change in the evaluation value relating to the frequency components can correspond, for example, to a change in background music (BGM).
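 The band-wise evaluation value could be sketched as follows; the low/middle/high band edges are assumed values, not specified here.

```python
import numpy as np

def audio_band_energies(samples, sample_rate,
                        band_edges_hz=(0.0, 300.0, 3000.0, 20000.0)):
    """Sum the FFT magnitude of an audio chunk within low, middle, and
    high frequency bands."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:])])

def audio_scene_change(samples_n, samples_nm, sample_rate, threshold):
    # e.g., a BGM change shifts energy between bands beyond the threshold
    diff = np.abs(audio_band_energies(samples_n, sample_rate)
                  - audio_band_energies(samples_nm, sample_rate)).sum()
    return diff > threshold
```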
<(1-3-2-2-3) Ninth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the voiceprint that can be recognized from the audio, based on the audio information included in the video information.
 For example, a voiceprint is recognized by analyzing the audio corresponding to the (n+m)-th frame (m is a natural number), and a voiceprint is recognized by analyzing the audio corresponding to the n-th frame, m frames earlier. Then, the voiceprint of the (n+m)-th frame is compared with the voiceprint of the n-th frame, and if the speaking person differs, a scene change is detected as having occurred.
<(1-3-2-3) Detection Based on Metadata>
 The change detection unit 463 can detect a scene change in accordance with a change in the metadata included in the video information. In this way, an image having a reference parallax suited to each scene can be combined with the 3D moving image in accordance with changes in the metadata. For example, a scene change can be detected by comparing the metadata of the (n+m)-th frame (m is a natural number) with the metadata of the n-th frame, m frames earlier. m may be 1, or a value of 2 or more.
 As the method of detecting a scene change based on the metadata in the change detection unit 463, for example, one or more of the following tenth to twelfth change detection methods may be employed.
<(1-3-2-3-1) Tenth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the subtitle information specified by the metadata included in the video information.
 For example, the subtitle information corresponding to the (n+m)-th frame (m is a natural number) is compared with the subtitle information corresponding to the n-th frame, m frames earlier, and if a predetermined change has occurred in the subtitle information, a scene change is detected as having occurred. The modes of the predetermined change may include a mode in which a certain feature of the subtitle information changes to a different feature, a mode in which a state without subtitle information changes to a state with subtitle information, and the like.
 Specifically, characteristic terms that suggest a change of scene, relating to time, place, era, topic, and so on, may be detected from the subtitle information, and if a change in a characteristic term, or the appearance or disappearance of a characteristic term, has occurred, a scene change is detected as having occurred.
<(1-3-2-3-2) Eleventh Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the chapter information specified by the metadata included in the video information.
 For example, the chapter information corresponding to the (n+m)-th frame (m is a natural number) is compared with the chapter information corresponding to the n-th frame, m frames earlier, and if the chapter information has changed, a scene change is detected as having occurred.
<(1-3-2-3-3) Twelfth Change Detection Method>
 The change detection unit 463 can detect a scene change in accordance with a change in the shooting conditions specified by the metadata included in the video information.
 For example, the shooting conditions of the (n+m)-th frame (m is a natural number) are compared with the shooting conditions of the n-th frame, m frames earlier, and if the shooting conditions have changed, a scene change is detected as having occurred. As the shooting conditions, parameters such as the focal length, the aperture size, and the shooting magnification (also referred to as shooting parameters) may be employed.
 More specifically, for example, if a shooting parameter increases or decreases by more than a predetermined proportion (for example, 50%), a scene change is detected as having occurred.
<(1-3-3) Deviation Amount Determination Method>
 In the deviation amount determination unit 464, for a scene of the video information spanning from the detection of a first scene change by the change detection unit 463 to the detection of the second, next scene change, the reference deviation amount can be determined based on the deviation amounts of the positions of pixels indicating the same portion of an object (also referred to as pixel deviation amounts) between the N-th left-eye image GNL and the N-th right-eye image GNR of one or more N-th frames. At this time, a single reference deviation amount common to all frames of the scene can be determined.
 In this way, a common image having a reference parallax is added to the 3D moving image of the scene. As a result, excessive changes in the image are suppressed, the burden on the user's eyes can be reduced, and the amount of computation can also be reduced.
 Note that the pixel deviation amount between the N-th left-eye image GNL and the N-th right-eye image GNR can be obtained by detecting, through a corresponding-point search using the POC method or the like, combinations of a reference pixel and a corresponding pixel capturing the same portion of an object between the N-th left-eye image GNL and the N-th right-eye image GNR.
 As the method of determining the reference deviation amount in the deviation amount determination unit 464, for example, one or more of the following first to fourth determination methods may be employed.
<(1-3-3-1) First Determination Method>
 The reference deviation amount is determined based on the distribution of the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the N-th left-eye images GNL and the N-th right-eye images GNR included in the frame group of one scene of the video information. The frame group of the scene may be all of the frames included in the scene, or a subset of all of the frames included in the scene. The subset of frames may be temporally consecutive frames or discretely sampled frames. In this way, the sense of incongruity the user experiences within one scene of the 3D moving image can be reduced.
 Alternatively, the reference deviation amount may be determined based on the distribution of the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the reference attention areas and the corresponding attention areas included in the frame group of one scene of the video information.
 For example, the reference deviation amount can be determined based on a representative value of the pixel deviation amounts obtained from their distribution. As the representative value of the pixel deviation amounts, at least one of the mean, maximum, minimum, mode, and median of the pixel deviation amounts may be employed. In this case, the representative value of the pixel deviation amounts may be adopted as the reference deviation amount as it is, or a value shifted from the representative value by a calculation following a predetermined rule may be adopted as the reference deviation amount. As the calculation following the predetermined rule, for example, a calculation of multiplying the representative value of the pixel deviation amounts by a predetermined coefficient (for example, 0.8) may be employed.
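 As a sketch of this first determination method, assuming NumPy and a flat array of pixel deviation amounts collected over the scene's frame group; the choice of statistic and the 0.8 coefficient follow the examples above (the mode is omitted from this sketch).

```python
import numpy as np

def reference_deviation(pixel_deviations, statistic="median", coefficient=0.8):
    """Reduce the scene's pixel deviation distribution to a representative
    value, then scale it by a predetermined coefficient to obtain the
    reference deviation amount."""
    reducers = {"mean": np.mean, "max": np.max,
                "min": np.min, "median": np.median}
    representative = reducers[statistic](np.asarray(pixel_deviations))
    return coefficient * representative
```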
 Here, as shown in FIG. 19, assume a case in which the frame group from the (n-b)-th frame (b is a natural number) through the (n-a)-th frame (a is a natural number) to the n-th frame corresponds to a first scene, and the frame group from the (n+a)-th frame through the (n+b)-th frame to the (n+c)-th frame corresponds to the next, second scene.
 In FIG. 19, to avoid complicating the drawing, the N-th left-eye images GNL are shown and the N-th right-eye images GNR are omitted. Shown is the first frame group GG1L included in the first scene, from the (n-b)-th left-eye image G(n-b)L through the (n-a)-th left-eye image G(n-a)L to the n-th left-eye image GnL. Also shown is the second frame group GG2L included in the second scene, from the (n+a)-th left-eye image G(n+a)L through the (n+b)-th left-eye image G(n+b)L to the (n+c)-th left-eye image G(n+c)L.
 FIG. 20 shows the change of the pixel deviation amount over time as a curve. Here, if times T0 to T1 correspond to the first scene and times T1 to T2 correspond to the second scene, a representative value RV1 of the pixel deviation amounts in the first scene and a representative value RV2 of the pixel deviation amounts in the second scene can be calculated.
 Here, if the mean or the median of the pixel deviation amounts is adopted as the representative value, then by adding to the 3D moving image an image having a parallax corresponding to the reference deviation amount (a reference parallax image), the parallax of the reference parallax image serves as a reference and the forward and backward extent of the 3D moving image becomes easier to recognize. As a result, the sense of depth obtained by the user watching the 3D moving image can be improved.
<(1-3-3-2) Second Determination Method>
 The reference deviation amount is determined based on the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the N-th left-eye image GNL and the N-th right-eye image GNR of the first frame included in the frame group of one scene of the video information. The frame group of the scene may be, for example, all of the frames included in the scene. In this way, the processing of adding a common image having a reference parallax to the 3D moving image of one scene can be performed in real time.
 Note that the reference deviation amount may instead be determined based on the deviation amounts of the positions of pixels indicating the same portion of an object (pixel deviation amounts) between the reference attention area and the corresponding attention area of the first frame included in the frame group of the scene.
 For example, the reference deviation amount can be determined based on a representative value of the pixel deviation amounts. As the representative value, at least one of the mean, maximum, minimum, mode, and median of the pixel deviation amounts may be employed. In this case, the representative value may be adopted as the reference deviation amount as it is, or a value shifted from the representative value by a calculation following a predetermined rule may be adopted as the reference deviation amount. As the calculation following the predetermined rule, for example, a calculation of multiplying the representative value of the pixel deviation amounts by a predetermined coefficient (for example, 0.8) may be employed.
<(1-3-3-3) Third Determination Method>
 Based on the distribution of the deviation amounts of the positions of pixels indicating the same portion of an object between the N-th left-eye images GNL and the N-th right-eye images GNR included in the frame group of one scene of the video information, a representative value of the virtual distance from a virtual reference plane to the surface of an object is calculated. Then, the deviation amount corresponding to the virtual reference plane at which the representative value of the virtual distance takes a predetermined value is determined as the reference deviation amount. In this way, the sense of incongruity the user experiences when the scene changes can be reduced.
 Here, the frame group of the scene may be all of the frames included in the scene, or a subset of all of the frames included in the scene. The subset of frames may be temporally consecutive frames or discretely sampled frames. The virtual reference plane may be, for example, any plane parallel to the screen, that is, a plane orthogonal to the line of sight of a user squarely facing the screen. The virtual distance may be, for example, the distance by which the surface of an object that appears to stand out stereoscopically in the 3D moving image is separated from the virtual reference plane. As the representative value of the virtual distance, at least one of the mean, maximum, minimum, mode, and median of the virtual distances may be employed. The predetermined value may be set to an arbitrary value in accordance with the user's operation of the operation unit 41, or may be set to a fixed value prepared in advance.
 For example, if the representative value of the virtual distance is the maximum virtual distance and the predetermined value is 0, the reference deviation amount is set in accordance with the maximum virtual distance. If the representative value of the virtual distance is the minimum virtual distance and the predetermined value is 0, the reference deviation amount can be set in accordance with the minimum virtual distance.
 FIG. 21 shows a scene change occurring between the n-th frame and the (n+m)-th frame. FIG. 22 shows, as a curve, the change over time of the mean virtual distance of each frame when a predetermined virtual reference plane is set. Here, times T0 to T1 correspond to the first scene, to which the n-th frame belongs, and times T1 to T2 correspond to the second scene, to which the (n+m)-th frame belongs.
 In this case, for example, a mean value Mr1 can be calculated as the representative value of the virtual distance in the first scene, and a mean value Mr2 as the representative value of the virtual distance in the second scene. Then, so that the representative value of the virtual distance takes the predetermined value in both scenes, a first virtual reference plane protruding from the predetermined virtual reference plane by a distance MR1 is set for the first scene, and a second virtual reference plane protruding from the predetermined virtual reference plane by a distance MR2 is set for the second scene. Here, the relationship (Mr1-MR1) = (Mr2-MR2) holds. Then, the deviation amount corresponding to the first virtual reference plane is determined as the reference deviation amount for the first scene, and the deviation amount corresponding to the second virtual reference plane is determined as the reference deviation amount for the second scene.
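 A sketch of this bookkeeping, assuming per-frame virtual distances are already available for each scene; with target_value = 0, each scene's plane passes through its own mean protrusion, so the relation (Mr1-MR1) = (Mr2-MR2) holds across the scene change.

```python
import numpy as np

def virtual_plane_offsets(scene_distances, target_value=0.0):
    """For each scene, given its per-frame virtual distances, choose the
    offset MR of that scene's virtual reference plane so that
    mean(distances) - MR equals target_value for every scene."""
    return [float(np.mean(distances)) - target_value
            for distances in scene_distances]
```

 The deviation amount corresponding to each offset plane would then be taken as that scene's reference deviation amount.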
<(1-3-3-4) Fourth Determination Method>
A reference shift amount for one scene is determined according to the images of the scene preceding it (the previous scene). Specifically, a distribution of shift amounts of pixel positions indicating the same part of an object between the Nth left-eye image GNL and the Nth right-eye image GNR (also referred to as first pixel shift amounts) is obtained for one or more frames of a certain scene (also referred to as a second scene) of the video information. In addition, a distribution of such shift amounts (also referred to as second pixel shift amounts) is obtained for one or more frames of another scene (also referred to as a first scene) preceding the second scene. The reference shift amount for the second scene is then determined based on the distribution of the first pixel shift amounts and the distribution of the second pixel shift amounts.
Here, the one or more frames in the second scene may be any of a single frame, two or more consecutive frames, two or more sampled frames, and all the frames in the second scene. Likewise, the one or more frames in the first scene may be any of a single frame, two or more consecutive frames, two or more sampled frames, and all the frames in the first scene.
In the fourth determination method, for example, the reference shift amount may be determined based on a representative value of the distribution of the first pixel shift amounts (also referred to as a first shift representative value) and a representative value of the distribution of the second pixel shift amounts (also referred to as a second shift representative value). As the first shift representative value, at least one of the average, maximum, minimum, mode, and median of the distribution of the first pixel shift amounts may be adopted; as the second shift representative value, at least one of the average, maximum, minimum, mode, and median of the distribution of the second pixel shift amounts may be adopted.
Specifically, for example, the reference shift amount of the second scene may be determined by a predetermined calculation using the first shift representative value and the second shift representative value. As the predetermined calculation, for example, a calculation of the average of the first shift representative value and the second shift representative value may be employed, in which case this average is determined as the reference shift amount of the second scene.
FIG. 23 is a diagram for explaining the fourth method for determining the reference shift amount. FIG. 23 shows a first shift representative value MP1 for the first scene (times T0 to T1), a second shift representative value MP2 for the second scene (times T1 to T2) following the first scene, and a third shift representative value MP3 for the third scene (times T2 to T3) following the second scene. In this case, for example, the first shift representative value MP1 may be set as the reference shift amount RP1 of the first scene, the average of MP1 and MP2 as the reference shift amount RP2 of the second scene, and the average of MP2 and MP3 as the reference shift amount RP3 of the third scene.
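The averaging rule of FIG. 23 can be sketched as follows; the use of the median as the representative statistic and the per-frame disparity-map input format are assumptions for illustration.

```python
import numpy as np

def representative_disparity(frames, representative=np.median):
    """Representative value of the pixel-shift (disparity) distribution
    over one or more frames of a scene, each frame a 2-D disparity map."""
    return representative(np.concatenate([d.ravel() for d in frames]))

def reference_shifts(scenes):
    """Per-scene reference shift amounts following FIG. 23: the first
    scene uses its own representative value; every later scene uses the
    mean of its own and the preceding scene's representative values."""
    reps = [representative_disparity(frames) for frames in scenes]
    shifts = [reps[0]]
    shifts += [(prev + cur) / 2.0 for prev, cur in zip(reps, reps[1:])]
    return shifts
```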
Alternatively, the reference shift amount may be determined by the following method. First, a distribution of shift amounts of pixel positions indicating the same part of an object (pixel shift amounts) is obtained between the Nth left-eye images GNL and the Nth right-eye images GNR included in the frame group of the first scene of the video information. Based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (also referred to as a first virtual reference plane) to the surface of the object is calculated. Meanwhile, a distribution of pixel shift amounts is obtained between the Nth left-eye images GNL and the Nth right-eye images GNR included in the frame group of the second scene following the first scene, and based on this distribution, a second representative value of the virtual distance from a virtually set reference plane (also referred to as a second virtual reference plane) to the surface of the object is calculated. Then, after the change of the second virtual reference plane and the calculation of the second representative value have been performed one or more times, the shift amount corresponding to the second virtual reference plane at which the difference between the first and second representative values falls within a predetermined allowable range is determined as the reference shift amount.
Here, each of the first and second virtual reference planes may be, for example, a plane parallel to the screen, that is, a plane orthogonal to the line of sight of a user directly facing the screen. The virtual distance may be, for example, the distance by which the surface of an object that appears to stand out stereoscopically in the 3D moving image is separated from the first or second virtual reference plane.
Furthermore, the reference shift amount may be determined according to the virtual distance of an attention area. In this case, for example, a distribution of pixel shift amounts indicating the same part of an object is first obtained between the reference attention areas and the corresponding attention areas included in the frame group of the first scene of the video information. Based on this distribution, a first representative value of the virtual distance from a virtually set reference plane (the first virtual reference plane) to the surface of the object is calculated. Meanwhile, a distribution of pixel shift amounts is obtained between the reference attention areas and the corresponding attention areas included in the frame group of the second scene following the first scene, and based on this distribution, a second representative value of the virtual distance from a virtually set reference plane (the second virtual reference plane) to the surface of the object is calculated. Then, after the change of the second virtual reference plane and the calculation of the second representative value have been performed one or more times, the shift amount corresponding to the second virtual reference plane at which the difference between the first and second representative values falls within the predetermined allowable range is determined as the reference shift amount.
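The iterative adjustment shared by the two variants above can be sketched as a simple search; the tolerance, step size, iteration cap, and the mean as the representative statistic are illustrative assumptions.

```python
def adjust_second_plane(first_rep, second_distances,
                        tol=0.5, step=0.1, max_iter=1000):
    """Move the second virtual reference plane until the second scene's
    representative virtual distance matches the first scene's within a
    tolerance; the returned plane offset corresponds to the pixel shift
    used as the reference shift amount."""
    offset = 0.0
    for _ in range(max_iter):
        # representative (mean) virtual distance measured from the plane
        second_rep = sum(d - offset for d in second_distances) / len(second_distances)
        diff = second_rep - first_rep
        if abs(diff) <= tol:              # within the allowable range
            break
        offset += step if diff > 0 else -step
    return offset
```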
<(1-3-4) Region Image Information Acquisition Method>
The acquisition, in the region image acquisition unit 465, of the Nth region image information relating to the Nth reference region image INL and the Nth corresponding region image INR can be realized, for example, by performing the following first and second steps in order.
<(1-3-4-1) First Step>
An image pattern stored in advance in the storage unit 44 or the like is read out. As this image pattern, for example, an image pattern showing a specific design in which relatively large dots are randomly arranged, an image pattern including an information display field of a digital broadcast (for example, a data field or a time field), or an image pattern including operation buttons of a device may be employed.
<(1-3-4-2) Second Step>
The image pattern read out in the first step is used as a base image (for example, the Nth reference region image INL), and the other image (for example, the Nth corresponding region image INR) is generated by shifting the position of each pixel of the base image in one direction according to the reference shift amount determined for each scene by the shift amount determination unit 464. In this way, the Nth reference region image INL and the Nth corresponding region image INR are acquired for each scene. That is, the Nth reference region image INL and the Nth corresponding region image INR are constant within one scene.
Alternatively, sets of image patterns corresponding to a plurality of shift amounts may be stored in advance in the storage unit 44 or the like, and the set of image patterns corresponding to the reference shift amount determined by the shift amount determination unit 464 may be read out from the storage unit 44 or the like, whereby the Nth reference region image INL and the Nth corresponding region image INR are acquired. That is, it suffices that the Nth reference region image INL and the Nth corresponding region image INR have a relationship in which a display object such as a predetermined design is shifted by the reference shift amount in one direction.
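A minimal sketch of the second step follows, assuming grayscale images held as NumPy arrays, a horizontal shift direction, and rounding of the reference shift amount to whole pixels.

```python
import numpy as np

def make_region_image_pair(pattern, reference_shift):
    """Generate the pair of region images from one stored pattern: the
    corresponding image is the pattern with every pixel displaced by the
    reference shift amount in one (horizontal) direction, so the pair
    displays the object with exactly the reference disparity."""
    shift = int(round(reference_shift))
    base = pattern.copy()                 # reference region image (IN_L)
    shifted = np.zeros_like(pattern)      # corresponding region image (IN_R)
    if shift >= 0:
        shifted[:, shift:] = pattern[:, :pattern.shape[1] - shift]
    else:
        shifted[:, :shift] = pattern[:, -shift:]
    return base, shifted
```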
FIGS. 24 and 25 illustrate variations of the Nth reference region image INL. In FIGS. 24 and 25, each variation is illustrated in the form of an Nth left-eye composite image GSNL in which the Nth reference region image INL is arranged at the Nth reference composite position PNL around the Nth left-eye image area TANL corresponding to the left-eye image GNL in FIG. 1. FIG. 24 schematically shows an image pattern having an information display field including a data field Pa1 and a time field Ca1 of a digital broadcast. FIG. 25 schematically shows an image pattern including an operation button group Ba1 and a time field Ta1.
<(1-3-5) Composite Position Designation Method>
<(1-3-5-1) When the Reference Composite Position and the Corresponding Composite Position Are Fixed>
In the position designation unit 468, for example, a predetermined Nth reference composite position PNL and a predetermined Nth corresponding composite position PNR may be designated, or the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated in accordance with the operation of the operation unit 41 by the user.
In this case, the Nth reference composite position PNL may be designated, for example, in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated, for example, in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR.
Specifically, as shown in FIGS. 2, 24, and 25, the Nth reference composite position PNL may be designated in a region surrounding the Nth left-eye image area TANL corresponding to the Nth left-eye image GNL. The Nth reference composite position PNL may also be designated so as to include a specific part of the region surrounding the Nth left-eye image area TANL. For example, as shown in FIG. 26, an Nth reference composite position PNL specifying the left and right regions sandwiching the Nth left-eye image area TANL may be designated. As shown in FIG. 27, an Nth reference composite position PNL specifying a region located below the Nth left-eye image area TANL may be designated. Furthermore, as shown in FIG. 28, an Nth reference composite position PNL specifying a region of non-uniform width surrounding the Nth left-eye image area TANL may be designated.
The Nth reference composite position PNL and the Nth corresponding composite position PNR may each be a position specifying all the pixels of a region of a predetermined shape, such as a ring, or a position from which such a region can be identified (also referred to as a specific position). The specific position may be, for example, the position of one or more corner pixels of the region of the predetermined shape.
<(1-3-5-2) When the Reference Composite Position and the Corresponding Composite Position Are Changed>
In the position designation unit 468, the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated according to one or more of the Nth left-eye image GNL and the Nth right-eye image GNR. Here, the Nth reference composite position PNL may be designated in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR.
The Nth reference composite position PNL may also be designated in one or more of a plurality of regions including a region near the outer edge of the Nth left-eye image GNL and a region corresponding to that near-edge region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated in one or more of a plurality of regions including a region near the outer edge of the Nth right-eye image GNR and a region corresponding to that near-edge region in an image separate from the Nth right-eye image GNR.
As the method of designating the Nth reference composite position PNL and the Nth corresponding composite position PNR in this manner, for example, one or more of the following first to third designation methods may be adopted.
<(1-3-5-2-1) First Designation Method>
The Nth reference composite position PNL and the Nth corresponding composite position PNR are designated according to the distribution of positional shift amounts (that is, parallaxes) between reference pixels and corresponding pixels indicating the same part of an object between the Nth left-eye image GNL and the Nth right-eye image GNR.
For example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated if the parallax between the Nth left-eye image GNL and the Nth right-eye image GNR is not less than a first threshold and not more than a second threshold. In this mode, for example, if the parallax between the Nth left-eye image GNL and the Nth right-eye image GNR is less than the first threshold or exceeds the second threshold, the Nth reference composite position PNL and the Nth corresponding composite position PNR are not designated. Furthermore, for example, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR may be increased as the difference between the maximum and minimum parallaxes between the Nth left-eye image GNL and the Nth right-eye image GNR becomes smaller.
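A minimal sketch of this thresholding rule, in which the thresholds, the use of the maximum parallax as the test statistic, and the formula for the number of positions are all illustrative assumptions:

```python
def designate_position_count(parallaxes, t1, t2, max_positions=10):
    """Return the number of composite positions to designate, or 0 when
    the parallax falls outside [t1, t2]: the narrower the parallax
    range, the more positions are used."""
    p_max, p_min = max(parallaxes), min(parallaxes)
    if not (t1 <= p_max <= t2):
        return 0                          # positions are not designated
    spread = p_max - p_min
    return max(1, int(max_positions / (1.0 + spread)))
```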
<(1-3-5-2-2) Second Designation Method>
The Nth reference composite position PNL and the Nth corresponding composite position PNR are designated according to the distribution of parallaxes of M sets (M being an integer of 2 or more) of reference attention areas and corresponding attention areas between the Nth left-eye image GNL and the Nth right-eye image GNR. As the M sets of reference attention areas and corresponding attention areas, those detected by the attention area detection unit 462 may be employed.
For example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated if the difference between the maximum and minimum parallaxes in the reference attention areas and the corresponding attention areas is not less than a first threshold and not more than a second threshold. In this mode, for example, if that difference is less than the first threshold or exceeds the second threshold, the Nth reference composite position PNL and the Nth corresponding composite position PNR are not designated. Furthermore, for example, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR may be increased as the difference between the maximum and minimum parallaxes in the reference attention areas and the corresponding attention areas becomes smaller.
<(1-3-5-2-3) Third Designation Method>
The Nth reference composite position PNL and the Nth corresponding composite position PNR are designated such that their arrangement changes according to one or more of the Nth left-eye image GNL and the Nth right-eye image GNR.
For example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated according to the positions of the reference attention area and the corresponding attention area detected by the attention area detection unit 462 from the Nth left-eye image GNL and the Nth right-eye image GNR.
Here, for example, assume that one set of a reference attention area and a corresponding attention area is detected by the attention area detection unit 462. In this case, the Nth reference composite position PNL may be designated according to the position the reference attention area occupies in the Nth left-eye image GNL, and the Nth corresponding composite position PNR according to the position the corresponding attention area occupies in the Nth right-eye image GNR.
FIG. 29 shows a case where the reference attention area is an object area O1L and the corresponding attention area is an object area O1R. In this case, of the region FL around the left-eye image area TANL corresponding to the Nth left-eye image GNL, a position PN1L indicating the area onto which the object area O1L is projected in a first predetermined direction (here, the -X direction) and a position PN2L indicating the area onto which the object area O1L is projected in a second predetermined direction (here, the -Y direction) may be designated as the Nth reference composite position PNL. Although not illustrated, the Nth corresponding composite position PNR may be designated for the Nth right-eye image GNR in the same manner as for the Nth left-eye image GNL.
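The projection of FIG. 29 can be sketched as follows, under the assumptions that the image area sits inside the composite canvas with a uniform margin, that the -X projection lands on the left strip of the surrounding region, and that the -Y projection lands on the top strip; the coordinate conventions are hypothetical.

```python
def project_attention_region(bbox, margin):
    """Project an attention region's bounding box (x0, y0, x1, y1),
    given in image-area coordinates, onto the surrounding margin:
    PN1 is the -X projection strip, PN2 the -Y projection strip, both
    returned as (x0, y0, x1, y1) in canvas coordinates where the image
    area is offset by `margin` on the left and top."""
    x0, y0, x1, y1 = bbox
    pn1 = (0, y0 + margin, margin, y1 + margin)   # left strip, vertical extent
    pn2 = (x0 + margin, 0, x1 + margin, margin)   # top strip, horizontal extent
    return pn1, pn2
```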
Thus, if the positions of the reference attention area and the corresponding attention area change, the arrangement of the Nth reference composite position PNL and the Nth corresponding composite position PNR is changed. If the sizes of the reference attention area and the corresponding attention area change, the sizes of the Nth reference composite position PNL and the Nth corresponding composite position PNR are changed. Furthermore, if the number of reference attention areas and corresponding attention areas changes, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR is changed.
<(1-3-6) Image Composition Method>
In the image composition unit 469, the Nth reference region image INL is arranged at the Nth reference composite position PNL designated by the position designation unit 468, and the Nth corresponding region image INR is arranged at the Nth corresponding composite position PNR designated by the position designation unit 468. In this way, stereoscopic image information is generated.
For example, in the image composition unit 469, the Nth reference region image INL may be arranged in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding region image INR may be arranged in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR.
In this way, an Nth left-eye composite image GSNL in which the Nth left-eye image GNL and the Nth reference region image INL are combined is generated, and an Nth right-eye composite image GSNR in which the Nth right-eye image GNR and the Nth corresponding region image INR are combined is generated. At this time, for all frames of one scene, each Nth left-eye image GNL is combined with an Nth reference region image INL acquired based on the same reference shift amount.
For example, as shown in FIG. 30, the Nth reference region image INL relating to one and the same reference shift amount is combined with each Nth left-eye image GNL of the first frame group GG1L of the first scene shown in FIG. 19, while the Nth reference region image INL relating to another common reference shift amount is combined with each Nth left-eye image GNL of the second frame group GG2L of the second scene shown in FIG. 19. Also, for example, as shown in FIG. 31, the nth reference region image InL is combined with the nth left-eye image GnL of the first scene shown in FIG. 21, and the (n+m)th reference region image I(n+m)L is combined with the (n+m)th left-eye image G(n+m)L of the second scene shown in FIG. 21.
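A minimal sketch of the per-eye composition, assuming the composite canvas is the eye image plus a uniform margin on every side and that the composite position is given as the top-left corner of the region image in canvas coordinates:

```python
import numpy as np

def compose(eye_image, region_image, position, margin):
    """Place a region image at the designated composite position around
    an eye image, producing one composite image (e.g. GSN_L from GN_L
    and IN_L); applied identically to the right-eye side."""
    h, w = eye_image.shape[:2]
    canvas = np.zeros((h + 2 * margin, w + 2 * margin) + eye_image.shape[2:],
                      dtype=eye_image.dtype)
    canvas[margin:margin + h, margin:margin + w] = eye_image
    y, x = position                       # top-left corner, assumed in bounds
    rh, rw = region_image.shape[:2]
    canvas[y:y + rh, x:x + rw] = region_image
    return canvas
```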
The stereoscopic image information generated as described above need only include information in at least one of a first format and a second format. Here, the first format includes a format in which the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR can be displayed at the same time in a superimposed manner on one screen. The second format includes a format in which one or more of the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR and the remaining one or more images can be displayed time-sequentially on one screen. Even when such various display modes are adopted, the sense of depth obtained by a user watching the 3D moving image can be improved.
<(1-4) Operation Flow of Information Processing Apparatus>
FIG. 32 is a flowchart showing an example of the operation flow of the image processing apparatus according to the embodiment. This operation flow is realized by the control unit 46 reading and executing the program PG1 in the storage unit 44. For example, execution of image processing relating to a 3D moving image in the information processing apparatus 4 is requested in accordance with the operation of the operation unit 41 by the user, and this operation flow starts.
In step S1 of FIG. 32, video information is acquired by the video acquisition unit 461.
In step S2, the mode setting unit 467 determines whether the attention area detection unit 462 is set to the mode for detecting attention areas. If the mode for detecting attention areas is set, the flow proceeds to step S3; otherwise, it proceeds to step S4.
In step S3, the attention area detection unit 462 detects the reference attention area and the corresponding attention area from the Nth left-eye image GNL and the Nth right-eye image GNR.
In step S4, the change detection unit 463 detects changes of scene.
In step S5, the shift amount determination unit 464 determines a reference shift amount for each scene.
In step S6, the region image acquisition unit 465 acquires, for each scene, the Nth region image information relating to the Nth reference region image INL and the Nth corresponding region image INR.
In step S7, the mode setting unit 467 determines whether a signal relating to the mode setting of the position designation unit 468 has been received by the signal reception unit 466. If such a signal has been received, the flow proceeds to step S8; otherwise, it proceeds to step S9.
In step S8, the mode setting unit 467 sets the position designation unit 468 to one of a plurality of modes including the first mode and the second mode, in accordance with the signal received by the signal reception unit 466. Before the mode setting in step S8, the position designation unit 468 is assumed to be initially set to a predetermined mode.
In step S9, the position designation unit 468 designates the Nth reference composite position PNL and the Nth corresponding composite position PNR for each scene.
In step S10, the image composition unit 469 combines the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR, whereby stereoscopic image information is generated, and this operation flow ends.
<(1-5) Summary of the Embodiment>
As described above, according to the image processing technique of the embodiment, images having a reference parallax suited to each scene are combined with a 3D moving image in accordance with changes of scene. Differences in parallax within the 3D moving image therefore become easier to recognize through comparison with the images having the reference parallax. As a result, the sense of depth obtained by a user watching the 3D moving image can be improved.
<(2) Modifications>
The present invention is not limited to the embodiment described above, and various changes, improvements, and the like are possible without departing from the gist of the present invention.
<(2-1) First Modification>
In the processing according to the above embodiment, the Nth reference region image INL is arranged in one or more of a plurality of regions including a region around the Nth left-eye image GNL and a region corresponding to that surrounding region in an image separate from the Nth left-eye image GNL, and the Nth corresponding region image INR is arranged in one or more of a plurality of regions including a region around the Nth right-eye image GNR and a region corresponding to that surrounding region in an image separate from the Nth right-eye image GNR. However, the arrangement is not limited to this.
For example, as shown in FIG. 33, the Nth reference region image INL may be arranged so as to be superimposed on a region inside the Nth left-eye image GNL (also referred to as an internal region), and the Nth corresponding region image INR so as to be superimposed on an internal region of the Nth right-eye image GNR. The Nth reference region image INL may also be arranged in a region corresponding to the internal region of the Nth left-eye image GNL in an image separate from the Nth left-eye image GNL, and the Nth corresponding region image INR in a region corresponding to the internal region of the Nth right-eye image GNR in an image separate from the Nth right-eye image GNR.
The left side of FIG. 33 shows an example of an nth left-eye composite image GSnL in which the nth reference region image InL is combined at the nth reference composite position PnL in the internal region of the nth left-eye image area TAnL corresponding to the Nth left-eye image GNL shown in FIG. 1. The right side of FIG. 33 shows an example of an nth right-eye composite image GSnR in which the nth corresponding region image InR is combined at the nth corresponding composite position PnR in the internal region of the nth right-eye image area TAnR corresponding to the Nth right-eye image GNR shown in FIG. 1.
With such a configuration, the sense of depth obtained by a user watching the 3D moving image can be improved while keeping the user's eyes on the original object of attention as much as possible. In particular, if a display mode is realized in which a display object based on the Nth reference region image INL and the Nth corresponding region image INR exists in the vicinity of the object the user is paying attention to, the sense of depth obtained by the user watching the 3D moving image can be further improved.
The Nth reference region image INL and the Nth corresponding region image INR may be, for example, specific markers. A specific marker need only have distinctive features and be easily distinguishable by the user from objects originally included in the Nth left-eye image GNL and the Nth right-eye image GNR. The distinctive features may be realized, for example, by shape, color, texture, and the like.
When the Nth left-eye image GNL and the Nth right-eye image GNR are images capturing real objects, the specific marker may be a marker drawn by CG or the like. In this case, conceivable specific markers include various simple shapes such as bars, triangles, and arrows, as well as various objects such as vases and butterflies. This makes it difficult for the user to confuse the specific marker with objects originally included in the Nth left-eye image GNL and the Nth right-eye image GNR.
Moreover, if the specific marker has a distinctive shape, a distinctive color, a distinctive texture, and the like, yet is translucent, the display of images based on the Nth left-eye image GNL and the Nth right-eye image GNR is less likely to be hindered.
In the position designation unit 468, for example, a predetermined Nth reference composite position PNL and a predetermined Nth corresponding composite position PNR may be designated, or the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated in accordance with the operation of the operation unit 41 by the user. In this case, the Nth reference composite position PNL may be designated, for example, in one or more of a plurality of regions including the internal region of the Nth left-eye image GNL and a region corresponding to that internal region in an image separate from the Nth left-eye image GNL. Similarly, the Nth corresponding composite position PNR may be designated, for example, in one or more of a plurality of regions including the internal region of the Nth right-eye image GNR and a region corresponding to that internal region in an image separate from the Nth right-eye image GNR.
Also, in the position designation unit 468, the Nth reference composite position PNL and the Nth corresponding composite position PNR may be designated according to one or more of the Nth left-eye image GNL and the Nth right-eye image GNR. As a specific designation method, for example, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated in the vicinity of the reference attention area and the corresponding attention area. Here, the vicinity of an attention area includes, for example, positions to the lower left, upper left, lower right, upper right, above, below, to the left, and to the right of the attention area. A position surrounding the attention area is also conceivable as the vicinity of the reference attention area and the corresponding attention area.
For example, if the object areas O1L and O1R are the reference attention area and the corresponding attention area, a mode is conceivable in which, as shown in FIG. 33, the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated at the lower left of the object areas O1L and O1R. Also, for example, as shown in FIG. 34, a mode is conceivable in which a ring-shaped Nth reference composite position PNL and Nth corresponding composite position PNR surrounding the object areas O1L and O1R are designated.
For example, if the sizes of the reference attention area and the corresponding attention area change, the sizes of the regions specified by the Nth reference composite position PNL and the Nth corresponding composite position PNR may be changed. If the number of reference attention areas and corresponding attention areas changes, the number of Nth reference composite positions PNL and Nth corresponding composite positions PNR may be changed. The sizes and number of the regions specified by the Nth reference composite position PNL and the Nth corresponding composite position PNR may also be changed in accordance with the operation of the operation unit 41 by the user.
Also, for example, whether designation of the Nth reference composite position PNL and the Nth corresponding composite position PNR is necessary may be judged according to the distribution of parallaxes of M sets (M being an integer of 2 or more) of reference attention areas and corresponding attention areas between the Nth left-eye image GNL and the Nth right-eye image GNR, or according to the distribution of positional shift amounts (that is, parallaxes) between reference pixels and corresponding pixels indicating the same part of an object between the Nth left-eye image GNL and the Nth right-eye image GNR.
Furthermore, a mode is conceivable in which the Nth reference composite position PNL and the Nth corresponding composite position PNR are designated in areas predicted not to attract the user's attention (non-attention areas). Non-attention areas include areas different from the reference attention area and the corresponding attention area detected by the attention area detection unit 462. For example, non-attention areas in the Nth left-eye image GNL and the Nth right-eye image GNR include areas near the edges, areas showing objects with little motion obtained from motion vector analysis, and areas where color and texture are inconspicuous. This makes it less likely that the display of the area the user is paying attention to is hindered. As a result, suppression of visual discomfort and improvement of the sense of depth obtained by the user watching the 3D moving image can both be achieved.
<(2-2) Second Modification>
In the processing according to the above embodiment, a constant reference shift amount is determined for one scene, but the present invention is not limited to this. For example, the reference shift amount may be changed stepwise or gradually in the vicinity of a change of scene. This can reduce the discomfort the user experiences when the scene changes in the 3D moving image.
Specifically, in the shift amount determination unit 464, a reference shift amount may be determined for each frame such that, between frames in the vicinity of a change of scene in the video information, the difference in the reference shift amount is not more than a predetermined amount. In this case, the region image acquisition unit 465 may acquire, for each frame in the vicinity of the change of scene, the Nth region image information relating to an Nth reference region image INL and an Nth corresponding region image INR having a relationship in which the positions of pixels indicating the same part of the display object are shifted in one direction by the reference shift amount determined for that frame. The image composition unit 469 may then combine, for each frame in the vicinity of the change of scene, the Nth left-eye image GNL, the Nth right-eye image GNR, the Nth reference region image INL, and the Nth corresponding region image INR of that frame.
For example, as shown in FIG. 35, a reference shift amount PR1 is first calculated for the first scene (times T0 to T1) and a reference shift amount PR2 for the second scene (times T1 to T2) by the same determination method as in the above embodiment. Then, for example, the reference shift amount PR1 is adopted as-is for the first scene, while for the second scene the reference shift amount is determined so as to increase or decrease by a predetermined amount per frame, starting from the first frame, until it reaches the reference shift amount PR2. In this case, the reference shift amount is changed stepwise from time T1 to time T1+α.
Here, the predetermined amount may be set in advance, or may be set by dividing the difference between the reference shift amounts PR1 and PR2 by a predetermined number of frames (for example, Nf). The predetermined number of frames Nf may be set to, for example, 30 frames. In this case, the reference shift amount changes linearly from time T1 to time T1+α.
Alternatively, the predetermined amount may be calculated by multiplying the reference shift amount PR1 of the first scene by a predetermined coefficient (for example, 0.01). In this case, the reference shift amount changes nonlinearly from time T1 to time T1+α.
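A minimal sketch of this boundary interpolation; the frame count, the coefficient, and the reading of the coefficient variant as a fixed step derived from PR1 are illustrative assumptions:

```python
def boundary_shifts(pr1, pr2, n_frames=None, coeff=0.01):
    """Per-frame reference shift amounts ramping from PR1 toward PR2
    across a scene change: equal steps over n_frames when n_frames is
    given, otherwise steps of PR1 times the coefficient."""
    if n_frames:
        step = (pr2 - pr1) / n_frames
        return [pr1 + step * i for i in range(1, n_frames + 1)]
    step = max(abs(pr1) * coeff, 1e-6)    # guard against a zero step
    direction = 1.0 if pr2 >= pr1 else -1.0
    shifts, cur = [], pr1
    while direction * (pr2 - cur) > 0:
        cur += direction * step
        if direction * (pr2 - cur) < 0:   # do not overshoot PR2
            cur = pr2
        shifts.append(cur)
    return shifts
```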
Such a configuration is particularly effective when applied to a moving image capturing a scene in which the background does not change while various objects, including people, move significantly and thereby appear and disappear.
<(2-3) Other Modifications>
For example, the method of designating the Nth reference composite position PNL and the Nth corresponding composite position PNR according to the above embodiment and the method according to the first modification may be executed selectively, depending on the mode of the position designation unit 468 set by the mode setting unit 467. In other words, for example, if the position designation unit 468 is set to the first mode, the designation method according to the above embodiment may be executed, and if it is set to the second mode, the designation method according to the first modification may be executed. In this way, suppression of visual discomfort and ensuring the visibility of the 3D moving image can be appropriately selected in line with the user's intention.
The designation method of the Nth reference composite position PNL and the Nth corresponding composite position PNR according to the above embodiment and that according to the first modification may also both be executed simultaneously.
In the above embodiment and the first and second modifications, video information originally containing a 3D moving image is acquired, but the present invention is not limited to this. For example, after video information containing an ordinary moving image is obtained, a 3D moving image may be generated from the ordinary moving image by various methods, whereby video information containing the 3D moving image is acquired.
Needless to say, all or part of the above embodiment and the various modifications may be combined as appropriate within a consistent range.
DESCRIPTION OF SYMBOLS
1 Information processing system
2 Stereo camera
4 Information processing apparatus
5 Line-of-sight detection sensor
41 Operation unit
42 Display unit
44 Storage unit
46 Control unit
461 Video acquisition unit
462 Attention area detection unit
463 Change detection unit
464 Shift amount determination unit
465 Region image acquisition unit
466 Signal reception unit
467 Mode setting unit
468 Position designation unit
469 Image composition unit

Claims (25)

  1.  An image processing apparatus comprising:
     a first acquisition unit that acquires video information including, in the information of each frame, a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same part of an object are shifted in one direction;
     a change detection unit that detects a scene change in the video information;
     a determination unit that determines a reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames after the scene change in the video information;
     a second acquisition unit that acquires region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same part of a display object are shifted in the one direction by the reference shift amount; and
     a composition unit that generates stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame after the scene change in the video information.
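Read procedurally, claim 1 amounts to a simple data flow. The Python sketch below is a non-authoritative illustration under simplifying assumptions: the scene-change test is a crude luminance difference, the reference shift is approximated by a single whole-image alignment search, and the region image stands in for content such as a subtitle; none of these specific choices is taken from the claims.

    import numpy as np

    def detect_scene_change(prev, cur, threshold=30.0):
        # Crude scene-cut test: mean absolute luminance difference.
        return np.mean(np.abs(cur.astype(float) - prev.astype(float))) > threshold

    def estimate_reference_shift(base, counterpart, search=32):
        # Whole-image proxy for the per-pixel shift amount of the claims:
        # pick the horizontal offset that best aligns the two images.
        errors = [np.mean(np.abs(base[:, d:].astype(float)
                                 - counterpart[:, :base.shape[1] - d].astype(float)))
                  for d in range(1, search)]
        return 1 + int(np.argmin(errors))

    def composite(image, region, x, y):
        # Paste `region` into `image` at (x, y), truncating at the borders.
        out = image.copy()
        h = min(region.shape[0], out.shape[0] - y)
        w = min(region.shape[1], out.shape[1] - x)
        if h <= 0 or w <= 0:
            return out
        out[y:y + h, x:x + w] = region[:h, :w]
        return out

    def process_video(frames, region, x=40, y=20):
        # frames: iterable of (base, counterpart) stereo pairs (H x W x 3).
        # Yields composited pairs; the reference shift is re-determined
        # only after a detected scene change, as in claim 1.
        shift, prev = 0, None
        for base, counterpart in frames:
            if prev is None or detect_scene_change(prev, base):
                shift = estimate_reference_shift(base, counterpart)
            yield (composite(base, region, x, y),
                   composite(counterpart, region, x + shift, y))
            prev = base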
  2.  The image processing apparatus according to claim 1, wherein
     the stereoscopic image information includes information in at least one of a first format, which allows the reference image, the corresponding image, the reference region image, and the corresponding region image to be displayed in the same period on one screen in a superimposed manner, and a second format, which allows one or more of the reference image, the corresponding image, the reference region image, and the corresponding region image and the remaining one or more images to be displayed time-sequentially on one screen.
  3.  The image processing apparatus according to claim 1 or 2, wherein
     the determination unit determines the reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames in one scene of the video information, the one scene lasting from when a first scene change is detected by the change detection unit until a second scene change is detected, and
     the composition unit generates stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for all frames in the one scene of the video information.
  4.  The image processing apparatus according to claim 3, wherein
     the determination unit determines the reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of the first frame in the one scene of the video information.
  5.  The image processing apparatus according to claim 3, wherein
     the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information.
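One conceivable reading of the distribution-based determination in claim 5 (a sketch under assumptions, not the method the application prescribes) pools the per-pixel shift amounts over a scene's frames and takes a robust representative value, here the mode of a histogram:

    import numpy as np

    def reference_shift_from_distribution(disparity_maps, num_bins=64):
        # disparity_maps: per-frame 2-D arrays of per-pixel shift amounts
        # for the frames of one scene.
        pooled = np.concatenate([d.ravel() for d in disparity_maps])
        hist, edges = np.histogram(pooled, bins=num_bins)
        peak = int(np.argmax(hist))
        # Centre of the modal bin as the representative shift amount.
        return 0.5 * (edges[peak] + edges[peak + 1])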
  6.  The image processing apparatus according to claim 3, further comprising
     a region detection unit that, in accordance with a preset detection rule, detects, from the pair of the reference image and the corresponding image of each frame of the video information, a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein
     the determination unit determines the reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference attention region and the corresponding attention region of the first frame of the one scene of the video information.
  7.  The image processing apparatus according to claim 3, further comprising
     a region detection unit that, in accordance with a preset detection rule, detects, from the pair of the reference image and the corresponding image of each frame of the video information, a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein
     the determination unit determines the reference shift amount based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information.
  8.  The image processing apparatus according to claim 3, wherein
     the determination unit calculates a representative value of the virtual distance from a virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the virtual reference plane at which the representative value of the virtual distance takes a predetermined value.
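The virtual distance of claim 8 can be connected to a shift amount through ordinary stereo geometry: under pinhole assumptions with focal length f (in pixels) and baseline B, distance and disparity are reciprocal, Z = f * B / d. The sketch below assumes those symbols (they are not defined in the application) and uses the median as the representative value:

    import numpy as np

    def shift_for_target_distance(disparities, focal_px, baseline, target_z):
        # disparities: pooled per-pixel shift amounts of one scene (pixels).
        d = np.asarray(disparities, dtype=float)
        d = d[d > 0]                                  # zero shift = infinite Z
        z_representative = np.median(focal_px * baseline / d)
        # Shift amount whose virtual reference plane would sit at target_z:
        reference_shift = focal_px * baseline / target_z
        return reference_shift, z_representative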
  9.  The image processing apparatus according to claim 3, wherein
     the determination unit determines the reference shift amount based on a distribution of first shift amounts of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames in the one scene of the video information, and a distribution of second shift amounts of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames in the preceding scene before the first scene change in the video information.
  10.  The image processing apparatus according to claim 9, wherein
     the determination unit determines the reference shift amount based on a first shift representative value relating to the distribution of the first shift amounts and a second shift representative value relating to the distribution of the second shift amounts.
  11.  The image processing apparatus according to claim 9, wherein
     the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference images and the corresponding images included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
  12.  The image processing apparatus according to claim 9, further comprising
     a region detection unit that, in accordance with a preset detection rule, detects, from the pair of the reference image and the corresponding image of each frame of the video information, a pair of a reference attention region and a corresponding attention region capturing the same object predicted to attract the user's attention, wherein
     the determination unit calculates a first representative value of the virtual distance from a first virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference attention regions and the corresponding attention regions included in the frame group of the preceding scene of the video information, calculates a second representative value of the virtual distance from a second virtual reference plane to the surface of an object based on the distribution of shift amounts of the positions of pixels indicating the same part of an object between the reference attention regions and the corresponding attention regions included in the frame group of the one scene of the video information, and determines, as the reference shift amount, the shift amount corresponding to the second virtual reference plane at which the difference between the first representative value and the second representative value falls within a predetermined allowable range.
  13.  The image processing apparatus according to claim 1 or 2, wherein
     the determination unit determines the reference shift amount for each frame in the vicinity of the scene change in the video information such that the difference in the reference shift amount between the frames is equal to or less than a predetermined amount,
     the second acquisition unit acquires, for each frame in the vicinity of the scene change in the video information, region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same part of a display object are shifted in the one direction by the reference shift amount determined for that frame by the determination unit, and
     the composition unit composites, for each frame in the vicinity of the scene change in the video information, the reference image, the corresponding image, the reference region image, and the corresponding region image relating to that frame.
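Claim 13 bounds how fast the reference shift may vary between neighbouring frames around a scene change, which keeps the composited region image from jumping in apparent depth. A minimal rate-limiter sketch (the names and step size are assumptions, not values from the application):

    def rate_limited_shifts(raw_shifts, max_step=1.0):
        # Clamp the frame-to-frame change of the reference shift to max_step.
        limited = [raw_shifts[0]]
        for target in raw_shifts[1:]:
            step = max(-max_step, min(max_step, target - limited[-1]))
            limited.append(limited[-1] + step)
        return limited

    # e.g. rate_limited_shifts([4, 4, 12, 12]) -> [4, 4, 5, 6]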
  14.  The image processing apparatus according to any one of claims 1 to 13, wherein
     the change detection unit detects the scene change according to a change of image between two or more frames included in the video information.
  15.  The image processing apparatus according to claim 14, wherein
     the change of image includes one or more of a change in luminance, a change in color, and a change in frequency components.
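By way of illustration for claims 14 and 15, a luminance/colour-based detector can flag a scene change when the colour histograms of consecutive frames diverge; the threshold below is an arbitrary assumption, not a value from the application:

    import numpy as np

    def histogram_cut_detector(prev_frame, cur_frame, bins=32, threshold=0.4):
        # Frames: H x W x 3 uint8 images. Returns True on a suspected cut.
        distance = 0.0
        for c in range(3):
            h_prev, _ = np.histogram(prev_frame[..., c], bins=bins, range=(0, 256))
            h_cur, _ = np.histogram(cur_frame[..., c], bins=bins, range=(0, 256))
            h_prev = h_prev / max(h_prev.sum(), 1)
            h_cur = h_cur / max(h_cur.sum(), 1)
            distance += np.abs(h_prev - h_cur).sum() / 3.0   # L1 in [0, 2]
        return distance > threshold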
  16.  The image processing apparatus according to claim 14, wherein
     the change detection unit identifies the faces of persons captured in a plurality of frames included in the video information, and detects the scene change according to a replacement of persons between the plurality of frames.
  17.  The image processing apparatus according to claim 14, wherein
     the change detection unit detects the scene change according to a change, in a plurality of frames included in the video information, of an attention region capturing an object predicted to attract the user's attention.
  18.  The image processing apparatus according to claim 14, wherein
     the change detection unit detects the scene change according to at least one of the appearance and disappearance, in a plurality of frames included in the video information, of an attention region capturing an object predicted to attract the user's attention.
  19.  The image processing apparatus according to any one of claims 1 to 18, wherein
     the change detection unit detects the scene change according to a change, in a plurality of frames included in the video information, of the in-focus region.
  20.  The image processing apparatus according to any one of claims 1 to 19, wherein
     the video information includes information relating to audio, and
     the change detection unit detects the scene change according to a change in the audio.
  21.  The image processing apparatus according to claim 20, wherein
     the change in the audio includes one or more of a change in volume, a change in frequency components, and a change in voiceprint.
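Analogously for claims 20 and 21, a scene boundary can be inferred from the accompanying audio; the sketch below compares the RMS volume and the dominant frequency of two adjacent analysis windows (the ratio thresholds are illustrative assumptions):

    import numpy as np

    def audio_scene_change(window_a, window_b, sample_rate=48000,
                           volume_ratio=2.0, freq_ratio=1.5):
        # window_a, window_b: 1-D arrays of consecutive audio samples.
        def rms(x):
            return float(np.sqrt(np.mean(np.square(x.astype(float))))) + 1e-9

        def dominant_freq(x):
            spectrum = np.abs(np.fft.rfft(x))
            spectrum[0] = 0.0                      # ignore the DC component
            freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
            return float(freqs[int(np.argmax(spectrum))]) + 1e-9

        vol_a, vol_b = rms(window_a), rms(window_b)
        f_a, f_b = dominant_freq(window_a), dominant_freq(window_b)
        return (max(vol_a, vol_b) / min(vol_a, vol_b) > volume_ratio
                or max(f_a, f_b) / min(f_a, f_b) > freq_ratio)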
  22.  The image processing apparatus according to any one of claims 1 to 21, wherein
     the video information includes metadata, and
     the change detection unit detects the scene change according to a change in the metadata.
  23.  The image processing apparatus according to claim 22, wherein
     the metadata includes one or more of subtitle information, chapter information, and shooting condition information, and
     the change in the metadata includes one or more of a change in subtitle information, a change in chapter information, and a change in shooting conditions.
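Claims 22 and 23 also allow a purely metadata-driven test; a trivial sketch (the field names are hypothetical, not taken from the application):

    def metadata_scene_change(prev_meta, cur_meta,
                              keys=("subtitle", "chapter", "shooting_conditions")):
        # prev_meta, cur_meta: per-frame metadata dictionaries.
        return any(prev_meta.get(k) != cur_meta.get(k) for k in keys)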
  24.  An image processing method comprising the steps of:
     (a) acquiring video information including, in the information of each frame, a reference image and a corresponding image having a relationship in which the positions of pixels indicating the same part of an object are shifted in one direction;
     (b) detecting a scene change in the video information;
     (c) determining a reference shift amount based on the shift amount of the positions of pixels indicating the same part of an object between the reference image and the corresponding image of one or more frames after the scene change in the video information;
     (d) acquiring region image information relating to a reference region image and a corresponding region image having a relationship in which the positions of pixels indicating the same part of a display object are shifted in the one direction by the reference shift amount; and
     (e) generating stereoscopic image information by compositing the reference image, the corresponding image, the reference region image, and the corresponding region image for each frame after the scene change in the video information.
  25.  A program that, when executed in a control unit included in an information processing apparatus, causes the information processing apparatus to function as the image processing apparatus according to any one of claims 1 to 23.
PCT/JP2011/079613 2011-01-17 2011-12-21 Image processing device, image processing method, and program WO2012098803A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011006836 2011-01-17
JP2011-006836 2011-01-17

Publications (1)

Publication Number Publication Date
WO2012098803A1

Family

ID=46515440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/079613 WO2012098803A1 (en) 2011-01-17 2011-12-21 Image processing device, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2012098803A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009239388A (en) * 2008-03-26 2009-10-15 Fujifilm Corp Method, apparatus, and program for processing stereoscopic video
WO2010092823A1 (en) * 2009-02-13 2010-08-19 パナソニック株式会社 Display control device
WO2010122775A1 (en) * 2009-04-21 2010-10-28 パナソニック株式会社 Video processing apparatus and video processing method
JP2010258723A (en) * 2009-04-24 2010-11-11 Sony Corp Image information processing device, imaging apparatus, image information processing method, and program
JP2012015771A (en) * 2010-06-30 2012-01-19 Toshiba Corp Image processing apparatus, image processing program, and image processing method


Similar Documents

Publication Publication Date Title
TWI545934B (en) Method and system for processing an input three dimensional video signal
JP5963422B2 (en) Imaging apparatus, display apparatus, computer program, and stereoscopic image display system
US9451242B2 (en) Apparatus for adjusting displayed picture, display apparatus and display method
TWI439120B (en) Display device
JP5287702B2 (en) Image processing apparatus and method, and program
JP5304714B2 (en) Pseudo stereoscopic image generation apparatus and camera
RU2015145510A (en) CRIMINAL DISPLAY DEVICE, METHOD FOR MANAGEMENT OF THE CRIMINAL DISPLAY DEVICE AND DISPLAY SYSTEM
WO2011122177A1 (en) 3d-image display device, 3d-image capturing device and 3d-image display method
JP2005167310A (en) Photographing apparatus
WO2011078065A1 (en) Device, method and program for image processing
KR101270025B1 (en) Stereo Camera Appratus and Vergence Control Method thereof
JP2010181826A (en) Three-dimensional image forming apparatus
US10404964B2 (en) Method for processing media content and technical equipment for the same
WO2011070774A1 (en) 3-d video processing device and 3-d video processing method
JP5464129B2 (en) Image processing apparatus and parallax information generating apparatus
JP5347987B2 (en) Video processing device
JP2006267767A (en) Image display device
WO2012098803A1 (en) Image processing device, image processing method, and program
WO2012093580A1 (en) Image processing device, image processing method, and program
JP4249187B2 (en) 3D image processing apparatus and program thereof
KR20170033293A (en) Stereoscopic video generation
KR101939243B1 (en) Stereoscopic depth adjustment and focus point adjustment
KR101907127B1 (en) Stereoscopic video zooming and foreground and background detection in a video
JP5601375B2 (en) Image processing apparatus, image processing method, and program
WO2012073823A1 (en) Image processing device, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11856580

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11856580

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP