WO2013186882A1

WO2013186882A1 - 3d-image generation method, and 3d-image generation system

Info

Publication number: WO2013186882A1
Application number: PCT/JP2012/065145
Authority: WO
Inventors: 勝江畑; 優一宗宮
Original assignee: 株式会社エム・ソフト
Priority date: 2012-06-13
Filing date: 2012-06-13
Publication date: 2013-12-19

Abstract

A 3D-image generation method having: a region-setting step for setting a plurality of regions in an original image; a characteristic-information acquisition step for acquiring characteristic information for each pixel configuring the original image; a depth-information generation step for generating depth information for each pixel on the basis of the characteristic information, in each of the plurality of regions; and a 3D-image generation step for generating a 3D image in which the position of each pixel has been changed, on the basis of the depth information. As a result, the provided 3D-image generation method and 3D-image generation system are capable of generating, from an original image, a 3D image causing a viewer to perceive a natural three-dimensional sensation.

Description

Stereoscopic image generation method and stereoscopic image generation system

The present invention relates to a stereoscopic image generation method and a stereoscopic image generation system for generating a stereoscopic image that causes a viewer to perceive a stereoscopic effect by parallax.

In recent years, binocular parallax-type stereoscopic images that allow viewers to perceive a stereoscopic effect by visually recognizing different images for the right eye and the left eye have come to be widely used in fields such as movies and television broadcasting. Yes. In addition, a technique for causing an observer to perceive a stereoscopic effect using a multi-view (multi-viewpoint) stereoscopic image that changes an image viewed by the observer depending on a viewing angle is also used in, for example, an autostereoscopic device. In addition, multi-view parallax stereoscopic images combining these binocular parallax and multi-view types are also being used. In the case of a parallax stereoscopic image, the image is composed of a right-eye image to be visually recognized by the right eye and a left-eye image to be visually recognized by the left eye. By shifting (shifting) in the direction, the viewer (viewer) viewing the image perceives a stereoscopic effect.

Conventionally, a parallax-type stereoscopic image is generally generated by arranging two cameras side by side and simultaneously capturing a right-eye image and a left-eye image. In this case, since a right-eye image and a left-eye image having substantially the same parallax as human binocular parallax can be directly obtained, a natural stereoscopic image that does not give the viewer a sense of incongruity is generated. be able to.

However, in the method of taking the right-eye image and the left-eye image with two cameras in this way, the two cameras having exactly the same specifications are positioned and arranged accurately, and the two are completely synchronized. It is necessary to shoot in the state. For this reason, it is necessary to prepare a large number of special dedicated devices together with specialized staff when shooting, which not only increases the shooting cost but also requires a lot of time to set and adjust the camera and other various devices. It was.

Also, a conventional multi-view stereoscopic image is generally generated by arranging cameras at many viewpoints and simultaneously shooting multi-view images. However, in this method of taking a multi-viewpoint image with a plurality of cameras, a plurality of cameras with exactly the same specifications are positioned and arranged accurately, and all the images are photographed in a completely synchronized state. There was a problem that it was necessary.

Furthermore, when it comes to multi-view parallax stereoscopic video, it is necessary to place two cameras at various viewpoints to shoot video that includes parallax. Therefore, unless there is a very specific purpose, the situation is generally far from widespread.

On the other hand, a method of generating a binocular parallax type image for the right eye and an image for the left eye by performing image processing on an image photographed as usual by one camera (for example, Patent Document 1). In this method, depth information (depth value) is first set for each pixel constituting the original image, and the position of the subject in both images is changed by changing the horizontal position of each pixel according to this depth information. A right-eye image and a left-eye image shifted in accordance with the binocular parallax are generated.

According to this method, since a stereoscopic image can be generated from a normal original image taken by a general camera, the photographing cost can be reduced and the photographing time can be shortened. It is also possible to generate a stereoscopic image from existing content such as a movie, or convert a general television broadcast into a stereoscopic image and display it on a television screen.

JP 2002-123842 A

However, in the conventional method of generating a stereoscopic image from a normal original image, for example, the depth information value changes at the boundary between the person who is the subject and the background, and the discontinuity (discontinuity) occurs in the depth. There was a problem.

When such a discontinuity in depth occurs, only the perspective between the person and the background is emphasized, and the person can be perceived as a flat surface. It becomes. In addition, when changing the position of each pixel in the right-eye image and the left-eye image, the movement amount of the pixel included in the person and the pixel included in the background is greatly different. A large gap (deficiency) will occur in the background.

In the conventional method, in order to avoid such a gap, there is a method in which a blurring process for a boundary part or a process for enlarging or deforming a person or the like or a background image is performed. When parallax is performed, parallax is not given to the boundary portion, so that the viewer may feel uncomfortable about the boundary portion. In addition, this type of boundary processing has a problem of degrading the image quality of the stereoscopic image. In addition, these blurring processing and enlargement / deformation processing increase the workload of an operator who processes a stereoscopic image on software. Therefore, when generating a stereoscopic image from an original image of a multi-view type or a multi-view parallax type stereoscopic image, there is a problem that an operator's processing work becomes enormous.

In the conventional method, the hue, saturation, or brightness (saturation in Patent Document 1) of each pixel constituting the original image is generally used as the depth information of each pixel. Depth information changes greatly at the boundary between a person and the background. As a result, there was a problem that the disconnection of depth was easily emphasized.

Furthermore, the original image includes elements such as the creator's intention (will) and story. In this case, do not emphasize important subjects that the viewer wants to see firmly, emphasize the focused area in the original image, and do not emphasize unimportant or blurred parts. It is important to make adjustments. However, in the conventional method, since depth information is mechanically calculated from the entire original image and used as it is, there is a problem that it is difficult to reflect the intention of the creator in the stereoscopic effect.

In view of such circumstances, the present invention intends to provide a stereoscopic image generation method and a stereoscopic image generation system capable of generating a stereoscopic image from an original image that allows a viewer to perceive a natural stereoscopic effect. It is.

The present invention that achieves the above object includes an area setting step for setting a plurality of areas in the original image, a feature information acquisition step for acquiring feature information of each pixel constituting the original image, and the plurality of areas. A depth information generating step for generating depth information for each pixel based on the feature information; and a stereoscopic image generating step for generating a stereoscopic image in which the position of each pixel is changed based on the depth information. A method for generating a stereoscopic image, comprising:

In the stereoscopic image generation method that achieves the above object, in the region setting step of the invention, the region is set for each subject included in the original image.

In the stereoscopic image generation method that achieves the above object, the stereoscopic image generation step of the invention includes an individual image generation step of generating an individual stereoscopic image in which the position of the pixel is changed for each of the plurality of regions. A stereoscopic image combining step of generating the stereoscopic image by combining the plurality of individual stereoscopic images generated for each of the plurality of regions.

In the stereoscopic image generation method that achieves the above object, in the stereoscopic image synthesis step of the invention described above, based on the front-rear relationship of the plurality of individual stereoscopic images, It synthesize | combines so that the said separate stereoscopic vision image of the side may permeate | transmit.

In the stereoscopic image generating method that achieves the above object, the stereoscopic image combining step of the invention includes a depth information combining step of combining the depth information generated for each of the plurality of regions, The stereoscopic image is generated from the depth information thus obtained.

In the stereoscopic image generation method that achieves the above object, the region setting step of the invention includes the color of the pixel in the region on the back side with respect to the pixel in which the region on the front side and the region on the back side overlap. A back color value estimation step for estimating the value is included.

In the stereoscopic image generation method that achieves the above object, in the depth information generation step of the invention, the depth correlation adjustment is performed in which the depth information generated for each region is adjusted based on a relative front-rear relationship of the plurality of regions. It has a step.

In the stereoscopic image generation method that achieves the above object, the depth information generation step of the invention is based on an edge setting step of setting an edge between a pair of the pixels extracted from the original image, and the feature information. A weight information setting step for setting weight information for the edge, a start pixel selection step for selecting a start pixel from the pixels, and a path for the weight information from the start pixel to the pixels. And a path information setting step for setting path information for each pixel, and a depth determination step for setting the depth information for each pixel based on the path information.

In the stereoscopic image generating method that achieves the above object, in the start pixel selecting step of the invention, the start pixel is included in an area that indicates the innermost part of the plurality of areas or an area that indicates the frontmost part. The pixel is selected.

In the stereoscopic image generation method that achieves the above object, the start pixel selection step of the invention is characterized in that a plurality of the start pixels are selected.

The present invention that achieves the above object is constituted by an electronic computer, an area setting means for setting a plurality of areas in an original image, a feature information acquisition means for acquiring feature information of each pixel constituting the original image, A depth information generating unit that generates depth information for each pixel based on the feature information and a stereoscopic image that generates a stereoscopic image in which the position of each pixel is changed based on the depth information for each of a plurality of regions. And a stereoscopic image generation system characterized by comprising an image generation means.

According to the present invention, it is possible to produce an excellent effect that a stereoscopic image that allows a viewer to perceive a natural stereoscopic effect can be generated from an original image.

It is a block diagram which shows the hardware constitutions of the stereoscopic vision image generation system which concerns on 1st Embodiment of this invention. It is a block diagram which shows the program structure and functional structure of the stereoscopic vision image generation system. It is a block diagram which shows the area | region selection of the original image by the same stereoscopic vision image generation system. It is a block diagram which shows the correction method of the individual original image by the same stereoscopic vision image generation system. It is a block diagram which shows the correction method of the individual original image by the same stereoscopic vision image generation system. It is a figure which shows the production | generation concept of the separate depth map in the same stereoscopic vision image generation system. It is a figure which shows the production | generation concept of the depth map in the same stereoscopic vision image generation system. It is a figure which shows the procedure which calculates the shortest path | route information in the same stereoscopic vision image generation system. It is a figure which shows the example which calculates the shortest path information in the same stereoscopic vision image generation system. It is a figure which shows the state of the depth information before adjustment in the stereoscopic vision image generation system, (B) The state of the depth information after adjustment. It is a figure which shows the state of the depth information before adjustment in the stereoscopic vision image generation system, (B) The state of the depth information after adjustment. It is a figure which shows the state of the input screen for performing correlation adjustment of the depth information in the same stereoscopic vision image generation system. It is a figure which shows the individual stereoscopic image production | generation procedure by the same stereoscopic vision image generation system. It is a figure which shows the stereoscopic vision image generation procedure by the stereoscopic vision image generation system. It is a figure which shows the synthetic | combination method of the stereoscopic vision image by the stereoscopic vision image generation system. It is a flowchart which shows the stereoscopic vision image generation procedure by the stereoscopic vision image generation system. It is a block diagram which shows the function structure by the other example of the same stereoscopic vision image generation system. It is a block diagram which shows the flow of the stereoscopic vision image generation by the other example of the same stereoscopic vision image generation system. It is a block diagram which shows the flow of the stereoscopic vision image generation by the other example of the same stereoscopic vision image generation system. It is a block diagram which shows the flow of the stereoscopic vision image generation by the other example of the same stereoscopic vision image generation system.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 shows an internal configuration of a computer 10 constituting the stereoscopic image generation system 1 according to the first embodiment. The computer 10 includes a CPU 12, a first storage medium 14, a second storage medium 16, a third storage medium 18, an input device 20, a display device 22, an input / output interface 24, and a bus 26. The CPU 12 is a so-called central processing unit, and executes various programs to realize various functions of the stereoscopic image generation system 1. The first storage medium 14 is a so-called RAM (Random Access Memory) and is a memory used as a work area of the CPU 12. The second storage medium 16 is a so-called ROM (Read Only Memory) and is a memory for storing a basic program executed by the CPU 12. The third storage medium 18 is composed of a hard disk device incorporating a magnetic disk, a disk device accommodating a CD, DVD, or BD, a non-volatile semiconductor flash memory device, and the like. OS (operating system) program that realizes basic operations, a stereoscopic image generation program executed by the CPU 12 when generating a stereoscopic image, a depth map and a stereoscopic image used in this stereoscopic image generation program Various data such as are stored. The input device 20 is a keyboard or a mouse, and is a device for appropriately inputting information to the stereoscopic image generation system 1 by an operator. The display device 22 is a display and provides a visual interface to the worker. The input / output interface 24 is an interface for inputting original image data necessary for the stereoscopic image generation program, and for outputting a depth map and a stereoscopic image generated by the stereoscopic image generation program to the outside. is there. The bus 26 is a wiring for integrally connecting the CPU 12, the first storage medium 14, the second storage medium 16, the third storage medium 18, the input device 20, the display device 22, the input / output interface 24, and the like for communication. It becomes.

In FIG. 2, the program configuration of the stereoscopic image generation program stored in the third storage medium 18 and the stereoscopic image generation system 1 realized by executing the stereoscopic image generation program by the CPU 12. The functional configuration realized by is shown. 3 to 5 conceptually show a stereoscopic image generation method executed by the stereoscopic image generation system 1. FIG. In this stereoscopic image generation system 1, since the configuration of the stereoscopic image generation program and the functional configuration thereof are in a correspondence relationship, the functional configuration of the stereoscopic image generation system 1 will be described here to explain the program. Description is omitted.

The stereoscopic image generation system 1 includes a region selection unit 110 realized by a region selection program, a feature information acquisition unit 140 realized by a feature information acquisition program, a depth information generation unit 160 realized by a depth information generation program, A stereoscopic image generation unit 180 realized by a visual image generation program is provided.

The area selection unit 110 selects a plurality of areas for the original image 200. In particular, in the present embodiment, the area selection unit 110 selects a plurality of areas 202A to 202E with the subject included in the original image 200 as a main unit, and the areas 202A to 202E overlap each other. Yes. Specifically, as shown in FIG. 3, the first region 202A occupies the upper side of the original image 202 and is located on the most back side including the mountain. The second region 202B and the third region 202C are located on the front side of the first region 202A, and occupy the left and right sides along both sides of the central road. The fourth area 202D is a central road that occupies the lower side of the original image 202, and is located at a depth similar to that of the second and

third areas

202B and 202C. The fifth region 202E is located on the foremost side in a state where it overlaps with the third region 202C and the fourth region 202D, and coincides with the female contour. Therefore, as shown in FIG. 4A, when obtaining the individual original images 201A to 201E separated for each of the areas 202A to 202E from the original image 200, the individual original images of the third area 202C and the fourth area 202D are obtained. In 201C and 201D, the overlapping area X that overlaps the individual original image 201E in the fifth area 202E is in a state in which the color value of the pixel is missing.

In this embodiment, the area selection unit 110 includes a back color value estimation unit 112 in order to compensate for the lack of color values. The back surface color value estimation unit 112 estimates the color values of the pixels in the back side region with respect to the pixels in the overlapping region X where the front side region and the back side region overlap. As shown in FIG. 5, for example, when the original image 200 is a moving image including original images 201A to 201C of other frames, the color of the pixel T in the overlapping region X in the individual original image 201C in the third region 202C. The value is estimated by referring to the pixels TA to TC at the same position in the other original images 200A to 200C. Here, the pixel TC in the original image 200C can recognize the color value of the row of trees because the woman on the front side has moved to the left side. The color value of the pixel TC of the original image 200C is applied to the color value of the pixel T of the original image 200. In this way, the color values of all the pixels 204 included in the overlapping area X, that is, the image of the subject in the entire area are completed. As a result of the above, as shown in FIG. 4B, corrected original original images 203A to 203E in which the missing color values in the overlapping region X are eliminated from the original image 200 are obtained.

In this embodiment, the case where the color value is estimated from the original images 200A to 200C of other frames in the moving image is illustrated, but the present invention is not limited to this. For example, the color value of the overlapping region X in the original image 200 can be estimated from the color values of the surrounding pixels 204. In addition, it is not necessary to estimate all the pixels 204 in the overlapping area X, and the estimation may be performed mainly on the pixels 204 near the contour (periphery) of the overlapping area X.

The feature information acquisition unit 140 acquires the feature information 240 of each pixel 204 constituting the original image 200 as shown in FIGS. In particular, in the present embodiment, feature information 240 is acquired for each pixel 204 of the corrected individual original images 203A to 203E. This feature information 240 includes, for example, characteristic information that each pixel 204 has independently such as the hue, brightness, saturation, and color space of each pixel 204, and pixels around the target pixel 204. 204. Characteristic information derived from the relationship of 204, or in the case of a moving image having a plurality of frames, characteristic information derived from temporal changes of each pixel 204 (relationship with pixels at the same position in the previous and subsequent frames), etc. It is also possible to use.

The depth information generation unit 160 sets the depth information 270 for each pixel 204 based on the feature information 240 acquired for each pixel 204, with the areas 202A to 202E as units. Specifically, the depth information 270 is set for each pixel 204 of the corrected individual original images 203A to 203E. As a result, individual depth maps 265A to 265E are generated as a set of depth information 270 corresponding to the corrected individual original images 203A to 203E.

Returning to FIG. 2, the depth information generation unit 160 further includes an edge setting unit 162, a weight information setting unit 164, a start pixel selection unit 166, a path information setting unit 168, a depth determination unit 170, and a depth correlation adjustment unit. 172.

As shown in FIG. 8, the edge setting unit 162 sets an edge 262 between a pair of pixels 204 extracted from the original image 200. The edge 262 conceptually means a line connecting a pair of pixels 204 or a path connecting both. Considering the graph theory, the pair of pixels 204 are nodes or vertices, and the edges 262 are branches or edges. In the present embodiment, for each pixel 204, an edge 262 for a total of four pixels 204 adjacent in the vertical and horizontal directions is set. Note that the present invention is not limited to the case where the edge 262 is set for each pixel 204 that is vertically and horizontally adjacent to each other, but the diagonally upper right, the upper left, the lower right, and the lower left. An edge 262 can be set for adjacent pixels 204, or an edge 262 can be set for a total of eight pixels 204 obtained by combining these with upper, lower, left and right. In addition, the edge 262 is not necessarily set between adjacent pixels 204, and the edge 262 with respect to a pair of pixels 204 having a certain distance by skipping pixels in the middle, that is, the pixels 204 that have been thinned out. Can also be set. Needless to say, an edge 262 can be set between a pair of pixels 204 located far away like an enclave.

The weight information setting unit 164 sets the weight information 264 for the edge 262 based on a pair of feature information 240 connecting the edges 262. In this embodiment, the weight information 264 uses the difference between the feature information 240 of the pair of pixels 204 connecting the edges 262. The weight information 264 increases as the difference increases, and the weight information 264 decreases as the difference decreases. The weight information 264 is not limited to the “difference” between the pair of feature information 240 at both ends of the edge 262, and various functions that calculate weight information using the pair of feature information 240 are used. Thus, the weight information 264 can be set.

The start pixel selection unit 166 selects a start pixel 266 from each pixel 204 in the original image 200. The start pixel 266 becomes a start point when setting the shortest path information 268 described later. Here, since the original image 200 is separated into a plurality of corrected individual original images 203A to 203E by the area selection unit 110, the start pixel selection unit 166 has a start pixel 266A for each of the corrected individual original images 203A to 203E. Select ~ 266E.

The start pixels 266A to 266E can be freely selected from the pixels 204 in the corrected individual original images 203A to 203E. For example, as shown in FIG. 7, each corrected individual original image 203A is selected. It is preferable to select from a pixel group existing in the innermost area 200A located on the farthest side in 203E or a pixel group present in the foremost area 200B located on the foremost side. Furthermore, as shown in the corrected individual original image 203A in FIG. 7, it is also possible to collectively select all the plurality of pixels 204 included in the predetermined area 200C as one start pixel 266.

In this embodiment, one pixel is selected as the start pixels 266A to 266E from the back side area 200A in the corrected individual original images 203A to 203E.

The path information setting unit 168 uses the weight information 264 of the path (edge 262) from the start pixel 266A to 266E to each pixel 204 for each of the plurality of regions 202A to 202E, that is, for each of the corrected individual original images 203A to 203E. Then, the shortest path is calculated, and the shortest path information 268 is set for each pixel 204 in the corrected individual original images 203A to 203E. A specific example of this will be described with reference to FIG.

In order to simplify the explanation, it is assumed here that the original image 200 is composed of nine pixels 204A to 204I of 3 rows × 3 columns, and the upper left pixel 204A is the pixel located on the farthest side. Therefore, a case where the pixel is set as the start pixel 266 will be considered. The twelve edges 262 (1) to 262 (12) connecting the pixels 204A to 204I use 1 to 10 by using the relative difference of characteristic information (not shown) held by the pixels 204A to 204I. The weight information 264 is preset. Here, considering the path of the pixel 204D in the upper center stage, as the path connecting the start pixel 204A and the pixel 204D, for example, the first path R1 including only the edge 262 (3) directly connecting the start pixel 204A and the pixel 204D, and the start pixel A second path R2 including three edges 262 (1), 262 (4), and 262 (6) connecting 204A, pixel 204B, pixel 204E, and pixel 204D is provided. The sum of the weight information 264 of the first route R1 is “1”, and the sum of the weight information 264 of the second route R2 is “10” of 3 + 2 + 5. As described above, the sum of the weight information 264 is calculated in the same manner for all possible paths between the start pixel 204A and the pixel 204D, and the shortest path is the smallest value. Here, the first route R1 is the shortest route. As a result, “1”, which is the sum of the weight information 264 on the shortest route, is set as the shortest route information 268 in the pixel 204D.

The route information setting unit 168 sets the shortest route information 268 for all the pixels 204A to 204I by the above method. As a result, the pixel 204A is “0”, the pixel 204B is “3”, the pixel 204C is “11”, the pixel 204D is “1”, the pixel 204E is “5”, the pixel 204F is “10”, and the pixel 204G is “5”. ", The shortest path information 268 of" 12 "is set for the pixel 204H and" 12 "is set for the pixel 204I.

The depth determination unit 170 sets the depth information 270 for each pixel 204 based on the shortest path information 268. In the present embodiment, the depth determination unit 170 uses the shortest path information 268 as the depth information 270 as it is.

Particularly, here, the depth information 270 can be determined independently for each of the areas 202A to 202E set in the original image 200. For example, as in the present embodiment, in the original image 200, a central woman, left and right rows of trees, a central road, and a sky on the background side are partially present, and these objects are clearly three-dimensional. When the continuity should not be ensured, the depth information 270 can be uniquely set by selecting each subject in the areas 202A to 202E in this way. As a result, in the regions 202A to 202E, the optimum start pixels 266A to 266E are selected and the depth information 270 is calculated by the shortest path method, so that the depth information 270 is continuously and extremely delicate. The depth maps 265A to 265E are visual maps of the depth information 270 set for each pixel 204.

A value obtained by correcting the shortest path information 268 as needed can be used as the depth information 270. For example, different correction functions are prepared depending on whether the original image 200 is an image of an outdoor landscape or an image of an indoor space. The depth information 270 can also be calculated by applying a correction function selected according to the content. It is also possible to calculate depth information 270 by applying different correction functions according to the type of subject for each of the corrected individual original images 203A to 203E.

In particular, when the start pixels 266A to 266E are set for each of the regions 202A to 202E as in the present embodiment, the shortest path information 268 of each of the start pixels 266A to 266E is “zero”. Therefore, if this is adopted as the depth information 270 as it is, there is a possibility that the relative depth sensation is shifted between the plurality of individual depth maps 265A to 265E. Accordingly, it is preferable that the depth determining unit 170 determines the depth information 270 after correcting the shortest path information 268 as a whole for each of the individual depth maps 265A to 265E. For example, as compared with the first individual depth map 265A of the first area 202A on the background side, all the pixels 204 of the fifth individual depth map 265E of the fifth area 202E on the front side are constant with respect to the shortest path information 268. After adding the correction value for the front side shift, this is used as the depth information 270. In this way, by correcting the sense of depth in units of the individual depth maps 265A to 265E, while providing a delicate and smooth stereoscopic effect in the regions 202A to 202E, an optimum distance is obtained between the plurality of individual depth maps 265A to 265E. A clear and clear three-dimensional effect can be imparted by the difference.

An example of the depth information 270 of the individual depth maps 265A to 265E when the farthest distance is defined as 0 and the closest distance is defined as 1 will be described. FIG. 10 schematically shows a case where an actual scene obtained by photographing the original image 200 with the camera C is viewed from above. As shown in FIGS. 10A and 11A, the first area 202A in the original image 200 has the sky S and the mountain M as subjects, and the individual depth map 265A corresponding to the first area 202A. The depth information 270 is set from the farthest 0 to the nearest 1. The second area 202B and the third area 202C have a row of trees T as subjects, and the depth information 270 of the individual depth maps 265B and 265C corresponding thereto is set from the farthest 0 to the nearest 1. In the fourth area 202D, the road L is the subject, and the depth information 270 of the individual depth map 265D corresponding to the fourth area 202D is set from the farthest 0 to the nearest 1. In the fifth area 202E, the female H is the subject, and the depth information 270 of the individual depth map 265E corresponding to the fifth area 202E is set from the farthest 0 to the nearest 1.

That is, since the depth determining unit 170 determines the depth information 270 independently for each of the areas 202A to 202E, the relative scales are different. As a result, if the individual depth maps 265A to 265E are used as they are, an error may occur in the relative depth relationship between the regions 202A to 202E.

Therefore, the depth correlation adjusting unit 172 adjusts (corrects) the depth information 270 determined for each of the areas 202A to 202E based on the relative front-rear relationship of these areas 202A to 202E. Specific correction examples in the depth correlation adjustment unit 172 are shown in FIGS. 10B and 11B. The depth information 270 of the individual depth map 265A corresponding to the first region 202A corrects the farthest place as 0 and the nearest place as 0.1. That is, the first area 202A is positioned on the farthest side, but its depth feeling (depth width) is set to 0.1, so that almost no three-dimensional feeling is felt. In fact, even with the human eye, mountains and clouds that are very far away cannot recognize a three-dimensional stereoscopic effect. The depth information 270 of the individual depth maps 265B and 265C corresponding to the second area 202B and the third area 202C corrects the farthest place as 0.3 and the nearest place as 0.7.

The depth information 270 of the individual depth map 265D corresponding to the fourth region 202D corrects the farthest place as 0 and the nearest place as 1. In the original positional relationship, since the subject road L is a part of the overall depth, it does not occupy the entire depth as 0 to 1. However, here, it is determined that it is important to emphasize the depth feeling of the road L from the intention of the producer in the original image 200, and the depth width is emphasized. In the depth information 270 of the individual depth map 265E corresponding to the fifth area 202E, the farthest place is set to 0.7 and the nearest place is set to 0.9.

The correlation adjustment by the depth correlation adjustment unit 172 is performed by displaying the individual depth maps 265A to 265E shown in FIG. 11A on the display device 22 and correcting the innermost value and the nearest value. It is preferable to prompt the input of a numerical value and a scale to be performed. Further, for example, as shown in FIG. 12, by displaying a bar chart indicating the depth information 270 for each of the individual depth maps 265A to 265E on the display device 22 and moving the setting range of the bar chart on the screen, Correlation adjustment may be performed. Thereafter, a stereoscopic image is generated using the adjusted individual depth maps 265A to 265E.

The stereoscopic image generation unit 180 changes the position of each pixel 204 based on the plurality of individual depth maps 265A to 265E generated for each of the plurality of regions 202A to 202E, and the image 280B for the right eye and the image 280B for the left eye. The stereoscopic image 280 comprised from these is produced | generated.

More specifically, the stereoscopic image generation unit 180 of this embodiment includes an individual image generation unit 182 and a stereoscopic image synthesis unit 184. As shown in FIG. 13, the individual image generation unit 182 changes the individual stereoscopic images 282A to 282E in which the positions of the pixels 204 of the corrected individual original images 203A to 203E are changed based on the individual depth maps 265A to 265E. (Individual image for right eye and individual image for left eye) are generated. By applying the generation of the individual stereoscopic images 282A to 282E to all the original images 200 (all frames in the moving image), the operator confirms the completion of the individual stereoscopic images 282A to 282E in units of the areas 202A to 202E. To do.

More specifically, the individual stereoscopic images 282A to 282E use the depth information 270 of the individual depth maps 265A to 265E, and the horizontal displacements of the pixels 204 located on the far side in the corrected individual original images 203A to 203E. The amount (shift amount) is decreased, and the displacement amount in the horizontal direction is increased for the pixel 204 located on the near side. As a result, the individual stereoscopic images 282A to 282E can include parallax.

Thereafter, as shown in FIG. 14, the stereoscopic image combining unit 184 combines these individual stereoscopic images 282A to 282E to generate a stereoscopic image 280 (right eye image 280A and left eye image 280B). . In this synthesis, the right eye individual images in the individual stereoscopic images 282A to 282E are synthesized to generate the right eye image 280A, and the left eye individual images in the individual stereoscopic images 282A to 282E are synthesized. The left-eye image 280B is generated.

In the present embodiment, the stereoscopic image composition unit 184 also performs individual stereoscopic viewing on the rear side with respect to the individual stereoscopic images 282A to 282E on the front side based on the front-rear relationship of the plurality of individual stereoscopic images 282A to 282E. The images 282A to 282E are transmitted. For example, as shown exaggeratedly in FIG. 15, when the third individual stereoscopic image 282C and the fifth individual stereoscopic image 282E are synthesized, by using transmission synthesis (for example, alpha channel synthesis), the rear side The row tree of the third individual stereoscopic image 282C that is to be hidden at the position can be seen through the woman of the fifth individual stereoscopic image 282E. Here, for convenience of explanation, the entire row of trees (subject) on the back side is transmitted, but this transparent composition processing is a contour edge that becomes a part of the area 202E of the fifth individual stereoscopic image 282E on the front side. Is transparently transmitted.

In this way, in the combined stereoscopic image 280, the stereoscopic effect of the fifth individual stereoscopic image 282E on the front side and the stereoscopic effect of the third individual stereoscopic image 282C on the back side are directly overlapped in the vicinity of the overlapping boundary. Remain. As a result, the disconnection of the depth and the gap are automatically suppressed at the boundary between the subjects separated from each other in the front-rear direction.

With regard to the stereoscopic image 280 generated through the above steps, the right eye image 280A is shown on the right eye of the viewer (viewer) of the image viewer, and the left eye image 280B is shown on the left eye. The parallax contained in the image is processed in the brain to perceive a stereoscopic effect.

Next, a procedure for generating a stereoscopic image by the stereoscopic image generation system 1 will be described with reference to FIG.

First, in step 300, a moving image composed of a plurality of original images (frames) 200 is registered in the third storage medium 18 via the input / output interface 24 of the stereoscopic image generation system 1. Next, in step 301, a plurality of areas 202 are set in the original image 200, and the color values of the overlapping area X are corrected for the individual original image 201 configured in units of the areas 202, and then corrected. The individual original image 203 is acquired (area setting step). Thereafter, in step 302, the feature information processing unit 140 extracts the first original image (frame) 200 from the moving image, and obtains the feature information 240 of each pixel 204 of the corrected individual original image 203 constituting this. (Feature information acquisition step).

Next, in step 310, an individual depth map 265 in which depth information 270 is set for each pixel 204 is generated based on the feature information 240 (depth information generation step). The depth information generation step 310 is divided into steps 312 to 320 in detail.

First, in step 312, an edge 262 is set between two approaching pixels 204 (edge setting step). Thereafter, in step 314, weight information 264 is set for the edge 262 based on the feature information 240 already set for each pixel 204 (weight information setting step). Next, in step 316, a start pixel 266 is selected from each pixel 204 of the corrected individual original image 203 (start pixel selection step), and further, the process proceeds to step 318, from the start pixel 266 to each pixel 204. The shortest path that minimizes the cumulative value of the weight information 264 on the path is calculated, and the shortest path information 268 that is the minimum cumulative value of the weight information 264 is set for each pixel 204 for which the shortest path is calculated. (Route information setting step). Thereafter, in step 320, depth information 270 is set for each pixel 204 using the shortest path information 268, and the depth information 270 is aggregated to generate an individual depth map 265 for the pixel group (depth determination step). Finally, in step 322, the depth information 270 of the individual depth map 265 generated for each region 202 is adjusted based on the relative anteroposterior relationship of the plurality of regions (depth correlation adjustment step).

When the above depth information generation step 310 is completed, the process proceeds to step 330, where the right eye image 280A obtained by shifting the position of each pixel 204 based on the determined depth information 270 (individual depth map 265) and A stereoscopic image including the left-eye image 280B is generated (stereoscopic image generation step). The stereoscopic image generation step 330 is divided into an individual image generation step 332 and a stereoscopic image synthesis step 334 in detail. In the individual image generation step 332, an individual stereoscopic image 282 in which the position of the pixel 204 is changed is generated using the corrected individual original image 203 and the individual depth map 265 set for each region 202. Next, in the stereoscopic image synthesis step 334, these individual stereoscopic images 282 are transparently synthesized to generate a stereoscopic image 280.

In this example, the depth information 270 is aggregated to generate the individual depth map 265, and the individual depth map 265 is used to generate the individual stereoscopic image 282. However, the present invention is not limited to this. Not. It is possible to generate the individual stereoscopic image 282 by using the depth information 270 as it is without making a depth map. Further, it is not necessary to wait for the stereoscopic image generation step 330 until all the depth information 270 is generated in units of the corrected individual original images 203, and the depth information 270 set in units of the pixels 204 is sequentially displayed in the stereoscopic view. It is also possible to sequentially generate the individual stereoscopic image 282 and the stereoscopic image 280 for each pixel 204 by applying to the image generation step 330. Of course, as shown in the present embodiment, it is also preferable that the depth information 270 is imaged or visualized by the individual depth map 265 as necessary, and the operator of the stereoscopic image generation system 1 can visually check the setting status of the depth information 270. This is convenient for checking the situation.

When the generation of the stereoscopic image 280 from the original image 200 is completed by the above procedure, the process proceeds to step 340 to determine whether or not the current original image 200 is the last frame in the moving image, and is not the last frame. In this case, the process returns to step 302, the next original image (frame) 200 is extracted, and the same steps as described above are repeated. On the other hand, when the original image 200 that generated the stereoscopic image 280 is the last frame in the moving image, the stereoscopic image generation procedure is terminated.

As described above, according to the stereoscopic image generation system 1 of the present embodiment, a plurality of areas 202 are set for the original image 200, and the depth information 270 is determined in units of the areas 202. As a result, since the depth information 270 can be set finely in the area 202, the stereoscopic effect of the stereoscopic image 280 can be set with high accuracy. In particular, in the present embodiment, the individual stereoscopic image 282 is generated for each region 202 and then combined to complete the stereoscopic image 280. In this way, the stereoscopic effect is adjusted and confirmed in detail in units of the areas 202A to 202E, and the completeness of the individual stereoscopic images 282A to 282E is increased, and then the final image is synthesized as it is without losing the stereoscopic effect. A stereoscopic image 280 (right-eye image 280A, left-eye image 280B) can be generated. As a result, a stereoscopic image 280 with less discomfort can be obtained. In addition, the generation time of the individual stereoscopic image 282 can be significantly shortened compared to the time for generating the entire stereoscopic image 280 together. Therefore, the operator can proceed with the work while efficiently confirming the stereoscopic effect in units of the area 202.

Particularly, according to the stereoscopic image generation system 1, the depth information 270 set for each region 202 is adjusted based on the front-rear relationship between the plurality of regions 202. As a result, the overall stereoscopic effect can be freely adjusted, and the intention (will) of the creator of the original image 200 can be reflected in the stereoscopic effect. For example, in the original image 200, the region 202 including the focused subject can be set to have a large depth difference, so that a stereoscopic effect stronger than actual can be generated. In addition, regarding the region 202 including the subject out of focus, it is possible to set a small depth difference to weaken the stereoscopic effect. Similarly, it is possible to adjust the depth information by arranging the region 202 to be emphasized on the front side of the actual image, or arranging the region 202 not to be emphasized on the deeper side than the actual image.

Further, in the stereoscopic image generation system 1, when the individual stereoscopic image 282 is synthesized, the individual stereoscopic image on the back side is compared to the individual stereoscopic image 282 on the front side for a portion where the plurality of regions 202 overlap each other. 282 is transmitted. In this way, since the three-dimensional effect is also expressed, it is possible to produce a natural depth feeling as if a part of the subject on the back side wraps around the back side of the subject on the front side. In particular, here, since the color value of the back side which is originally hidden can be estimated for the pixel 204 where the front side region 202 and the back side region 202 overlap, the color value of one pixel 204 is set in the depth direction. Can be multiplexed. As a result, the above-described wraparound effect can be further emphasized by individually imparting a stereoscopic effect to the multiplexed color values and transmitting them.

Furthermore, according to the present embodiment, the depth information 270 that is the basis of the stereoscopic effect when the stereoscopic image 280 is generated is the shortest calculated from the accumulated value of the weight information 264 along the shortest path between the plurality of pixels 204. It is generated using the route information 268. As a result, the depth information 270 can be made continuous with respect to the set of pixels 204 connected by the edge 262. A natural depth feeling is given to the stereoscopic image 280 generated using the depth information 270. In particular, it is possible to suppress a discontinuity (discontinuity) phenomenon in a stereoscopic image that occurs due to an extreme change in depth information at the boundary between a person on the front side and a background on the back side. The stereoscopic image 280 can be imparted with a stereoscopic effect with little discomfort for the user. Further, with the suppression of this disconnection phenomenon, it is possible to suppress the occurrence of a gap in the generated stereoscopic image 280, and image correction (blurring and image deformation) for filling the gap is also reduced. Deterioration of image quality is suppressed.

Furthermore, in this stereoscopic image generation system 1, the start pixel 266 is selected from the region 200A indicating the innermost part or the region 200B indicating the foremost part in the original image 200 (the corrected individual original image 203). The start pixel 266 serves as a reference point (zero point) when calculating the shortest path information 268 of the other pixels 204. By selecting the start pixel 266 from the backmost or frontmost pixel 204, it is possible to generate depth information 270 that does not give a sense of incongruity. The selection of the start pixel 266 causes the display device (display) 22 to display the original image 200 and prompts the operator of the stereoscopic image generation system 1 to select the start pixel 266 considered to be the farthest or foremost. You may do it. In addition, the stereoscopic image generation system 1 may analyze the original image 200 to estimate the

regions

200A and 200B that will be the farthest or the foremost, and automatically select the start pixel 266 from the

regions

200A and 200B. .

As a result, since all the depth information 270 can be calculated almost automatically, the work burden on the operator of the stereoscopic image generation system 1 is greatly reduced. Note that the conventional system requires a complicated operation to correct the depth information 270 while confirming the stereoscopic image.

Furthermore, according to the present embodiment, since a plurality of start pixels 266 that serve as reference values for calculating a sense of depth are selected for each region 202, using them in any combination allows more flexibility in units of regions 202. The depth information 270 can be determined. In other words, the optimal start pixel 266 for each area 202 can be selected in consideration of the scene of the original image 200 and the subject included in each area 202, so that a more natural stereoscopic effect can be produced.

In the present embodiment, the case where one pixel is selected as the start pixel 266 in the start pixel selection step 316 is exemplified, but the present invention is not limited to this. For example, as illustrated in FIG. 7, a plurality of pixels 204 included in a predetermined region 200 </ b> C in the original image 200 can be selected as one start pixel 266. Considering this by the shortest path method, it means that the edge weight information and the shortest path information of all the pixels 204 included in these regions 200C are set to zero or a fixed value (reference value) in advance. By doing in this way, even if it is a case where the image-like noise is contained in this area | region, it becomes possible to cut the influence of noise. Further, since it is possible to omit the calculation of an area where there is no need to make a difference in the sense of depth, such as a clear sky with no clouds, the information processing time for calculating the shortest path can be greatly reduced. In addition, here, the start pixel 266 is not limited to being specified in a certain area, and other pixels other than the start pixel can be integrated as a certain area. For example, this region setting is suitable for a simple subject that may share depth information of a certain area range composed of a plurality of adjacent pixels. In this case, in the area to be integrated, the operator gives an area instruction so that these pixel groups are virtually regarded as one pixel. As a result, the information processing time for calculating the shortest path can be greatly reduced.

Furthermore, in the present embodiment, in the stereoscopic image generation step 330, the individual stereoscopic image 282 is generated using the individual depth map 265, and the stereoscopic image 280 is generated by transmitting and synthesizing the individual stereoscopic image 282. However, the present invention is not limited to this. For example, as shown in FIGS. 17 and 18, the stereoscopic image generation unit 180 preferably includes a depth information synthesis unit 186 instead of the individual image generation unit 182 and the stereoscopic image synthesis unit 184. The depth information combining unit 186 combines a plurality of individual depth maps 265A to 265E generated for each of the areas 202A to 202E by the depth information generating unit 160 to generate one depth information (joined depth map 267). As a result, the operator can visually check the overall stereoscopic effect using the combined depth map 267. The stereoscopic image generation unit 180 uses the combined depth map 267 to generate the right eye image 280A and the left eye image 280B. When the operator does not need the combined depth map 267, the depth information combining unit 186 may not be used. In other words, if the stereoscopic image generation unit 180 applies the depth information 270 set for each of the areas 202A to 202E in the depth information generation unit 160 in units of pixels 204, it is possible to generate a stereoscopic video 280 as a result. It becomes.

In the present embodiment, the case where the start pixels 266A to 266E are selected from the pixels 204 in the range of the selected regions 202A to 202E is illustrated, but the present invention is not limited to this.

For example, as shown in FIG. 19, the start pixel selection unit 166 selects a plurality of start pixels 266A to 266C from the entire original image 200 without depending on the region 202, and the path information setting unit 168 The shortest path can be calculated for each of the plurality of start pixels 266A to 266C for all the pixels 204 of the original image 200, and a plurality of shortest path information 268A to 268C can be set for each pixel.

The depth determination unit 170 selects one of the shortest path information 268A to 268C set in each pixel 204 in units of the area 202, and determines the depth information 270. At this time, the depth determination unit 170 can also determine the depth information 270 using a plurality of shortest path information 268A to 268C set for each pixel 204. The determination to select one shortest path information from the plurality of shortest path information 268A to 268C or to use the plurality of shortest path information 268A to 268C is preferably made common to the areas 202.

This method will be described from another viewpoint with reference to FIG. The depth information generation unit 160 generates a plurality of temporary depth maps 263A to 263C corresponding to the start pixels 266A to 266C. The depth determining unit 170 uses one of the plurality of temporary depth maps 263A to 263C generated in units of the start pixel 266, or overlaps any one of the temporary depth maps 263A to 263C. It is determined whether or not. At this time, if the determination is made in units of a plurality of areas 202A to 202E selected from the original image 200, individual depth maps 265A to 265E corresponding to the areas 202A to 202E are generated.

As described above, the options for determining the depth information 270 can be increased. This option means the start pixels 266A to 266C. As described above, by selecting the start pixels 266A to 266C from a wide range including the outside of the areas 202A to 202E, a more desirable start pixel 266 can be selected. Although the case where three start pixels are selected is illustrated here, the depth information 270 can be determined more flexibly as the number of start pixels 266 is increased.

As already described, it is also preferable to select a plurality from the shortest path information 268A to 268C (temporary depth maps 263A to 263C) and determine the depth information 270 using these. In this way, even if one of the shortest path information 268A to 268C (provisional depth maps 263A to 263C) contains an error part for which accurate depth information cannot be obtained, the remaining other shortest path information 268A If accurate depth information is obtained in ˜268C (provisional depth maps 263A to 263C), the error portion can be automatically compensated by using the information together. As a result, it is possible to obtain smoother depth information 270 in which noise is canceled. When the depth information 270 is determined using a plurality of shortest path information 268A to 268C, various calculation methods such as the sum and average value of these can be applied.

Furthermore, in the present embodiment, the case where the shortest path is calculated in the path information setting step 318 so that the cumulative value of the weight information 264 on the path from the start pixel 266 to each pixel 204 is minimized. Is not limited to this. For example, by using a prim method or the like, a route that has a minimum sum of weights of a set of sides may be obtained from routes constituted by a subset of sides including all pixels 204. In other words, in the present invention, any algorithm can be used as long as any weight value can be specified using various paths between pixels.

In the above-described embodiment, the binocular parallax stereoscopic image of the right-eye image and the left-eye image is exemplified. However, the present invention is not limited to this. For example, this depth information may be used to generate a multi-view stereoscopic image, and it is also possible to generate a multi-view parallax stereoscopic image. That is, in the present invention, any type of stereoscopic video using depth information may be used.

The stereoscopic image generation method and the stereoscopic image generation system of the present invention can be applied to various devices such as a television and a game machine that convert a normal image into a stereoscopic image and display it in addition to the field of production of movies and television programs. Can be used in the field.

Claims

An area setting step for setting a plurality of areas in the original image;
A feature information acquisition step of acquiring feature information of each pixel constituting the original image;
A depth information generating step for generating depth information for each pixel based on the feature information for each of the plurality of regions;
A stereoscopic image generation step of generating a stereoscopic image in which the position of each pixel is changed based on the depth information,
Stereoscopic image generation method.
In the region setting step, the region is set for each subject included in the original image.
The stereoscopic image generation method according to claim 1.
The stereoscopic image generation step includes:
An individual image generating step for generating an individual stereoscopic image in which the position of the pixel is changed for each of the plurality of regions;
A stereoscopic image combining step of combining the plurality of individual stereoscopic images generated for each of the plurality of regions to generate the stereoscopic image,
The stereoscopic image generation method according to claim 1 or 2.
In the stereoscopic image synthesizing step, based on the front-rear relationship of the plurality of individual stereoscopic images, the individual stereoscopic images on the rear side are synthesized so as to transmit the individual stereoscopic images on the front side. Characterized by the
The stereoscopic image generation method according to claim 3.
The stereoscopic image synthesis step includes:
A depth information combining step of combining the depth information generated for each of the plurality of regions;
Generating the stereoscopic image from the synthesized depth information,
The stereoscopic image generation method according to claim 1 or 2.
The region setting step includes:
A back surface color value estimating step of estimating a color value of the pixel of the region on the back surface side for the pixel where the region on the front surface side and the region on the back surface side overlap,
The stereoscopic image generation method according to any one of claims 1 to 5.
(Newly established)
The depth information generation step includes:
A depth correlation adjustment step of adjusting the depth information generated for each region based on a relative context of the plurality of the regions,
The stereoscopic image generation method according to any one of claims 1 to 6.
The depth information generation step includes:
An edge setting step for setting an edge between a pair of the pixels extracted from the original image;
A weight information setting step for setting weight information for the edge based on the feature information;
A start pixel selection step of selecting a start pixel from the pixels;
A path information setting step for calculating a path for the weight information from the start pixel to each pixel and setting path information for each pixel;
A depth determination step for setting the depth information for each pixel based on the path information.
The stereoscopic image generation method according to any one of claims 1 to 7.
In the start pixel selection step, the pixel included in the region indicating the innermost portion or the region indicating the frontmost portion in each of the plurality of regions is selected as the start pixel.
The stereoscopic image generation method according to claim 8.
In the start pixel selection step, a plurality of the start pixels are selected.
The stereoscopic image generation method according to claim 8 or 9.
Composed by an electronic computer,
Area setting means for setting a plurality of areas in the original image;
Feature information acquisition means for acquiring feature information of each pixel constituting the original image;
Depth information generating means for generating depth information for each pixel based on the feature information for each of the plurality of regions;
A stereoscopic image generation unit that generates a stereoscopic image in which the position of each pixel is changed based on the depth information.
Stereoscopic image generation system.