WO2012042998A1 - Image processing device, image processing method, program, and recording medium - Google Patents

Image processing device, image processing method, program, and recording medium Download PDF

Info

Publication number
WO2012042998A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual viewpoint
image
images
viewpoint
viewpoints
Prior art date
Application number
PCT/JP2011/065003
Other languages
French (fr)
Japanese (ja)
Inventor
大津 誠
敦稔 〆野
Original Assignee
シャープ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by シャープ株式会社 (Sharp Corporation)
Publication of WO2012042998A1 publication Critical patent/WO2012042998A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation

Definitions

  • The present invention relates to an image processing apparatus, an image processing method, a program, and a recording medium. More specifically, it relates to an image processing apparatus and an image processing method that create an image of a viewpoint that was not actually captured by applying signal processing to images captured at a plurality of different viewpoints, and to a program and a recording medium that realize the image processing function.
  • Stereoscopic televisions that simulate stereoscopic viewing by presenting different images to the left and right eyes (hereinafter referred to as 3D televisions) convey a strong sense of depth that conventional two-dimensional images cannot express, and thereby heighten the sense of presence. The left and right eyes of a human are located in different places, so when actually viewing an object each eye sees it from a slightly different angle; this difference between the left and right views (parallax) is believed to produce the sensation of depth. A 3D television exploits this human visual characteristic to realize stereoscopic vision by presenting images of different angles to the left and right eyes.
  • As a system different from the 3D television, there is the autostereoscopic multi-view display (hereinafter referred to as a multi-view display). In a multi-view display, a lenticular lens consisting of small semi-cylindrical lenses attached to the front of the display presents images at slightly different angles in multiple directions; when the display is viewed, images of two different angles enter the right eye and the left eye, enabling stereoscopic viewing. With this method, when the head is moved, the two images corresponding to the next position enter the right and left eyes, so a more natural stereoscopic view is obtained. The change in the angle at which an object is seen as the viewpoint position moves is called motion parallax, and together with binocular parallax it is an element necessary for natural stereoscopic vision.
  • Regarding viewpoint synthesis technology, there is a method for improving synthesis quality by using videos captured from a plurality of different viewpoints. Specifically, the video of the desired viewpoint is first generated intermediately for each viewpoint; then, pixel by pixel, the intermediate generation result that can be assumed to have the highest quality is selected, or the intermediate generation results are blended with weights according to their assumed quality, which makes it possible to raise the quality of the final synthesized video. There are several prior-art documents that describe how to calculate such intermediate synthesized videos from each viewpoint and how to select and blend them appropriately.
  • For example, Patent Document 1 discloses an example in which, when synthesizing the video of a viewpoint between left and right camera images, the method for calculating the pixel value of the synthesized viewpoint is switched according to three conditions. Points corresponding to the two neighboring pixels that straddle the pixel of the synthesized viewpoint in the horizontal direction are obtained in the left and right source images, and the switching is based on the difference in length between those two corresponding points. Specifically, when the length between the two corresponding points in the left image exceeds the length between the two corresponding points in the right image by more than a predetermined condition, the video shot by the left camera is used for synthesis. Conversely, when the length between the two corresponding points in the right image exceeds that in the left image by more than the predetermined condition, the video shot by the right camera is used. If neither of these two conditions is satisfied, the composition ratio is fixed according to the position of the viewpoint to be synthesized, and blending is performed based on that ratio.
  • Patent Document 2 discloses a method in which, when calculating the pixels of a virtual viewpoint video, a reliability is computed both for pixel values synthesized from different viewpoints and for pixel values synthesized from different times at the corresponding viewpoint, and the synthesis ratio is set higher for the pixel value with the higher reliability. That is, the reliability of an image synthesized from a viewpoint different from the desired viewpoint (scheme 1) and of an image synthesized from the same viewpoint but at a different time (scheme 2) is calculated, and viewpoint synthesis is performed so that the scheme with the higher reliability receives the larger synthesis ratio. The determination compares the feature amount indicating the reliability of scheme 1 with that of scheme 2. The feature amount for scheme 1 is the value obtained by finding corresponding blocks in the left and right camera images and summing the pixel-value differences between them (average inter-parallax error). The feature amount for scheme 2 is the value obtained by finding corresponding blocks at the times before and after the time of interest and summing the pixel-value differences within those blocks (average temporal error). In both schemes the center of the block is the position of the pixel being processed for viewpoint synthesis.
  • In the viewpoint synthesis method described in Patent Document 1, the pixel selected as the final synthesis result is fixedly determined by the distance between the pixels in the left and right camera images that correspond to the two points sandwiching the target pixel. For example, when the length between the two corresponding points in the left camera image is longer than that in the right camera image, the desired pixel value is calculated using the left camera image; conversely, when the length in the right camera image is longer, the right camera image is used. However, in either case the synthesis is performed by interpolation between the two points, so even if the corresponding span is long, the sampling position for synthesis may not coincide with the position of the pixel to be obtained, which enlarges the conversion error. When many such conversion errors occur, the synthesis quality deteriorates.
  • The viewpoint synthesis method described in Patent Document 2 compares pixels synthesized from videos of different viewpoints with pixels synthesized from images of the same viewpoint but captured at different times, judges the one with the smaller average error within the corresponding block to be the more reliable, and synthesizes with it. In occlusion regions, where an area is invisible in one of the images from a different viewpoint or a different time, the error between blocks becomes large and the reliabilities differ clearly, so the synthesis scheme can be selected correctly. In non-occlusion regions, however, as pointed out for Patent Document 1, even if the reliability is high, the sampling position used for synthesis may not match the pixel position to be obtained.
  • This problem is common to Patent Document 1: in order to judge synthesis results obtained under different conditions, the decision is based on secondary, situational criteria that only indirectly affect the result (in Patent Document 1, the assumption that the wider span between corresponding pixels is better suited to synthesis; in Patent Document 2, the assumption that the smaller of the average error between corresponding blocks of different viewpoints and the error between corresponding blocks at different times is better suited to synthesis). The error that actually arises during the synthesis conversion is therefore not included in the judgment criteria.
  • In view of the above problems, the present invention does not judge by such situational criteria; instead, it uses the once-synthesized result itself and judges by the continuity of the synthesized signal. This makes it possible to accurately and appropriately select, or appropriately weight, in units of pixels, the intermediate synthesized viewpoint videos obtained by synthesizing the target viewpoint under different conditions, thereby improving the synthesis quality. An object of the present invention is to provide an image processing apparatus, an image processing method, a program, and a recording medium that achieve this.
  • A first technical means for solving the above problem is an image processing apparatus that synthesizes a virtual viewpoint image located between a plurality of viewpoints using camera videos of the plurality of viewpoints, comprising: a virtual viewpoint synthesis unit that generates a plurality of intermediate virtual viewpoint images, each based on one of the camera videos of the plurality of viewpoints, using information indicating corresponding points between the images of the plurality of viewpoints; a continuity calculation unit that, for each of the intermediate virtual viewpoint images synthesized by the virtual viewpoint synthesis unit, calculates a feature amount indicating the local continuity of that virtual viewpoint image; a synthesis ratio calculation unit that calculates, based on the feature amounts calculated by the continuity calculation unit, a ratio for combining the plurality of intermediate virtual viewpoint images; and a synthesis unit that combines the plurality of intermediate virtual viewpoint images according to the ratio calculated by the synthesis ratio calculation unit to form the final virtual viewpoint image.
  • A second technical means is, in the first technical means, characterized in that the feature amount is the entropy whose events are the edge amounts within a mask centered on the processing target pixel of the intermediate virtual viewpoint image.
  • the third technical means is characterized in that, in the first or second technical means, the plurality of viewpoints are two or more viewpoints.
  • the fourth technical means is characterized in that in any one of the first to third technical means, information indicating the corresponding points is input from the outside.
  • A fifth technical means is, in any one of the first to third technical means, characterized in that the virtual viewpoint synthesis unit calculates information indicating the correspondence between the images of the plurality of viewpoints and, based on that information, generates the plurality of intermediate virtual viewpoint images, each based on one of the camera videos of the plurality of viewpoints, by interpolating mutually corresponding pixels.
  • A sixth technical means is an image processing method for synthesizing a virtual viewpoint image located between a plurality of viewpoints using camera videos of the plurality of viewpoints, comprising: a virtual viewpoint synthesis step of generating a plurality of intermediate virtual viewpoint images, each based on one of the camera videos of the plurality of viewpoints, using information indicating corresponding points between the images of the plurality of viewpoints; a continuity calculation step of calculating, for each of the intermediate virtual viewpoint images synthesized in the virtual viewpoint synthesis step, a feature amount indicating the local continuity of that virtual viewpoint image; a synthesis ratio calculation step of calculating, based on the feature amounts calculated in the continuity calculation step, a ratio for combining the plurality of intermediate virtual viewpoint images; and a synthesis step of combining the plurality of intermediate virtual viewpoint images according to the ratio calculated in the synthesis ratio calculation step to synthesize the final virtual viewpoint image.
  • A seventh technical means is, in the sixth technical means, characterized in that the feature amount is the entropy whose events are the edge amounts within a mask centered on the processing target pixel of the intermediate virtual viewpoint image.
  • An eighth technical means is an image processing program that causes a computer to execute: a virtual viewpoint synthesis step of generating a plurality of intermediate virtual viewpoint images, each based on one of the camera videos of a plurality of viewpoints, using information indicating corresponding points between the images of the plurality of viewpoints acquired from those camera videos; a continuity calculation step of calculating, for each of the intermediate virtual viewpoint images synthesized in the virtual viewpoint synthesis step, a feature amount indicating the local continuity of that virtual viewpoint image; a synthesis ratio calculation step of calculating, based on the feature amounts calculated in the continuity calculation step, a ratio for combining the plurality of intermediate virtual viewpoint images; and a synthesis step of combining the plurality of intermediate virtual viewpoint images according to the calculated ratio to synthesize the final virtual viewpoint image.
  • A ninth technical means is, in the eighth technical means, characterized in that the feature amount is the entropy whose events are the edge amounts within a mask centered on the processing target pixel of the intermediate virtual viewpoint image.
  • the tenth technical means is a computer-readable recording medium on which the program of the eighth or ninth technical means is recorded.
  • According to the present invention, by using the once-synthesized result itself and judging by the continuity of the synthesized signal, the intermediate synthesized viewpoint videos obtained by synthesizing the target viewpoint under different conditions can be selected accurately and appropriately in units of pixels, and the synthesis quality can be improved. In particular, in viewpoint synthesis technology that creates the video of a viewpoint where no physical camera exists from videos captured at a plurality of different viewpoints, selecting and blending the intermediate synthesized viewpoint videos generated from the different viewpoints accurately per pixel improves the quality of the composite video. Furthermore, by generating arbitrary-viewpoint videos with the present invention and displaying them on a stereoscopic display, a dense multi-view video can be generated in a pseudo manner even from videos of few viewpoints, enabling high-quality multi-view stereoscopic viewing.
  • FIG. 1 is a block diagram showing an embodiment of an image processing apparatus of the present invention.
  • As shown in FIG. 1, the image processing apparatus of the present invention includes frame buffers 1, 2, 3, 4, 7, and 8, virtual viewpoint synthesis units 5 and 6, mask formation units 9 and 10, continuity feature amount calculation units 11 and 12, a synthesis ratio calculation unit 13, and a synthesis unit 14.
  • the frame buffers 1 and 2 are frame buffers for temporarily holding a frame (image) at a certain time extracted from a video shot at a predetermined viewpoint.
  • The frame buffer 3 stores corresponding point calculation information, that is, information indicating corresponding points between images of different viewpoints, for the same viewpoint as the video held in the frame buffer 1.
  • information indicating corresponding points is input from the outside.
  • the frame buffer 4 stores corresponding point calculation information corresponding to the viewpoint of the frame buffer 2.
  • Corresponding point calculation information indicating corresponding points is depth information indicating the distance to the subject, for example, and will be described in detail later.
  • The virtual viewpoint synthesis unit 5 (6) receives the viewpoint image at a specific time held in the frame buffer 1 (2) and the corresponding point calculation information at the same time held in the frame buffer 3 (4), and converts the input viewpoint image to the desired viewpoint using the corresponding point calculation information. The method of converting a captured viewpoint image to a desired viewpoint using the corresponding point calculation information is described later. The image converted in the virtual viewpoint synthesis unit 5 (6) is temporarily stored in another frame buffer 7 (8) and is divided into local blocks in the mask formation unit 9 (10), which outputs a partial image while shifting the position pixel by pixel.
  • The continuity feature amount calculation unit 11 (12) calculates a feature amount indicating continuity using the block image input from the mask formation unit 9 (10), and outputs the result to the synthesis ratio calculation unit 13. The synthesis ratio calculation unit 13 calculates, according to the continuity feature amounts input from the continuity feature amount calculation units 11 and 12, a synthesis ratio for the intermediately generated synthesis result of each viewpoint, and outputs it to the synthesis unit 14. The synthesis unit 14 reads the converted images from the frame buffers 7 and 8 according to the synthesis ratio input from the synthesis ratio calculation unit 13, combines the intermediate viewpoint synthesized images of the respective viewpoints, and generates and outputs the final synthesized viewpoint image.
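As a structural sketch of this data flow (not the patent's implementation), the last stages of FIG. 1 can be written as follows. Here `warp_fn` and `entropy_fn` are hypothetical placeholders for the virtual viewpoint synthesis units and the continuity feature amount calculation units described later, and the inverse-entropy weighting is only one plausible reading of the synthesis ratio calculation (the patent's exact formula, Equation (8), is not reproduced in this extraction).

```python
import numpy as np

def blend_two_views(img_a, depth_a, cam_a, img_b, depth_b, cam_b,
                    cam_virtual, warp_fn, entropy_fn, eps=1e-6):
    """Sketch of the FIG. 1 flow for two real cameras.

    warp_fn(img, depth, cam_src, cam_dst) -> HxWx3 intermediate virtual viewpoint image
    entropy_fn(img)                       -> HxW map of local continuity features

    Both callables stand in for the processing described in the text.
    """
    # Virtual viewpoint synthesis units 5 and 6 (results kept in frame buffers 7 and 8).
    inter_a = np.asarray(warp_fn(img_a, depth_a, cam_a, cam_virtual), dtype=np.float64)
    inter_b = np.asarray(warp_fn(img_b, depth_b, cam_b, cam_virtual), dtype=np.float64)
    # Continuity feature amount calculation units 11 and 12.
    h_a = entropy_fn(inter_a)
    h_b = entropy_fn(inter_b)
    # Synthesis ratio calculation unit 13: smaller entropy -> larger weight
    # (one plausible weighting; not necessarily the patent's Equation (8)).
    w_a = (h_b + eps) / (h_a + h_b + 2 * eps)
    # Synthesis unit 14: per-pixel blend of the two intermediate images.
    return w_a[..., None] * inter_a + (1.0 - w_a)[..., None] * inter_b
```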
  • FIG. 2 shows a state where the subject is photographed from a plurality of different positions.
  • 21, 22, 23, 24, and 25 indicate real cameras for photographing a subject and their positional relationships.
  • Reference numerals 26, 27, 28, and 29 between the cameras indicate areas that cannot be physically installed due to, for example, the size of the camera casing, or gaps due to sparse cameras.
  • a position where a real camera exists is referred to as a real camera position.
  • The viewpoints i-2, i-1, i, i+1, and i+2 are defined so as to correspond to the real cameras 21, 22, 23, 24, and 25 in FIG. 2, and the videos captured by the cameras are denoted V(i-2), V(i-1), V(i), V(i+1), and V(i+2). Since the actual processing is performed on an image at a specific time extracted from these videos, the following notation is used to simplify handling: the images (frames) at time t extracted from the videos shot at the respective viewpoints are denoted I(i-2, t), I(i-1, t), I(i, t), I(i+1, t), and I(i+2, t).
  • When the images are arranged with respect to viewpoint and time for explanation, they can be laid out two-dimensionally as shown in FIG. 3. In FIG. 3, a horizontal row of images surrounded by a dotted line represents the sequence of images of one viewpoint, that is, one video, and a vertical column of images not enclosed by a dotted line is the set of images of the different viewpoints at one time.
  • In the following, the case where the desired virtual viewpoint position lies between the camera 23 and the camera 24 is taken as an example; the case of calculating a synthesized viewpoint at a different position can be processed in the same manner. Hereinafter, a desired viewpoint calculated by synthesis is referred to as a virtual viewpoint, and the synthesized video at that viewpoint is referred to as a synthesized video.
  • the basic part of the viewpoint synthesis handled in this embodiment can be realized by using the technique described in Non-Patent Document 1, for example.
  • To perform viewpoint synthesis, it is assumed that the external and internal parameters of each camera that acquires a video are known, and that distance information corresponding to each viewpoint (hereinafter referred to as depth information) is available. Here, the external parameters of a camera form a matrix indicating the three-dimensional position and orientation of the camera, and the internal parameters form a matrix indicating the focal length, lens distortion, and inclination of the projection plane.
  • Depth information used as the corresponding point calculation information is defined by the Moving Picture Experts Group (MPEG), a working group of the international standardization bodies ISO/IEC, and is expressed in 256 levels, that is, as 8-bit luminance values. The distance information is therefore an 8-bit gray-scale image. Higher luminance values are assigned to shorter distances, so nearer subjects appear whiter and farther ones blacker. To decode this distance information into actual distances, the distance corresponding to the largest (white) value and the distance corresponding to the smallest (black) value are defined separately, and intermediate values are assigned linearly between them, so the actual distance can be obtained.
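Following the description above literally (8-bit values assigned linearly between a separately defined nearest and farthest distance, with larger values meaning nearer), decoding a depth sample could look like the sketch below. This is an assumption-laden illustration: the MPEG depth format also has a widely used inverse-depth (disparity-linear) mapping, and the function name and parameters here are hypothetical.

```python
def decode_depth(value_8bit, z_near, z_far):
    """Map an 8-bit depth sample to a metric distance.

    Follows the text literally: 255 (white) -> z_near (closest),
    0 (black) -> z_far (farthest), linear in between.
    """
    if not 0 <= value_8bit <= 255:
        raise ValueError("depth sample must be an 8-bit value")
    t = value_8bit / 255.0               # 0.0 = farthest, 1.0 = nearest
    return z_far + t * (z_near - z_far)  # linear assignment between the two limits

# Example: with z_near = 2 m and z_far = 10 m, a mid-grey sample maps to about 6 m.
print(decode_depth(128, z_near=2.0, z_far=10.0))  # ~5.98
```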
  • In the method of Non-Patent Document 1, the two nearest real cameras are selected so that the desired virtual viewpoint lies between them.
  • the selected real cameras are the camera 23 (viewpoint i) and the camera 24 (viewpoint i + 1).
  • a virtual viewpoint video is created using the two camera videos.
  • Specifically, an image for one frame is extracted from each video, and a virtual viewpoint image is synthesized from it. From each of the two real camera images an intermediate virtual viewpoint image is created, and finally, for the whole screen, either the image from the camera closer to the virtual viewpoint position is selected, or the two are blended according to a composition ratio determined by the position, to produce the virtual viewpoint image.
  • Here, the method described in Non-Patent Document 1 for calculating the video of a desired viewpoint from camera videos with known external and internal parameters and depth information will be described. The viewpoint synthesis technique of Non-Patent Document 1 uses 3D warping. In 3D warping, using an image acquired by a camera with known characteristics together with its depth information, the position in three-dimensional space corresponding one-to-one to each pixel of the image is determined and then projected onto the projection plane of the virtual viewpoint video, which yields the correspondence between each pixel of the real camera and the corresponding pixel of the virtual viewpoint. Based on this correspondence, the texture (pixel value) of the real-camera pixel is acquired and assigned to the corresponding pixel of the virtual viewpoint image, so that a synthesized image can be created.
  • the above is the basic concept of viewpoint synthesis.
  • In Non-Patent Document 1, the criterion for this selection or composition ratio is fixed by the relationship between the position of the virtual viewpoint and the positions of the selected real source cameras. In contrast, the present invention compares the stationarity of the local signals of the intermediately obtained virtual viewpoint images and selects the intermediate synthesis result with the higher stationarity, or increases its weight in the synthesis ratio, to create the synthesized image, thereby improving the final synthesis quality. The stationarity (local continuity) of a signal is the degree to which the synthesized signal is concentrated on a characteristic signal within a local region extracted from the intermediate virtual viewpoint image obtained for each of the plural viewpoints. Being concentrated on a characteristic signal means, for example, that the edge amounts, computed as the absolute differences from adjacent pixels, are concentrated around a specific magnitude. If the synthesis quality is high, the local region concentrates on a specific edge amount; if the synthesis quality is low, conversion noise mixed in during the conversion process adds a conversion error to the inherent edge amount, and the resulting distribution of edge amounts is dispersed.
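In symbols, one way to write the edge amount referred to here is given below; this uses hypothetical notation (e for the edge amount, I for the gray-scale intermediate virtual viewpoint image) and assumes the horizontally adjacent pixel is used, which the text does not fix.

```latex
e(x, y) \;=\; \bigl|\, I(x, y) - I(x - 1, y) \,\bigr|
```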
  • FIG. 4 is a diagram explaining the relationship between the distribution of edge-amount occurrence probabilities in a local region and the continuity of an intermediately generated virtual viewpoint image. The horizontal axis indicates the edge amount, and the vertical axis indicates the occurrence probability of that edge amount within the local region. FIG. 4(A) shows a case where a specific edge amount e has a peak and the occurrence probabilities are concentrated in its vicinity; FIG. 4(B) shows a case where the edge amount e also has a peak, but the concentration is low and the distribution is broad overall. Because FIG. 4(A) is concentrated on the specific edge amount e (high stationarity), the reliability of the signal obtained by synthesis can be considered high, so selecting the virtual viewpoint image with the characteristics of FIG. 4(A) gives the better result. By making this judgment based on stationarity over the entire image while shifting the target pixel, and selecting or blending accordingly, an optimal synthesized image can be created.
  • a method for generating a virtual viewpoint video according to the present invention will be specifically described with reference to a block diagram (FIG. 1) and a flowchart (FIG. 5).
  • the real cameras for photographing the subject are cameras 21, 22, 23, 24, and 25, and an example in which the viewpoint between the real camera positions of the camera 23 and the camera 24 is synthesized will be described.
  • In step S1-1, the real cameras to be used for synthesizing the virtual viewpoint video are selected; the two cameras nearest to, and sandwiching, the virtual viewpoint position to be synthesized are chosen.
  • Depth information can be obtained in various ways.
  • measurement is performed using a distance measuring device that can irradiate an object with infrared rays, measure the time until the light is reflected and returned, and obtain the distance to the object.
  • The distance d0 to the object can be calculated by the following equation, where V_IR is the speed at which the infrared light travels and t_tof is the time from when the infrared light is emitted until it returns to the distance measuring device.
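The equation itself is not reproduced in this extraction of the patent; from the definitions given (round-trip time t_tof at propagation speed V_IR), the standard time-of-flight relation would be:

```latex
d_0 \;=\; \frac{V_{IR} \cdot t_{tof}}{2}
```

The factor of 2 accounts for the light travelling to the object and back.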
  • This processing is performed at the same resolution as the captured image, and a depth image (depth information) is obtained.
  • Using the method of Non-Patent Document 1, a virtual viewpoint image can be created by the following equation (Equation (3)). This process is called 3D warping and corresponds to steps S1-6 and S1-7 performed in the virtual viewpoint synthesis units 5 and 6.
  • Here, d0 and d0' are the distance information at the real camera position and at the virtual viewpoint position, respectively. A, R, and t are parts of the internal and external parameters of the real camera and represent the internal parameter matrix, the camera rotation, and the three-dimensional position of the camera, respectively; A', R', and t' are the corresponding internal parameter matrix, rotation, and three-dimensional position of the virtual viewpoint camera. R^-1 and A^-1 are the inverses of the corresponding matrices. c and c' denote the coordinates in the real camera image and in the virtual camera image in a homogeneous coordinate system, in which one dimension is added to the ordinary two-dimensional coordinates; for example, the two-dimensional coordinate (x, y) is expressed as (x, y, 1), with 1 assigned to the added dimension.
  • The correspondence between the coordinate c of the real camera and the coordinate c' of the virtual viewpoint is obtained by Equation (3), and by extracting the real-camera pixel values corresponding to all pixels of the virtual viewpoint and pasting them in, the virtual viewpoint image can be created.
  • the generated virtual viewpoint image is temporarily stored in the frame buffers 7 and 8 for each viewpoint.
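The warping of Equation (3) (not reproduced in this extraction) follows the standard pinhole back-projection/re-projection chain. The sketch below is a common formulation under the convention that R and t map world coordinates to camera coordinates (x_cam = R·X + t); the patent describes t as the camera's three-dimensional position, so signs and conventions may differ from its exact Equation (3). All names here are illustrative.

```python
import numpy as np

def warp_pixel(u, v, depth, A, R, t, A_v, R_v, t_v):
    """Map one real-camera pixel (u, v) with depth `depth` to virtual-camera
    pixel coordinates, assuming x_cam = R @ X_world + t for both cameras.

    A, R, t       : intrinsics, rotation, translation of the real camera
    A_v, R_v, t_v : the same for the virtual viewpoint camera
    """
    c = np.array([u, v, 1.0])                 # homogeneous pixel coordinate
    x_cam = np.linalg.inv(A) @ c * depth      # back-project into camera space
    X_world = np.linalg.inv(R) @ (x_cam - t)  # camera space -> world space
    x_virt = R_v @ X_world + t_v              # world space -> virtual camera space
    c_virt = A_v @ x_virt                     # project with the virtual intrinsics
    return c_virt[0] / c_virt[2], c_virt[1] / c_virt[2], c_virt[2]  # (u', v', depth')

# A full warp loops over all pixels, keeps the nearest sample when several map to
# the same target pixel (z-buffering), and assigns the real-camera texture there.
```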
  • By performing the above for each of the two selected real cameras, two intermediate synthesized viewpoint images I_i(i', t) and I_(i+1)(i', t), as shown in FIG. 6, are obtained.
  • In FIG. 6, 61 and 62 are the real cameras, 63 is the virtual viewpoint camera, and I_i and I_(i+1) denote the virtual viewpoint images synthesized from viewpoints i and i+1, respectively; the synthesized viewpoint is i'. The two generated intermediate virtual viewpoint images I_i(i', t) and I_(i+1)(i', t) are two-dimensional planes, each addressed by an x coordinate and a y coordinate.
  • A 7 × 7 mask is formed as follows, with the processing target pixel (x, y) at its center (S1-8, S1-9).
  • the continuity feature amount calculation units 11 and 12 will be described.
  • In the present embodiment, entropy, the average amount of information handled in information theory, is applied to the determination of continuity. The amount of information is a measure of how unlikely a given event is when a plurality of events can occur, and the average (expected value) of the information amounts of all events is called entropy. For example, comparing the event of the peak value e in FIGS. 4(A) and 4(B), its occurrence probability is higher in FIG. 4(A) than in FIG. 4(B); in the case of FIG. 4(A), even if information about the event of peak value e is obtained, its information amount is small, because the event is easy to predict. Accordingly, the average information amount (entropy) over all events is smaller in FIG. 4(A). That is, the entropy value decreases when the occurrence probabilities are biased so that the occurring event can be estimated with high probability. Therefore, it can be determined that the smaller the entropy value, the higher the continuity of the obtained signal.
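For reference, with p_k the occurrence probability of event k (here, of each edge amount inside the mask), n_k its occurrence frequency, and numM the number of pixels in the mask, the quantity in question is the standard Shannon entropy; the patent's Equations (5)-(7) build exactly this from the edge-amount histogram. The log base is assumed to be 2 here; the base only scales the value and does not change the comparison.

```latex
H \;=\; -\sum_{k} p_k \log_2 p_k, \qquad p_k \;=\; \frac{n_k}{\mathrm{numM}}
```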
  • For each pixel in the mask defined by Equation (4), the absolute value of its difference from an adjacent pixel (the edge amount) is obtained, and this is used as the event for calculating the entropy.
  • the frequency of occurrence of each event can be calculated by the following equation.
  • Note that the pixel values of the image being processed are generally composed of three components, such as RGB or YCbCr values; to simplify the explanation, they are converted to gray-scale values by the following conversion.
  • the occurrence probability of each event can be obtained by dividing the occurrence frequency of equation (5) by the number of pixels in the mask, and can be obtained by the following equation.
  • numM is a constant and is the number of pixels in the mask of Equation (4).
  • The entropy calculation (S1-10, S1-11) performed by the continuity feature amount calculation units 11 and 12 is given by the following equation.
  • the entropy is calculated for each intermediate virtual viewpoint image generated from each viewpoint.
  • two entropy values are calculated for each pixel by equation (7).
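A minimal per-pixel sketch of steps S1-8 through S1-11 (mask formation and entropy calculation) might look like this. It assumes a gray-scale image, uses the horizontally adjacent pixel for the edge amount, and bins edge amounts at integer resolution; all three choices are assumptions where the text leaves details open, and the function name is hypothetical.

```python
import numpy as np

def edge_entropy_map(gray, mask_size=7):
    """Per-pixel entropy of edge amounts within a mask_size x mask_size window.

    gray : 2-D float array (gray-scale intermediate virtual viewpoint image).
    Returns an array of the same shape; smaller values indicate higher continuity.
    """
    h, w = gray.shape
    # Edge amount: absolute difference from the horizontally adjacent pixel.
    edges = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))
    edges = np.rint(edges).astype(np.int64)        # integer bins as histogram events
    r = mask_size // 2
    out = np.zeros_like(gray, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            win = edges[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            counts = np.bincount(win.ravel())       # occurrence frequency (Eq. (5))
            p = counts[counts > 0] / win.size       # occurrence probability (Eq. (6))
            out[y, x] = -np.sum(p * np.log2(p))     # entropy (Eq. (7))
    return out
```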
  • the composition ratio can be determined by the following equation (S1-12). This process is performed by the synthesis ratio calculation unit 13.
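Equation (8) is not reproduced in this extraction. One plausible form consistent with the stated rule (the intermediate image with the smaller entropy, i.e. higher continuity, receives the larger weight) is shown below, with H_i and H_(i+1) the per-pixel entropies of the two intermediate images and α the weight given to I_i; this is an illustrative assumption, not the patent's verified formula.

```latex
\alpha(x, y) \;=\; \frac{H_{i+1}(x, y)}{H_{i}(x, y) + H_{i+1}(x, y)}, \qquad
I_{i'}(x, y) \;=\; \alpha(x, y)\, I_{i}(x, y) \;+\; \bigl(1 - \alpha(x, y)\bigr)\, I_{i+1}(x, y)
```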
  • The above describes an example in which the synthesis unit 14 appropriately weights and combines, in units of pixels, synthesized viewpoint images generated intermediately from two different viewpoints, but the invention also supports multi-view configurations with three or more viewpoints. In FIG. 1, the configuration corresponding to one viewpoint consists of the frame buffer 1, the frame buffer 3, the virtual viewpoint synthesis unit 5, the frame buffer 7, the mask formation unit 9, and the continuity feature amount calculation unit 11; by adding such a set per viewpoint, the number of real cameras used for synthesis can be increased. Increasing the number of viewpoints makes it possible to further improve the synthesis quality by appropriately using subject information from a wider variety of viewpoints.
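Extending the same per-pixel weighting to three or more viewpoints is straightforward; a sketch is given below, again assuming the inverse-entropy weighting used earlier rather than the patent's exact formula, and reusing the hypothetical edge_entropy_map() sketch above.

```python
import numpy as np

def blend_n_views(intermediates, entropies, eps=1e-6):
    """Blend N intermediate virtual viewpoint images with weights that grow as
    the per-pixel entropy (lower = more stationary) shrinks.

    intermediates : list of HxWx3 arrays, one per real camera
    entropies     : list of HxW arrays, e.g. from edge_entropy_map()
    """
    inv = np.stack([1.0 / (h + eps) for h in entropies])    # N x H x W
    weights = inv / inv.sum(axis=0, keepdims=True)           # normalize per pixel
    stack = np.stack([np.asarray(im, dtype=np.float64) for im in intermediates])
    return np.sum(weights[..., None] * stack, axis=0)        # H x W x 3 result
```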
  • FIG. 7 is a block diagram showing the second embodiment of the present invention.
  • Blocks that perform the same processing as in the first embodiment are given the same numbers. The difference between the first and second embodiments is whether the information indicating the pixel-wise correspondence between images of different viewpoints is input from the outside or created internally; therefore, the second embodiment has no frame buffers 3 and 4 for storing the corresponding viewpoint information of the first embodiment. The block added in the second embodiment is a disparity vector calculation unit 71. The virtual viewpoint synthesis units 72 and 73 differ in processing from the virtual viewpoint synthesis units 5 and 6 because the content of the corresponding viewpoint information they receive is different, so their numbers are changed from those of the first embodiment (FIG. 1).
  • An image at time t to be processed is extracted from the video captured by the selected real camera (S2-2, S2-3), and a disparity vector is calculated from these images (S2-4).
  • As a method of internally generating the corresponding viewpoint information and synthesizing the viewpoint in between, the viewpoint synthesis method described in Patent Document 1 can be used. In this method, the correspondence between the images is computed by the disparity vector calculation unit 71 and is obtained by finding the parallax amount p that minimizes the following evaluation value E(p).
  • W represents a local mask for performing the matching, for example a mask with a size of 7 × 7.
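A minimal block-matching sketch of the disparity vector calculation (S2-4) is given below. It assumes horizontal-only disparity, a sum of absolute differences as E(p) over the 7 × 7 mask W, and a bounded search range; the patent only states that the parallax amount p minimizing E(p) is found, so the cost function, search range, and reference-image choice here are assumptions.

```python
import numpy as np

def disparity_map(left, right, max_disp=64, mask_size=7):
    """For each pixel of `left`, find the horizontal shift p in [0, max_disp]
    that minimizes the SAD over a mask_size x mask_size window (E(p))."""
    h, w = left.shape
    r = mask_size // 2
    disp = np.zeros((h, w), dtype=np.int64)
    for y in range(r, h - r):
        for x in range(r, w - r):
            block = left[y - r:y + r + 1, x - r:x + r + 1]
            best_cost, best_p = np.inf, 0
            for p in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - p - r:x - p + r + 1]
                cost = np.abs(block - cand).sum()   # E(p): sum of absolute differences
                if cost < best_cost:
                    best_cost, best_p = cost, p
            disp[y, x] = best_p
    return disp
```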
  • the virtual viewpoint camera 83 is located at a distance DL from the real camera 81 and a distance DR from the real camera 82.
  • Once the correspondence is obtained for each pixel, it is assumed that corresponding points also exist along the horizontal line connecting two corresponding points. If the correspondence shown in FIG. 9 is obtained, the pixels of the viewpoint in between can be obtained by the following equation with the camera 81 (viewpoint i) as the reference (S2-5); this process is performed by the virtual viewpoint synthesis unit 72.
  • Similarly, with the other real camera as the reference, the pixels of the viewpoint in between can be obtained from the following equation (S2-6); this process is performed by the virtual viewpoint synthesis unit 73.
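Equations (11) and (12) are not reproduced in this extraction. A plausible reading, given that the virtual camera 83 lies at distances DL and DR from the real cameras 81 and 82, is that each pixel is shifted by the fraction of its disparity corresponding to the relative distance. The sketch below follows that assumption for the camera-81-referenced intermediate image (S2-5); the camera-82-referenced image (S2-6) would be obtained symmetrically with the complementary fraction. Shift direction, hole handling, and the function name are all assumptions.

```python
import numpy as np

def warp_by_disparity(src, disp, fraction):
    """Shift each pixel of `src` by `fraction` of its disparity (forward warping).

    src      : HxW gray-scale source image (e.g. viewpoint i)
    disp     : HxW disparity toward the other real camera
    fraction : e.g. DL / (DL + DR) for the camera-81-referenced intermediate image
    """
    h, w = src.shape
    out = np.zeros_like(src)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xt = int(round(x - fraction * disp[y, x]))   # assumed shift direction
            if 0 <= xt < w:
                out[y, xt] = src[y, x]
                filled[y, xt] = True
    # Un-filled pixels (holes) would still need in-painting or the other view.
    return out, filled
```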
  • In Patent Document 1, a virtual viewpoint image is obtained by adaptively switching between Equation (11) and Equation (12) depending on the distance between the two corresponding points, whereas in the present embodiment both synthesis results are first calculated and then combined using the continuity of the local signal.
  • Subsequent processing after calculating a plurality of intermediate synthesis results is the same as in the first embodiment.
  • The present invention can also be implemented as a computer-readable recording medium on which a program to be executed by a computer is recorded: the method of generating intermediate synthesized viewpoint videos for the different viewpoints, calculating the synthesis ratio based on the local continuity of the obtained intermediate synthesized videos, and synthesizing the virtual viewpoint video can be recorded as software processing.
  • The recording medium may be a program medium such as a memory (not shown), for example a ROM, because the processing is performed by a microcomputer, or it may be a program medium that is readable by inserting it into a program reading device provided as an external storage device (not shown). In either case, the stored program may be configured to be accessed and executed directly by the microprocessor, or the program may be read out, downloaded into a program storage area (not shown) of the microcomputer, and then executed; in the latter case, the program for downloading is assumed to be stored in the main device in advance.
  • Here, the program medium is a recording medium configured to be separable from the main body, and may be a medium that carries the program in a fixed manner, including tape systems such as magnetic tape and cassette tape; disk systems such as magnetic disks including floppy (registered trademark) disks and hard disks, and optical disks such as CD-ROM/MO/MD/DVD; card systems such as IC cards (including memory cards) and optical cards; and semiconductor memories such as mask ROM, EPROM (Erasable Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory), and flash ROM.
  • the medium may be a medium that dynamically carries the program so as to download the program from the communication network.
  • the download program may be stored in the main device in advance or installed from another recording medium.
  • the recording medium is read by a program reading device provided in a digital color image forming apparatus or a computer system, whereby the above-described image processing method is executed.
  • The computer system includes an image input device such as a general-purpose WEB camera, a computer that performs various kinds of processing, including the above image processing method, by loading a predetermined program, an image display device such as a display or liquid crystal display that shows the processing results of the computer, and a network card, modem, or the like as communication means for connecting to a server or the like via a network.

Abstract

The present invention increases synthesis quality by precisely and appropriately selecting, in units of pixels, among intermediate synthesized viewpoint images that synthesize a video of a desired viewpoint from differing conditions. A video processing device generates a video from any given viewpoint from a subject captured from a plurality of different positions. Virtual viewpoint generating units (5, 6) generate intermediate synthesized images using a plurality of camera videos selected for generating the video of the desired virtual viewpoint. A continuity feature-quantity calculation unit calculates feature quantities indicating local continuity from each intermediate synthesized image. A synthesis ratio calculation unit (13) appropriately selects an intermediate synthesized image on the basis of the calculated feature quantities, or calculates a synthesis ratio for blending. The feature quantity is the entropy (average amount of information) of the edge amounts in a local region, and the intermediate synthesized image with the smallest entropy (the greatest continuity) is selected, or its weight is increased.

Description

Image processing apparatus, image processing method, program, and recording medium
The present invention relates to an image processing apparatus, an image processing method, a program, and a recording medium. More specifically, it relates to an image processing apparatus and an image processing method that create an image of a viewpoint that was not actually captured by applying signal processing to images captured at a plurality of different viewpoints, and to a program and a recording medium that realize the image processing function.
Stereoscopic televisions that simulate stereoscopic viewing by presenting different images to the left and right eyes (hereinafter referred to as 3D televisions) convey a strong sense of depth that conventional two-dimensional images cannot express, and thereby heighten the sense of presence. The left and right eyes of a human are located in different places, so when actually viewing an object each eye sees it from a slightly different angle; this difference between the left and right views (parallax) is believed to produce the sensation of depth. A 3D television exploits this human visual characteristic to realize stereoscopic vision by presenting images of different angles to the left and right eyes.
As a system different from the 3D television, there is the autostereoscopic multi-view display (hereinafter referred to as a multi-view display). In a multi-view display, a lenticular lens consisting of small semi-cylindrical lenses attached to the front of the display presents images at slightly different angles in multiple directions; when the display is viewed, images of two different angles enter the right eye and the left eye, enabling stereoscopic viewing. With this method, when the head is moved, the two images corresponding to the next position enter the right and left eyes, so a more natural stereoscopic view is obtained. The change in the angle at which an object is seen as the viewpoint position moves is called motion parallax, and together with binocular parallax it is an element necessary for natural stereoscopic vision.
However, it is difficult, for various reasons, to capture and transmit the videos of all the viewpoints handled by a multi-view display. In particular, the greater the number of viewpoints and the closer their spacing, the more difficult this becomes. Typical reasons are that there is a physical limit to the camera installation interval due to the size of the camera housing and of the image sensor itself, and that, even if installation were possible, transmitting all the videos of the many viewpoints would increase the transmission capacity in proportion to the number of viewpoints.
To solve these problems, methods have been proposed that introduce viewpoint synthesis technology to create the videos of many intermediate viewpoints from the videos of a few viewpoints. With viewpoint synthesis technology, videos between sparsely installed cameras can be generated by interpolation, so videos of dense viewpoints can easily be created. Regarding the number of viewpoints, the transmission capacity can also be reduced by, for example, transmitting videos of only a few viewpoints and creating the videos of the viewpoints in between on the receiving side.
Regarding viewpoint synthesis technology, there is a method for improving synthesis quality by using videos captured from a plurality of different viewpoints. Specifically, the video of the desired viewpoint is first generated intermediately for each viewpoint; then, pixel by pixel, the intermediate generation result that can be assumed to have the highest quality is selected, or the intermediate generation results are blended with weights according to their assumed quality, which makes it possible to raise the quality of the final synthesized video. There are several prior-art documents that describe how to calculate such intermediate synthesized videos from each viewpoint and how to select and blend them appropriately.
For example, Patent Document 1 discloses an example in which, when synthesizing the video of a viewpoint between left and right camera images, the method for calculating the pixel value of the synthesized viewpoint is switched according to three conditions. Points corresponding to the two neighboring pixels that straddle the pixel of the synthesized viewpoint in the horizontal direction are obtained in the left and right source images, and the switching is based on the difference in length between those two corresponding points. Specifically, when the length between the two corresponding points in the left image exceeds the length between the two corresponding points in the right image by more than a predetermined condition, the video shot by the left camera is used for synthesis. Conversely, when the length between the two corresponding points in the right image exceeds that in the left image by more than the predetermined condition, the video shot by the right camera is used. If neither of these two conditions is satisfied, the composition ratio is fixed according to the position of the viewpoint to be synthesized, and blending is performed based on that ratio.
Patent Document 2 discloses a method in which, when calculating the pixels of a virtual viewpoint video, a reliability is computed both for pixel values synthesized from different viewpoints and for pixel values synthesized from different times at the corresponding viewpoint, and the synthesis ratio is set higher for the pixel value with the higher reliability. In Patent Document 2, the reliability of an image synthesized from a viewpoint different from the desired viewpoint (scheme 1) and of an image synthesized from the same viewpoint but at a different time (scheme 2) is calculated, and viewpoint synthesis is performed so that the scheme with the higher reliability receives the larger synthesis ratio.
The determination compares the feature amount indicating the reliability of scheme 1 with that of scheme 2. The feature amount for scheme 1 is the value obtained by finding corresponding blocks in the left and right camera images and summing the pixel-value differences between them (average inter-parallax error). The feature amount for scheme 2 is the value obtained by finding corresponding blocks at the times before and after the time of interest and summing the pixel-value differences within those blocks (average temporal error). In both schemes the center of the block is the position of the pixel being processed for viewpoint synthesis.
Patent Document 1: Japanese Patent Laid-Open No. 8-201941. Patent Document 2: Japanese Patent Laid-Open No. 2009-3507 (JP 2009-3507 A).
In the viewpoint synthesis method described in Patent Document 1, the pixel selected as the final synthesis result is fixedly determined by the distance between the pixels in the left and right camera images that correspond to the two points sandwiching the target pixel. For example, when the length between the two corresponding points in the left camera image is longer than that in the right camera image, the desired pixel value is calculated using the left camera image; conversely, when the length in the right camera image is longer, the right camera image is used.
However, in the case of Patent Document 1, the synthesis is in any case performed by interpolation between the two points, so even if the length between the corresponding points is long, the sampling position for synthesis may not coincide with the position of the pixel to be obtained, which enlarges the conversion error. When many such conversion errors occur, there is the problem that the synthesis quality deteriorates.
The viewpoint synthesis method described in Patent Document 2 compares pixels synthesized from videos of different viewpoints with pixels synthesized from images of the same viewpoint but captured at different times, judges the one with the smaller average error within the corresponding block to be the more reliable, and synthesizes with it. In occlusion regions, where an area is invisible in one of the images from a different viewpoint or a different time, the error between blocks becomes large and the reliabilities differ clearly, so the synthesis scheme can be selected correctly; in non-occlusion regions, however, as pointed out for Patent Document 1, even if the reliability is high, the sampling position used for synthesis may not match the pixel position to be obtained.
This problem is common to Patent Document 1: in order to judge synthesis results obtained under different conditions, the decision is based on secondary, situational criteria that only indirectly affect the result (in Patent Document 1, the assumption that the wider span between corresponding pixels is better suited to synthesis; in Patent Document 2, the assumption that the smaller of the average error between corresponding blocks of different viewpoints and the error between corresponding blocks at different times is better suited to synthesis). The error that actually arises during the synthesis conversion is therefore not included in the judgment criteria.
In view of the above problems, an object of the present invention is to provide an image processing apparatus, an image processing method, a program, and a recording medium that do not judge by such situational criteria but use the once-synthesized result itself, judging by the continuity of the synthesized signal, so that the intermediate synthesized viewpoint videos obtained by synthesizing the target viewpoint under different conditions can be accurately and appropriately selected or appropriately weighted in units of pixels, improving the synthesis quality.
A first technical means for solving the above problem is an image processing apparatus that synthesizes a virtual viewpoint image located between a plurality of viewpoints using camera videos of the plurality of viewpoints, comprising: a virtual viewpoint synthesis unit that generates a plurality of intermediate virtual viewpoint images, each based on one of the camera videos of the plurality of viewpoints, using information indicating corresponding points between the images of the plurality of viewpoints; a continuity calculation unit that, for each of the intermediate virtual viewpoint images synthesized by the virtual viewpoint synthesis unit, calculates a feature amount indicating the local continuity of that virtual viewpoint image; a synthesis ratio calculation unit that calculates, based on the feature amounts calculated by the continuity calculation unit, a ratio for combining the plurality of intermediate virtual viewpoint images; and a synthesis unit that combines the plurality of intermediate virtual viewpoint images according to the ratio calculated by the synthesis ratio calculation unit to form the final virtual viewpoint image.
A second technical means is, in the first technical means, characterized in that the feature amount is the entropy whose events are the edge amounts within a mask centered on the processing target pixel of the intermediate virtual viewpoint image.
A third technical means is, in the first or second technical means, characterized in that the plurality of viewpoints are two or more viewpoints.
A fourth technical means is, in any one of the first to third technical means, characterized in that the information indicating the corresponding points is input from the outside.
A fifth technical means is, in any one of the first to third technical means, characterized in that the virtual viewpoint synthesis unit calculates information indicating the correspondence between the images of the plurality of viewpoints and, based on that information, generates the plurality of intermediate virtual viewpoint images, each based on one of the camera videos of the plurality of viewpoints, by interpolating mutually corresponding pixels.
A sixth technical means is an image processing method for synthesizing a virtual viewpoint image located between a plurality of viewpoints using camera videos of the plurality of viewpoints, comprising: a virtual viewpoint synthesis step of generating a plurality of intermediate virtual viewpoint images, each based on one of the camera videos of the plurality of viewpoints, using information indicating corresponding points between the images of the plurality of viewpoints; a continuity calculation step of calculating, for each of the intermediate virtual viewpoint images synthesized in the virtual viewpoint synthesis step, a feature amount indicating the local continuity of that virtual viewpoint image; a synthesis ratio calculation step of calculating, based on the feature amounts calculated in the continuity calculation step, a ratio for combining the plurality of intermediate virtual viewpoint images; and a synthesis step of combining the plurality of intermediate virtual viewpoint images according to the ratio calculated in the synthesis ratio calculation step to synthesize the final virtual viewpoint image.
 第7の技術手段は、第6の技術手段において、前記特徴量が、前記中間的な仮想視点画像の処理対象画素を中心とするマスクにおけるエッジ量を事象とする、エントロピーであることを特徴としたものである。 According to a seventh technical means, in the sixth technical means, the feature amount is entropy having an event of an edge amount in a mask centered on a processing target pixel of the intermediate virtual viewpoint image. It is a thing.
 第8の技術手段は、コンピュータに、複数の視点のカメラ映像から取得した複数の視点の画像間における対応点を示す情報を使用して、前記複数の視点のカメラ映像のそれぞれを基準とした中間的な前記仮想視点画像を複数生成する仮想視点合成ステップと、該仮想視点合成ステップで合成した前記中間的な仮想視点画像のそれぞれについて、各該仮想視点画像の局所的な定常性を示す特徴量を算出する定常性算出ステップと、該定常性算出ステップで算出した前記特徴量に基づいて、複数の前記中間的な仮想視点画像を合成する比率を算出する合成比率算出ステップと、該合成比率算出ステップで算出した比率に応じて前記複数の中間的な仮想視点画像を合成し、最終の仮想視点画像を合成する合成ステップと、を実行させるための画像処理プログラムである。 The eighth technical means uses the information indicating the corresponding points between the images of the plurality of viewpoints acquired from the camera images of the plurality of viewpoints to the computer, and uses the information indicating the corresponding points between the images of the plurality of viewpoints as an intermediate A feature value indicating local continuity of each virtual viewpoint image for each of the virtual viewpoint synthesis step for generating a plurality of virtual viewpoint images and the intermediate virtual viewpoint image synthesized in the virtual viewpoint synthesis step A continuity calculating step for calculating the ratio, a combining ratio calculating step for calculating a ratio for combining the plurality of intermediate virtual viewpoint images based on the feature amount calculated in the continuity calculating step, and the combining ratio calculation An image for executing the combining step of combining the plurality of intermediate virtual viewpoint images according to the ratio calculated in the step and combining the final virtual viewpoint image Is a management program.
According to a ninth technical means, in the eighth technical means, the feature amount is an entropy whose events are the edge amounts within a mask centered on the pixel to be processed in the intermediate virtual viewpoint image.
A tenth technical means is a computer-readable recording medium on which the program of the eighth or ninth technical means is recorded.
According to the present invention, by using the intermediate synthesis results themselves and judging them with the continuity of the synthesized signal as the criterion, the intermediate synthesized viewpoint images, each synthesized for the target viewpoint under different conditions, can be selected accurately and appropriately on a per-pixel basis, and the synthesis quality can therefore be improved.
In particular, regarding viewpoint synthesis technology that creates an image of a viewpoint at which no camera physically exists from images captured at a plurality of different viewpoints, the quality of the synthesized image can be improved by accurately and appropriately selecting and blending, pixel by pixel, the intermediate synthesized viewpoint images generated from the different viewpoints. Furthermore, by generating images of arbitrary viewpoints according to the present invention and displaying them on a stereoscopic display, a dense multi-view image can be generated in a pseudo manner even from images of only a few viewpoints, enabling high-quality multi-view stereoscopic viewing.
FIG. 1 is a block diagram corresponding to the first embodiment of the present invention.
FIG. 2 is an overview of a subject being photographed with a plurality of cameras.
FIG. 3 shows a two-dimensional arrangement, in the time direction and the viewpoint direction, of images extracted from videos of a plurality of viewpoints.
FIG. 4 is a diagram for explaining continuity.
FIG. 5 is a processing flowchart corresponding to the first embodiment.
FIG. 6 is a conceptual diagram of generating a video (image sequence) of a viewpoint between two real cameras.
FIG. 7 is a block diagram corresponding to the second embodiment of the present invention.
FIG. 8 shows the relationship between the two cameras of the second embodiment and the position at which viewpoint synthesis is performed.
FIG. 9 relates to a method of synthesizing an in-between viewpoint in the second embodiment.
FIG. 10 is a processing flowchart corresponding to the second embodiment.
(First embodiment)
<Configuration>
A first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the image processing apparatus of the present invention. As shown in FIG. 1, the image processing apparatus of the present invention comprises frame buffers 1, 2, 3, 4, 7 and 8, virtual viewpoint synthesis units 5 and 6, mask formation units 9 and 10, continuity feature amount calculation units 11 and 12, a synthesis ratio calculation unit 13, and a synthesis unit 14.
The frame buffers 1 and 2 temporarily hold frames (images) at a certain time extracted from videos captured at predetermined viewpoints. The frame buffer 3 stores corresponding point calculation information for the same viewpoint as the video held in the frame buffer 1; this information indicates corresponding points between images of different viewpoints. In this embodiment, the information indicating the corresponding points is input from the outside. The frame buffer 4 stores the corresponding point calculation information for the viewpoint of the frame buffer 2. The corresponding point calculation information is, for example, depth information representing the distance to the subject, and will be described in detail later.
The virtual viewpoint synthesis unit 5 (or 6) receives the viewpoint image at a specific time held in the frame buffer 1 (or 2) and the corresponding point calculation information at the same time held in the frame buffer 3 (or 4), and converts the input viewpoint image to a desired viewpoint using the input corresponding point calculation information. A method of converting a captured viewpoint image to a desired viewpoint using the corresponding point calculation information will be described later.
The image converted by the virtual viewpoint synthesis unit 5 (or 6) is temporarily stored in another frame buffer 7 (or 8), is divided into local blocks by the mask formation unit 9 (or 10), and parts of the image are output while the position is shifted pixel by pixel.
Next, the continuity feature amount calculation unit 11 (or 12) calculates a feature amount indicating continuity using the block image input from the mask formation unit 9 (or 10), and outputs the result to the synthesis ratio calculation unit 13. The synthesis ratio calculation unit 13 calculates, according to the continuity feature amounts input from the continuity feature amount calculation units 11 and 12, synthesis ratios for the intermediately generated synthesis results of the respective viewpoints, and outputs them to the synthesis unit 14. The synthesis unit 14 reads the converted images from the frame buffers 7 and 8 in accordance with the synthesis ratios input from the synthesis ratio calculation unit 13, multiplies the intermediate viewpoint-synthesized image of each viewpoint by its synthesis ratio, and generates and outputs the synthesized viewpoint image.
<Concept>
Next, the concept of the viewpoint synthesis processing of the present invention will be described with reference to FIGS. 2 to 4.
FIG. 2 shows a subject being photographed from a plurality of different positions. In FIG. 2, reference numerals 21, 22, 23, 24 and 25 denote the real cameras photographing the subject and their positional relationship. Reference numerals 26, 27, 28 and 29 between the cameras denote areas where a camera cannot physically be installed, for example because of the size of the camera housing, or gaps caused by placing the cameras sparsely. Hereinafter, a position at which a real camera exists is referred to as a real camera position.
Viewpoints i-2, i-1, i, i+1 and i+2 are defined so as to correspond to the real cameras 21, 22, 23, 24 and 25 in FIG. 2, and the videos captured by the respective cameras are denoted Vi-2, Vi-1, Vi, Vi+1 and Vi+2. Since the actual processing is performed on images extracted from these videos at a specific time, the following notation is introduced for ease of handling. The images (frames) at time t extracted from the videos captured at the respective viewpoints are denoted I(i-2,t), I(i-1,t), I(i,t), I(i+1,t) and I(i+2,t). Arranging the images by viewpoint and time for explanation gives the two-dimensional arrangement shown in FIG. 3. In FIG. 3, a horizontal row of images enclosed by a dotted line represents the image sequence of one viewpoint, that is, one video, and a vertical column of images (not marked by a dotted line in the figure) is the set of images of the different viewpoints at one time.
To explain viewpoint synthesis, the case where the desired virtual viewpoint position lies between the camera 23 and the camera 24 is taken as an example. The case of calculating a synthesized viewpoint at a different position can be processed in the same way as in the following description. Hereinafter, the desired viewpoint calculated by synthesis is referred to as the virtual viewpoint, and the synthesized video of that viewpoint is referred to as the synthesized video.
The basic part of the viewpoint synthesis handled in this embodiment can be realized using, for example, the technique described in Non-Patent Document 1. According to this method, the external parameters and internal parameters of the cameras that acquire the videos must be known, and distance information corresponding to each viewpoint (hereinafter referred to as depth information) is required to perform viewpoint synthesis. Here, the external parameters of a camera are a matrix representing the three-dimensional position and orientation of the camera, and the internal parameters are a matrix representing the focal length, the lens distortion, and the inclination of the projection plane.
The depth information used as the corresponding point calculation information indicating corresponding points is specified by, for example, the Moving Picture Experts Group (MPEG), a working group of the International Organization for Standardization / International Electrotechnical Commission (ISO/IEC), and expresses the depth in 256 levels, that is, as an 8-bit luminance value. As a result, the distance information is an 8-bit gray scale. Since a higher luminance value is assigned to a shorter distance, a subject nearer to the camera appears whiter and a subject farther away appears blacker. In addition, in order to decode this distance information into an actual distance, the distance corresponding to the largest (white) value and the distance corresponding to the smallest (black) value are specified separately, and the actual distance can be obtained by linearly mapping the depth values onto this distance range.
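For illustration only (this sketch is not part of the original disclosure), the linear decoding of an 8-bit depth value into a metric distance described above can be written as follows; the names z_near and z_far for the separately specified nearest (white) and farthest (black) distances are assumptions.

    import numpy as np

    def decode_depth(depth_8bit, z_near, z_far):
        """Map 8-bit depth values (255 = nearest/white, 0 = farthest/black)
        linearly onto the metric range [z_near, z_far]."""
        d = depth_8bit.astype(np.float64) / 255.0      # 1.0 = nearest, 0.0 = farthest
        return z_near * d + z_far * (1.0 - d)          # linear assignment between the two limits

    # Example: a mid-gray depth value of 128 with z_near = 1 m and z_far = 10 m
    print(decode_depth(np.array([128], dtype=np.uint8), 1.0, 10.0))  # about 5.48 m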
According to Non-Patent Document 1, the two real cameras nearest to the desired virtual viewpoint, one on each side, are selected first. When the virtual viewpoint lies between the camera 23 and the camera 24 as described above, the selected real cameras are the camera 23 (viewpoint i) and the camera 24 (viewpoint i+1). A virtual viewpoint video is created using these two camera videos. In practice, an image of one frame is extracted from each video, and a virtual viewpoint image is synthesized using these images. An intermediate virtual viewpoint image is created for each selected viewpoint (here, from the two viewpoint images), and finally, depending on which camera the virtual viewpoint camera position is closer to, either the image from the nearer camera is selected for the whole frame, or the images are blended according to a synthesis ratio determined by that position, to produce the virtual viewpoint image.
The method described in Non-Patent Document 1 for calculating an image of a desired viewpoint from camera images with known external and internal parameters and from depth information will now be described specifically. The viewpoint synthesis technique described in Non-Patent Document 1 uses a 3D warping technique. In 3D warping, the image acquired by a camera with known characteristics and its depth information determine, for each pixel of the image, a one-to-one corresponding position in three-dimensional space; by further projecting that point in three-dimensional space onto the projection plane of the virtual viewpoint video, the correspondence between a pixel of the real camera and the corresponding pixel of the virtual viewpoint can be obtained. Based on this correspondence, the texture (pixel value) of the corresponding pixel of the real camera is acquired and assigned to the corresponding pixel of the virtual viewpoint image, whereby a synthesized image can be created. The above is the basic concept of viewpoint synthesis.
To improve the synthesis quality, there is a method of creating an intermediate virtual viewpoint image for each of two or more different viewpoints and then synthesizing the final image by appropriately selecting among them or by determining synthesis ratios. In Non-Patent Document 1, the criterion for this selection, or for calculating the synthesis ratios, is determined deterministically by the relationship between the position of the virtual viewpoint and the real camera positions of the selected synthesis sources.
The present invention is characterized in that the final synthesis quality is improved by comparing the local signal continuity of the intermediately obtained virtual viewpoint images and by appropriately selecting the intermediate synthesis result with the higher continuity, or giving it a larger weight in the synthesis ratio, when creating the synthesized image. Local signal continuity is the degree to which, in a local region extracted from the intermediate virtual viewpoint image obtained for each viewpoint, the synthesized signal is concentrated on a certain characteristic signal. Being concentrated on a characteristic signal means, for example, that the edge amounts obtained by calculating the absolute differences from neighboring pixels are concentrated at a specific magnitude. If the synthesis quality is high, the local region is concentrated on a specific edge amount.
On the other hand, if the synthesis quality is low, conversion errors are added to the originally present specific edge amount because of the conversion noise introduced in the conversion process, and the resulting distribution of edge amounts is dispersed.
FIG. 4 is a diagram for explaining the relationship between continuity and the difference in the occurrence probability of edge amounts in a local region of an intermediately generated virtual viewpoint image. The horizontal axis indicates the edge amount, and the vertical axis indicates the occurrence probability of that edge amount in the local region. FIG. 4(A) shows a distribution that has a peak at a specific edge amount e and whose occurrence probabilities are concentrated around it. FIG. 4(B), like FIG. 4(A), has a peak at the edge amount e, but the concentration is low and the distribution is broad overall.
Compared with FIG. 4(B), FIG. 4(A) is concentrated on the specific edge amount e (its continuity is high), so the signal obtained by synthesis can be said to be more reliable. Therefore, selecting the virtual viewpoint image having the characteristics of FIG. 4(A) is more likely to improve the synthesis quality. By performing the selection, or the blending, based on this continuity judgment over the entire image while shifting the pixel position, an optimal synthesized image can be created.
<Processing content>
The method of generating the virtual viewpoint video of the present invention will be described specifically with reference to the block diagram (FIG. 1) and the flowchart (FIG. 5).
The real cameras photographing the subject are the cameras 21, 22, 23, 24 and 25, and an example of synthesizing a viewpoint between the real camera positions of the camera 23 and the camera 24 is described. First, in S1-1, the real cameras to be used for synthesizing the virtual viewpoint video are selected. For this selection, the two cameras nearest to the virtual viewpoint position to be synthesized, one on each side, are chosen. That is, when the position of the desired virtual viewpoint is Pv' and the position of each real camera is Pvi (i = -2, -1, 0, 1, 2), the two cameras satisfying the following relation are selected. Since the camera positions P are arranged one-dimensionally as shown in FIG. 2, it is assumed that the positions can be ordered by their magnitude.
[Equation (1) is presented as an image in the original publication: the pair of adjacent real cameras i, i+1 whose positions Pvi and Pvi+1 straddle the virtual viewpoint position Pv' is selected.]
According to the above assumption about the virtual viewpoint position, the virtual viewpoint lies between the camera 23 and the camera 24, so Pvi and Pvi+1 in equation (1) correspond to Pv0 and Pv1, respectively.
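A minimal sketch (not part of the original disclosure) of the camera selection of S1-1 / equation (1): given the one-dimensionally ordered real camera positions, the adjacent pair straddling the virtual viewpoint position is chosen.

    def select_camera_pair(camera_positions, virtual_position):
        """Return the indices (i, i+1) of the adjacent real cameras whose
        positions straddle the virtual viewpoint position.
        camera_positions must be sorted in increasing order."""
        for i in range(len(camera_positions) - 1):
            if camera_positions[i] <= virtual_position <= camera_positions[i + 1]:
                return i, i + 1
        raise ValueError("virtual viewpoint lies outside the camera array")

    # Cameras 21..25 at positions 0,1,2,3,4; a virtual viewpoint at 2.4 selects
    # the cameras at indices 2 and 3 (cameras 23 and 24).
    print(select_camera_pair([0.0, 1.0, 2.0, 3.0, 4.0], 2.4))  # (2, 3)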
Next, the images at the processing target time t are extracted from the videos captured by the selected real cameras (S1-2, S1-3); the images (frames) at time t of the real camera videos Vi and Vi+1 are I(i,t) and I(i+1,t). The extracted images are temporarily stored in the frame buffers 1 and 2. At the same time, the images (frames) D(i,t) and D(i+1,t) at the same time are also extracted from the depth information (distance information) of the corresponding viewpoints (S1-4, S1-5) and stored in the frame buffers 3 and 4.
The depth information can be acquired in various ways. Here, it is assumed that it is measured with a ranging device that irradiates an object with infrared light, measures the time until the light is reflected and returns, and thereby obtains the distance to the object. When the propagation speed of the infrared light is VIR and the time from emitting the infrared light until it returns to the ranging device is ttof, the distance d0 to the object can be calculated by the following equation. This processing is performed at the same resolution as the captured image to obtain a depth image (depth information).
[Equation (2) is presented as an image in the original publication: d0 = VIR · ttof / 2, the factor 1/2 accounting for ttof being a round-trip time.]
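As a small worked example of the time-of-flight relation above (illustrative only, not part of the original disclosure; the numbers are arbitrary), a round-trip time of 20 ns at the speed of light corresponds to a distance of about 3 m.

    V_IR = 2.998e8          # propagation speed of the infrared light [m/s] (speed of light)
    t_tof = 20e-9           # measured round-trip time [s]
    d0 = V_IR * t_tof / 2   # distance to the object [m]; divide by 2 for the round trip
    print(d0)               # about 3.0 m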
According to Non-Patent Document 1, a virtual viewpoint image can be created by the following equation. This processing is called 3D warping and corresponds to the processing S1-6 and S1-7 carried out in the virtual viewpoint synthesis units 5 and 6.
[Equation (3), the 3D warping relation, is presented as an image in the original publication: it relates the homogeneous real-camera coordinate c with depth d0 to the homogeneous virtual-viewpoint coordinate c' with depth d0' through A, R, t and A', R', t'.]
Here, d0 and d0' are the distance information at the real camera position and at the virtual viewpoint position, respectively. A, R and t represent the internal parameters of the real camera and, as part of its external parameters, the rotation of the camera and the three-dimensional position of the camera. A', R' and t' represent the internal parameters of the virtual viewpoint camera and, as part of its external parameters, its rotation and three-dimensional position. R^-1 and A^-1 denote the inverses of the corresponding matrices. Further, c and c' denote the image coordinates of the real camera and of the virtual camera in a homogeneous coordinate system in which one dimension is added to the usual two-dimensional coordinates. For example, a two-dimensional coordinate (x, y) is expressed in the homogeneous coordinate system as (x, y, 1), that is, by increasing the number of dimensions by one and substituting 1 into the added dimension.
Equation (3) gives the correspondence between the coordinates c of the real camera and the coordinates c' of the virtual viewpoint, and by extracting the pixel values of the real camera corresponding to all the pixels of the virtual viewpoint and pasting them, the image of the virtual viewpoint can be created. The generated virtual viewpoint images are temporarily stored in the frame buffers 7 and 8, one for each viewpoint.
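For illustration only (not part of the original disclosure), the following sketch shows one common form of per-pixel 3D warping consistent with the description above. The convention assumed here is that a world point X maps to camera coordinates as R·X + t and projects through the internal parameter matrix A; lens distortion is ignored.

    import numpy as np

    def warp_to_virtual_view(image, depth, A, R, t, A_v, R_v, t_v):
        """Forward-warp a real-camera image into a virtual view.
        image: (H, W, 3) uint8, depth: (H, W) metric distances (d0).
        A, R, t: intrinsics, rotation and translation of the real camera.
        A_v, R_v, t_v: the same for the virtual viewpoint camera."""
        H, W = depth.shape
        out = np.zeros_like(image)
        A_inv, R_inv = np.linalg.inv(A), np.linalg.inv(R)
        for y in range(H):
            for x in range(W):
                c = np.array([x, y, 1.0])            # homogeneous pixel coordinate
                p_cam = depth[y, x] * (A_inv @ c)    # point in real-camera coordinates
                X = R_inv @ (p_cam - t)              # back-project to world coordinates
                p_v = A_v @ (R_v @ X + t_v)          # project into the virtual camera
                if p_v[2] <= 0:
                    continue
                c_v = p_v / p_v[2]                   # c' = (x', y', 1) after dividing by d0'
                xv, yv = int(round(c_v[0])), int(round(c_v[1]))
                if 0 <= xv < W and 0 <= yv < H:
                    out[yv, xv] = image[y, x]        # paste the real-camera texture
        return out

In practice, z-buffering against d0' and filling of holes (virtual-view pixels that receive no value) are also needed; they are omitted here for brevity.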
By performing the above processing on the real camera images I(i,t) and I(i+1,t), two intermediate synthesized viewpoint images Ii(i',t) and Ii+1(i',t) are obtained as shown in FIG. 6. Here, 61 and 62 are the real cameras, 63 is the virtual viewpoint camera, and Ii and Ii+1 denote the virtual viewpoint images synthesized from the viewpoints i and i+1, respectively. The synthesized viewpoint is denoted i'.
The two generated intermediate virtual viewpoint images Ii(i',t) and Ii+1(i',t) are two-dimensional planes, and to indicate the positions of the x and y coordinates they are written as Ii(i',t,x,y) and Ii+1(i',t,x,y), respectively. In the mask formation units 9 and 10, a mask of size 7×7 centered on the processing target pixel (x, y) is formed as follows (S1-8, S1-9).
[Equation (4) is presented as an image in the original publication: the 7×7 mask of pixels centered on the processing target pixel (x, y).]
Next, the continuity feature amount calculation units 11 and 12 will be described. In the present invention, entropy (the average amount of information) as treated in information theory is applied to the judgment of continuity. First, the amount of information is a measure representing, when a plurality of events can occur, how unlikely a given event was when it occurs. The average value (expected value) of the information amounts of all events is called the entropy.
For example, comparing the event of the peak value e in FIG. 4(A) and FIG. 4(B), the occurrence probability of the event of the peak value e is higher in FIG. 4(A) than in FIG. 4(B); therefore, in the case of FIG. 4(A), even if information about the event of the peak value e is obtained, its information amount is not large, because the event can easily be predicted.
As for the average information amount over all events, the value of the average information amount (entropy) is smaller for the biased distribution of FIG. 4(A). In other words, the entropy becomes small when the occurrence probabilities are biased so that the occurring event can be estimated with high probability. Therefore, the smaller the entropy value, the higher the continuity of the obtained signal can be judged to be.
For each pixel in the mask obtained by equation (4), the absolute value of the difference from the neighboring pixel is obtained and used as an event for calculating the entropy. The occurrence frequency of each event can be calculated by the following equation.
[Equation (5) is presented as an image in the original publication: the occurrence frequency of each event, i.e. of each absolute difference between neighboring pixels, within the mask.]
The pixel values of the images handled here are generally composed of three values such as RGB or YCbCr values, but to simplify the explanation, gray scale values obtained by the following conversion are used.
[The gray scale conversion is presented as an image in the original publication: the conversion from the three-component pixel values to a single gray scale value.]
Further, the occurrence probability of each event can be obtained by dividing the occurrence frequency of equation (5) by the number of pixels in the mask, as in the following equation.
[The occurrence probability equation is presented as an image in the original publication: the occurrence frequency of each event divided by numM.]
Here, numM is a constant, the number of pixels in the mask of equation (4).
The entropy calculation (S1-10, S1-11) performed by the continuity feature amount calculation units 11 and 12 is carried out by the following equation.
[Equation (7) is presented as an image in the original publication: the entropy computed from the occurrence probabilities of the edge-amount events within the mask.]
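For illustration only (not part of the original disclosure), the following sketch follows the steps described above for one processing target pixel: a 7×7 mask is taken, the absolute differences to a horizontally adjacent pixel are used as the edge-amount events, their occurrence probabilities are estimated from the histogram over the mask, and the entropy is computed. The choice of the horizontal neighbor and the number of histogram bins are assumptions.

    import numpy as np

    def local_entropy(gray, x, y, half=3):
        """Entropy of the edge amounts (absolute differences to the right neighbor)
        inside the (2*half+1) x (2*half+1) mask centered on (x, y).
        gray: 2-D array of gray scale values."""
        mask = gray[y - half:y + half + 1, x - half:x + half + 2].astype(np.float64)
        edges = np.abs(np.diff(mask, axis=1)).ravel()            # one edge amount per mask pixel
        hist, _ = np.histogram(edges, bins=32, range=(0, 256))   # occurrence frequency of each event
        p = hist / edges.size                                    # occurrence probability (divide by numM)
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))                    # entropy (average information amount)

    # A flat region gives low entropy; a noisy region gives higher entropy.
    rng = np.random.default_rng(0)
    flat = np.full((32, 32), 128.0)
    noisy = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
    print(local_entropy(flat, 16, 16), local_entropy(noisy, 16, 16))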
The above entropy is calculated for each intermediate virtual viewpoint image generated from each viewpoint. In FIG. 6, two real cameras are selected, so two entropy values are calculated for each pixel by equation (7).
When the obtained entropy values are Ei and Ei+1, the synthesis ratios can be determined by the following equation (S1-12). This processing is performed by the synthesis ratio calculation unit 13.
[Equation (8) is presented as an image in the original publication: the synthesis ratio, obtained by subtracting from 1.0 the proportion that the camera's entropy occupies in the sum of the entropies of the selected cameras.]
Since a smaller entropy value indicates higher continuity and a more reliable synthesis result, the synthesis ratio must be made larger. In equation (8), the second term calculates the proportion that the entropy of the given camera occupies in the entropies of the selected cameras. Because the synthesis ratio must be larger for a smaller entropy value, the second term is subtracted from 1.0 to obtain the synthesis ratio.
In the synthesis unit 14, the synthesis processing is finally realized by the following equation (S1-13).
[Equation (9) is presented as an image in the original publication: the final virtual viewpoint image, obtained as the per-pixel weighted sum of the intermediate virtual viewpoint images Ii and Ii+1 using the synthesis ratios of equation (8).]
By repeating the above processing until all the pixels have been processed (S1-14), the synthesized viewpoint image can be generated.
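For illustration only (not part of the original disclosure), the following sketch combines steps S1-12 and S1-13 for the two-viewpoint case: the synthesis ratios are derived from the two entropy maps as 1.0 minus each entropy's share of their sum, as described for equation (8), and the two intermediate virtual viewpoint images are blended pixel by pixel with those ratios.

    import numpy as np

    def blend_by_entropy(img_i, img_i1, ent_i, ent_i1, eps=1e-12):
        """Per-pixel blend of two intermediate virtual viewpoint images.
        img_i, img_i1: (H, W, 3) float arrays warped from viewpoints i and i+1.
        ent_i, ent_i1: (H, W) per-pixel entropy maps of the two images."""
        total = ent_i + ent_i1 + eps
        w_i = 1.0 - ent_i / total       # smaller entropy (higher continuity) -> larger ratio
        w_i1 = 1.0 - ent_i1 / total     # the two ratios sum to 1.0
        return img_i * w_i[..., None] + img_i1 * w_i1[..., None]

    # Toy example: where ent_i is much smaller than ent_i1, the result follows img_i.
    img_i = np.full((2, 2, 3), 100.0)
    img_i1 = np.full((2, 2, 3), 200.0)
    ent_i = np.array([[0.1, 2.0], [0.1, 2.0]])
    ent_i1 = np.array([[2.0, 0.1], [2.0, 0.1]])
    print(blend_by_entropy(img_i, img_i1, ent_i, ent_i1)[..., 0])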
In this embodiment, an example has been shown in which the synthesis unit 14 synthesizes, with appropriate per-pixel weighting, the synthesized viewpoint images obtained intermediately from two different viewpoints; however, the calculation formulas used in the respective processing units also support three or more viewpoints. Therefore, by adding, for each additional viewpoint, the configuration in FIG. 1 corresponding to one viewpoint, namely the frame buffer 1, the frame buffer 3, the virtual viewpoint synthesis unit 5, the frame buffer 7, the mask formation unit 9 and the continuity feature amount calculation unit 11, the number of real cameras used for the synthesis can be increased. By increasing the number of viewpoints, information about the subject from more directions can be used appropriately, and the synthesis quality can be improved further.
Further, although an example has been shown in which the synthesis ratios are calculated from the entropy values by equation (8) and the result is applied to equation (9) for blending, it is also possible to generate the synthesized image by selection rather than by blending, by setting only the synthesis ratio of the camera with the minimum entropy to 1.0 and setting the others to 0.
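The selection-based variant mentioned above can be sketched in the same setting (illustrative only, not part of the original disclosure): instead of blending, each pixel is taken entirely from the intermediate image whose entropy is the minimum.

    import numpy as np

    def select_by_entropy(img_i, img_i1, ent_i, ent_i1):
        """Hard selection: take each pixel from the image with the smaller entropy,
        i.e. a synthesis ratio of 1.0 for the minimum-entropy camera and 0 for the other."""
        take_i = (ent_i <= ent_i1)[..., None]
        return np.where(take_i, img_i, img_i1)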
(Second embodiment)
FIG. 7 is a block diagram showing a second embodiment of the present invention. Blocks common to the first embodiment are given the same reference numbers, and only the correspondence is shown.
The difference between the first embodiment and the second embodiment is whether the information indicating the per-pixel correspondence between the images of different viewpoints is input from the outside or is created internally. Therefore, in the second embodiment, the frame buffers 3 and 4 that store the corresponding viewpoint information in the first embodiment do not exist. The block added in the second embodiment is the disparity vector calculation unit 71.
The virtual viewpoint synthesis units 72 and 73 differ in processing from the virtual viewpoint synthesis units 5 and 6 because the contents of the corresponding viewpoint information they receive are different, and their reference numbers are therefore changed from those of the first embodiment (FIG. 1). The parts different from the first embodiment are described below together with the flowchart of FIG. 10. The images at the processing target time t are extracted from the videos captured by the selected real cameras (S2-2, S2-3), and disparity vectors are calculated from these images (S2-4).
As a method of internally generating the corresponding viewpoint information and creating the synthesized viewpoint, the viewpoint synthesis method described in Patent Document 1 can be used. According to this method, the correspondence between the images is computed by the disparity vector calculation unit 71, and can be obtained by calculating the disparity amount p that minimizes the following expression E(p).
[Equation (10) is presented as an image in the original publication: the matching cost E(p), accumulated over the local mask W, between the image of viewpoint i and the image of viewpoint i+1 displaced by the disparity p.]
Here, the images are the same as those described in the first embodiment, and the two viewpoints i and i+1 nearest to the viewpoint to be generated by viewpoint synthesis are used.
W denotes the local mask used for matching, for example a mask of size 7×7. By performing the above processing for all pixels, the correspondence of all pixels can be obtained.
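For illustration only (not part of the original disclosure), the following sketch shows a simple horizontal block matching of the kind performed by the disparity vector calculation unit 71: for each pixel, the disparity p minimizing a matching cost E(p) over a 7×7 mask W is searched. The use of squared differences, the search range and the search direction are assumptions, since the exact form of E(p) is given only as an equation image in the original.

    import numpy as np

    def disparity_map(gray_i, gray_i1, max_disp=32, half=3):
        """For each pixel of viewpoint i, find the horizontal disparity p that
        minimizes E(p), here a sum of squared differences over the
        (2*half+1) x (2*half+1) mask W.
        gray_i, gray_i1: 2-D float arrays of the two viewpoint images."""
        H, W_ = gray_i.shape
        disp = np.zeros((H, W_), dtype=np.int32)
        for y in range(half, H - half):
            for x in range(half, W_ - half):
                block = gray_i[y - half:y + half + 1, x - half:x + half + 1]
                best_p, best_e = 0, np.inf
                for p in range(0, min(max_disp, x - half) + 1):
                    cand = gray_i1[y - half:y + half + 1, x - p - half:x - p + half + 1]
                    e = float(np.sum((block - cand) ** 2))   # E(p) for this candidate disparity
                    if e < best_e:
                        best_e, best_p = e, p
                disp[y, x] = best_p
        return disp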
Next, a method of synthesizing an intermediate image using the two images for which the correspondence has been obtained will be described with reference to FIG. 8. As shown in FIG. 8, when the distance between the real cameras 81 and 82 is L, the virtual viewpoint camera 83 is located at a distance DL from the real camera 81 and a distance DR from the real camera 82. To obtain the per-pixel correspondence, it is assumed that the in-between point also lies on the horizontal line segment connecting the two corresponding points. Assuming that the correspondence shown in FIG. 9 is obtained, a pixel of the in-between viewpoint can be obtained from the following equation with the camera 81 (viewpoint i) as the reference (S2-5). This processing is performed by the virtual viewpoint synthesis unit 72.
[Equation (11) is presented as an image in the original publication: the pixel of the in-between viewpoint computed with the camera 81 (viewpoint i) as the reference, using the per-pixel correspondence and the positional relationship DL, DR and L.]
Similarly, with the camera 82 (viewpoint i+1) as the reference, the pixel of the in-between viewpoint can be obtained from the following equation (S2-6). This processing is performed by the virtual viewpoint synthesis unit 73.
[Equation (12) is presented as an image in the original publication: the pixel of the in-between viewpoint computed with the camera 82 (viewpoint i+1) as the reference.]
In the method described in Patent Document 1, the virtual viewpoint image is obtained by adaptively switching between equation (11) and equation (12) according to the distance between the two corresponding points; in the present invention, however, the intermediate synthesis results of both cameras are calculated once and are then combined using the local signal continuity.
The processing after calculating the plurality of intermediate synthesis results (continuity calculation, synthesis ratio calculation and blending) is the same as in the first embodiment.
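For illustration only (not part of the original disclosure), the following sketch shows one common way of realizing an interpolation in the spirit of equations (11) and (12) for a purely horizontal camera arrangement: with camera 81 (viewpoint i) as the reference, a pixel with disparity p is shifted by the fraction DL/L of p toward the virtual position; with camera 82 (viewpoint i+1) as the reference, the shift fraction is DR/L in the opposite direction. Since equations (11) and (12) are given only as images in the original, this formulation is an assumption; in the present invention the two intermediate results would then be combined per pixel by the continuity-based ratios, as in the first embodiment.

    import numpy as np

    def interpolate_from_reference(gray_ref, disp_ref, shift_fraction):
        """Create an in-between view from one reference image.
        gray_ref: 2-D float image of the reference camera.
        disp_ref: per-pixel horizontal disparity toward the other camera.
        shift_fraction: signed fraction of the disparity by which each reference
        pixel is moved (e.g. DL/L for viewpoint i, -DR/L for viewpoint i+1,
        depending on the disparity sign convention)."""
        H, W = gray_ref.shape
        out = np.zeros_like(gray_ref)
        for y in range(H):
            for x in range(W):
                xv = int(round(x + shift_fraction * disp_ref[y, x]))  # position in the in-between view
                if 0 <= xv < W:
                    out[y, xv] = gray_ref[y, x]
        return out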
(Third embodiment) Program
The present invention can also be recorded as software processing on a computer-readable recording medium storing a program to be executed by a computer, the program implementing the method shown in the first or second embodiment: videos of a plurality of different viewpoints are input, an intermediate synthesized viewpoint video is created for each of the different viewpoints, the synthesis ratios are calculated based on the local continuity of the obtained intermediate synthesized videos, and the virtual viewpoint video is generated by synthesizing them.
As a result, the synthesis quality of the virtual viewpoint image can be improved. Since the processing is performed by a microcomputer, the recording medium may be a memory (not shown), for example a program medium such as a ROM, or it may be a program medium that becomes readable when the recording medium is inserted into a program reading device provided as an external storage device (not shown). In either case, the stored program may be configured to be accessed and executed directly by the microprocessor, or the program may be read out, downloaded into a program storage area (not shown) of the microcomputer, and then executed. In the latter case, the program for downloading is assumed to be stored in the main device in advance.
Here, the above program medium is a recording medium configured to be separable from the main body, and may be a medium that carries the program in a fixed manner, including tape media such as magnetic tapes and cassette tapes; disk media such as magnetic disks including floppy disks (registered trademark) and hard disks, and optical discs such as CD-ROM/MO/MD/DVD; card media such as IC cards (including memory cards) and optical cards; and semiconductor memories such as mask ROM, EPROM (Erasable Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory) and flash ROM.
Further, in this case, since the system configuration allows connection to a communication network including the Internet, the medium may also be one that carries the program in a fluid manner so that the program is downloaded from the communication network. When the program is downloaded from the communication network in this way, the program for downloading may be stored in the main device in advance or may be installed from another recording medium. The above-described image processing method is executed when the recording medium is read by a program reading device provided in a digital color image forming apparatus or a computer system. The computer system comprises a general-purpose image input device such as a web camera, a computer that performs various kinds of processing, including the above image processing method, by loading a predetermined program, and an image display device, such as a display or a liquid crystal display, that displays the processing results of the computer. Further, a network card, a modem or the like is provided as communication means for connecting to a server or the like via a network.
1, 2, 3, 4, 7, 8: frame buffer; 5, 6: virtual viewpoint synthesis unit; 9, 10: mask formation unit; 11, 12: continuity feature amount calculation unit; 13: synthesis ratio calculation unit; 14: synthesis unit; 23, 24: camera; 71: disparity vector calculation unit; 72, 73: virtual viewpoint synthesis unit; 81, 82: real camera; 83: virtual viewpoint camera.

Claims (10)

1. An image processing apparatus that synthesizes a virtual viewpoint image positioned between a plurality of viewpoints using camera images of the plurality of viewpoints, comprising:
a virtual viewpoint synthesis unit that generates, using information indicating corresponding points between the images of the plurality of viewpoints, a plurality of intermediate virtual viewpoint images each referenced to one of the camera images of the plurality of viewpoints;
a continuity calculation unit that calculates, for each of the intermediate virtual viewpoint images synthesized by the virtual viewpoint synthesis unit, a feature amount indicating the local continuity of that virtual viewpoint image;
a synthesis ratio calculation unit that calculates, based on the feature amounts calculated by the continuity calculation unit, ratios for combining the plurality of intermediate virtual viewpoint images; and
a synthesis unit that combines the plurality of intermediate virtual viewpoint images according to the ratios calculated by the synthesis ratio calculation unit to synthesize a final virtual viewpoint image.
2. The image processing apparatus according to claim 1, wherein the feature amount is an entropy whose events are the edge amounts within a mask centered on the pixel to be processed in the intermediate virtual viewpoint image.
3. The image processing apparatus according to claim 1 or 2, wherein the plurality of viewpoints are two or more viewpoints.
4. The image processing apparatus according to any one of claims 1 to 3, wherein the information indicating the corresponding points is input from the outside.
5. The image processing apparatus according to any one of claims 1 to 3, wherein the virtual viewpoint synthesis unit calculates information indicating correspondences between the images of the plurality of viewpoints and, based on that information, generates the plurality of intermediate virtual viewpoint images, each referenced to one of the camera images of the plurality of viewpoints, by interpolating mutually corresponding pixels.
6. An image processing method for synthesizing a virtual viewpoint image positioned between a plurality of viewpoints using camera images of the plurality of viewpoints, comprising:
a virtual viewpoint synthesis step of generating, using information indicating corresponding points between the images of the plurality of viewpoints, a plurality of intermediate virtual viewpoint images each referenced to one of the camera images of the plurality of viewpoints;
a continuity calculation step of calculating, for each of the intermediate virtual viewpoint images synthesized in the virtual viewpoint synthesis step, a feature amount indicating the local continuity of that virtual viewpoint image;
a synthesis ratio calculation step of calculating, based on the feature amounts calculated in the continuity calculation step, ratios for combining the plurality of intermediate virtual viewpoint images; and
a synthesis step of combining the plurality of intermediate virtual viewpoint images according to the ratios calculated in the synthesis ratio calculation step to synthesize a final virtual viewpoint image.
7. The image processing method according to claim 6, wherein the feature amount is an entropy whose events are the edge amounts within a mask centered on the pixel to be processed in the intermediate virtual viewpoint image.
8. An image processing program for causing a computer to execute:
a virtual viewpoint synthesis step of generating, using information indicating corresponding points between images of a plurality of viewpoints acquired from camera images of the plurality of viewpoints, a plurality of intermediate virtual viewpoint images each referenced to one of the camera images of the plurality of viewpoints;
a continuity calculation step of calculating, for each of the intermediate virtual viewpoint images synthesized in the virtual viewpoint synthesis step, a feature amount indicating the local continuity of that virtual viewpoint image;
a synthesis ratio calculation step of calculating, based on the feature amounts calculated in the continuity calculation step, ratios for combining the plurality of intermediate virtual viewpoint images; and
a synthesis step of combining the plurality of intermediate virtual viewpoint images according to the ratios calculated in the synthesis ratio calculation step to synthesize a final virtual viewpoint image.
9. The image processing program according to claim 8, wherein the feature amount is an entropy whose events are the edge amounts within a mask centered on the pixel to be processed in the intermediate virtual viewpoint image.
10. A computer-readable recording medium on which the program according to claim 8 or 9 is recorded.
PCT/JP2011/065003 2010-09-28 2011-06-30 Image processing device, image processing method, program, and recording medium WO2012042998A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010-216385 2010-09-28
JP2010216385A JP4939639B2 (en) 2010-09-28 2010-09-28 Image processing apparatus, image processing method, program, and recording medium

Publications (1)

Publication Number Publication Date
WO2012042998A1 true WO2012042998A1 (en) 2012-04-05

Family

ID=45892478

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/065003 WO2012042998A1 (en) 2010-09-28 2011-06-30 Image processing device, image processing method, program, and recording medium

Country Status (2)

Country Link
JP (1) JP4939639B2 (en)
WO (1) WO2012042998A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2954675A1 (en) * 2013-02-06 2015-12-16 Koninklijke Philips N.V. System for generating intermediate view images
EP2954674B1 (en) 2013-02-06 2017-03-08 Koninklijke Philips N.V. System for generating an intermediate view image
CN110476415A (en) * 2017-04-05 2019-11-19 夏普株式会社 Image data generating means, picture reproducer, image data generation method, control program and recording medium
CN113112613A (en) * 2021-04-22 2021-07-13 北京房江湖科技有限公司 Model display method and device, electronic equipment and storage medium
JP7469084B2 (en) 2019-03-19 2024-04-16 株式会社ソニー・インタラクティブエンタテインメント Method and system for generating an image - Patents.com

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5764097B2 (en) * 2012-07-03 2015-08-12 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program
WO2014083752A1 (en) 2012-11-30 2014-06-05 パナソニック株式会社 Alternate viewpoint image generating device and alternate viewpoint image generating method
WO2014192487A1 (en) 2013-05-29 2014-12-04 日本電気株式会社 Multi-eye imaging system, acquired image synthesis method, and program
US10757399B2 (en) 2015-09-10 2020-08-25 Google Llc Stereo rendering system
US20210183096A1 (en) * 2016-03-15 2021-06-17 Sony Corporation Image processing apparatus, imaging apparatus, image processing method and program
JP6734253B2 (en) 2017-12-20 2020-08-05 ファナック株式会社 Imaging device including a visual sensor for imaging a workpiece
JP7332326B2 (en) * 2019-04-18 2023-08-23 日本放送協会 Video effect device and program
CN110675404B (en) * 2019-09-03 2023-03-21 RealMe重庆移动通信有限公司 Image processing method, image processing apparatus, storage medium, and terminal device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08201941A (en) * 1995-01-12 1996-08-09 Texas Instr Inc <Ti> Three-dimensional image formation
JP2001175863A (en) * 1999-12-21 2001-06-29 Nippon Hoso Kyokai <Nhk> Method and device for multi-viewpoint image interpolation
JP2005149127A (en) * 2003-11-14 2005-06-09 Sony Corp Imaging display device and method, and image sending and receiving system
JP2006039770A (en) * 2004-07-23 2006-02-09 Sony Corp Image processing apparatus, method, and program
JP2006285415A (en) * 2005-03-31 2006-10-19 Sony Corp Image processing method, device and program
JP2009211335A (en) * 2008-03-04 2009-09-17 Nippon Telegr & Teleph Corp <Ntt> Virtual viewpoint image generation method, virtual viewpoint image generation apparatus, virtual viewpoint image generation program, and recording medium from which same recorded program can be read by computer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUN'ICHI KANEKO ET AL.: "Introducing Structurization of Edge Segment Groups Based on Prominency Entropy to Binocular Depth Calculation", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J75-D-II, no. 10, 25 October 1992 (1992-10-25), pages 1649 - 1659 *
YUTAKA KUNITA ET AL.: "Real-Time Rendering System of 3D Images Using Layered Probability Maps", THE JOURNAL OF THE INSTITUTE OF IMAGE INFORMATION AND TELEVISION ENGINEERS, vol. 60, no. 7, 1 July 2006 (2006-07-01), pages 1102 - 1110 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2954675A1 (en) * 2013-02-06 2015-12-16 Koninklijke Philips N.V. System for generating intermediate view images
EP2954674B1 (en) 2013-02-06 2017-03-08 Koninklijke Philips N.V. System for generating an intermediate view image
US9967537B2 (en) 2013-02-06 2018-05-08 Koninklijke Philips N.V. System for generating intermediate view images
CN110476415A (en) * 2017-04-05 2019-11-19 夏普株式会社 Image data generating means, picture reproducer, image data generation method, control program and recording medium
JP7469084B2 (en) 2019-03-19 2024-04-16 株式会社ソニー・インタラクティブエンタテインメント Method and system for generating an image - Patents.com
CN113112613A (en) * 2021-04-22 2021-07-13 北京房江湖科技有限公司 Model display method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
JP4939639B2 (en) 2012-05-30
JP2012073702A (en) 2012-04-12

Similar Documents

Publication Publication Date Title
JP4939639B2 (en) Image processing apparatus, image processing method, program, and recording medium
KR100950046B1 (en) Apparatus of multiview three-dimensional image synthesis for autostereoscopic 3d-tv displays and method thereof
US10070115B2 (en) Methods for full parallax compressed light field synthesis utilizing depth information
JP5238429B2 (en) Stereoscopic image capturing apparatus and stereoscopic image capturing system
JP6021541B2 (en) Image processing apparatus and method
JP5224124B2 (en) Imaging device
JP6027034B2 (en) 3D image error improving method and apparatus
ES2676055T3 (en) Effective image receiver for multiple views
JP4942221B2 (en) High resolution virtual focal plane image generation method
US7126598B2 (en) 3D image synthesis from depth encoded source view
EP2532166B1 (en) Method, apparatus and computer program for selecting a stereoscopic imaging viewpoint pair
JP4942106B2 (en) Depth data output device and depth data receiver
US20100134599A1 (en) Arrangement and method for the recording and display of images of a scene and/or an object
KR20110124473A (en) 3-dimensional image generation apparatus and method for multi-view image
JP2008257686A (en) Method and system for processing 3d scene light field
JP2013527646A5 (en)
JP6195076B2 (en) Different viewpoint image generation apparatus and different viewpoint image generation method
JP2010226500A (en) Device and method for displaying stereoscopic image
Berretty et al. Real-time rendering for multiview autostereoscopic displays
JP6128748B2 (en) Image processing apparatus and method
JP2010181826A (en) Three-dimensional image forming apparatus
WO2018127629A1 (en) Method and apparatus for video depth map coding and decoding
WO2011129164A1 (en) Multi-viewpoint image coding device
JP7437941B2 (en) Stereoscopic image generation device and its program
WO2019008233A1 (en) A method and apparatus for encoding media content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11828559

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11828559

Country of ref document: EP

Kind code of ref document: A1