JP2023127450A - Image processing device, image processing method, and program

Info

Publication number: JP2023127450A
Application number: JP2022031253A
Authority: JP
Inventors: 信一上村; Shinichi Kamimura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2022-03-01
Filing date: 2022-03-01
Publication date: 2023-09-13

Abstract

To provide a technology for improving the accuracy of interpolating a distance information image. SOLUTION: An image processing device obtains a photographed image captured by a photographing device at a first frame rate, and a distance information image generated at a second frame rate lower than the first frame rate and composed of pixel values based on a distance to a subject. The image processing device generates a distance information image corresponding to a to-be-interpolated frame time, which is a frame time of the first frame rate but is not a frame time of the second frame rate, on the basis of the obtained distance information image and a photographed image corresponding to the to-be-interpolated frame time. SELECTED DRAWING: Figure 1B

Description

本開示は、画像処理装置、画像処理方法およびプログラム関する。 The present disclosure relates to an image processing device, an image processing method, and a program.

昨今、複数のカメラを異なる位置から同期して撮影することにより得られる複数の画像を用いて仮想視点からの画像（以下、仮想視点画像）を生成する技術が注目されている。この技術によれば、例えば、サッカーやバスケットボールのハイライトシーンを様々な角度から視聴閲覧することが出来るため、通常の画像と比較してユーザに高臨場感を与えることができる。複数の画像に基づく仮想視点画像は、複数のカメラが撮影した画像をサーバなどの画像処理部に集約し、画像処理部にてレンダリングなどの処理を施すことにより生成される。生成された仮想視点画像はユーザ端末に伝送され、閲覧される。 BACKGROUND ART: Recently, attention has been drawn to a technology that generates an image from a virtual viewpoint (hereinafter referred to as a virtual viewpoint image) using a plurality of images obtained by synchronously capturing a scene with a plurality of cameras placed at different positions. With this technology, for example, highlight scenes of soccer or basketball games can be viewed from various angles, which gives the user a greater sense of realism than ordinary images. A virtual viewpoint image based on a plurality of images is generated by collecting the images captured by the plurality of cameras in an image processing unit such as a server and performing processing such as rendering in the image processing unit. The generated virtual viewpoint image is transmitted to a user terminal and viewed.

特開２０１７－０９２５４５号公報 Patent Document 1: JP2017-092545A

このような仮想視点画像を生成する技術について様々な手法が開発されている。例えば、撮影画像から前景と背景を分離し、複数の撮影画像から３次元座標を算出し、仮想視点画像を生成する技術がある。前景画像の分離の精度は仮想視点画像の画質に関わるため、前景画像の分離精度を高めることは、仮想視点画像の生成において重要な課題の一つである。前景画像の分離精度を向上するために、被写体までの距離に基づいた画素値で構成される距離情報画像を利用することが検討されている。距離情報画像とは、例えば、被写体までの距離を画素値とする距離画像（深度画像）、深度画像から抽出された前景領域を表す距離情報に基づく前景画像などである。しかし、一般的に、距離情報画像のフレームレートは、撮影画像のフレームレートよりも低い為、距離情報のフレームが不足した時刻において、輝度情報と同時刻の距離情報を使用することができない。特許文献１では、フレームレートの比率をもとに、過去の距離情報を用いて不足している時刻の距離情報のフレームを補間する技術が開示されている。しかしながら、特許文献１の技術は、ブロックマッチングのようなブロック移動の推定を行うものではなく、満足のいく補間精度が得られない。 Various techniques have been developed for generating such virtual viewpoint images. For example, there is a technology that separates the foreground and background from a captured image, calculates three-dimensional coordinates from a plurality of captured images, and generates a virtual viewpoint image. Because the accuracy of foreground image separation affects the image quality of the virtual viewpoint image, improving that accuracy is one of the important issues in generating virtual viewpoint images. To improve the separation accuracy of foreground images, the use of distance information images composed of pixel values based on the distance to the subject is being considered. A distance information image is, for example, a distance image (depth image) whose pixel values are the distance to the subject, or a foreground image based on distance information that represents a foreground region extracted from the depth image. However, the frame rate of distance information images is generally lower than that of photographed images, so at times for which a distance information frame is missing, distance information from the same time as the luminance information cannot be used. Patent Document 1 discloses a technique that interpolates the distance information frames for the missing times from past distance information, based on the frame rate ratio. However, the technique of Patent Document 1 does not estimate block movement as block matching does, and satisfactory interpolation accuracy cannot be obtained.

本開示の目的は、距離情報画像の補間精度を向上する技術を提供することである。 An object of the present disclosure is to provide a technique that improves the interpolation accuracy of distance information images.

本開示の一態様による画像処理装置は、撮影装置により第１のフレームレートで撮影された撮影画像を取得する第１取得手段と、前記第１のフレームレートよりも遅い第２のフレームレートで生成された、被写体までの距離に基づく画素値で構成される距離情報画像を取得する第２取得手段と、前記第２取得手段により取得される距離情報画像と、前記第１のフレームレートのフレーム時刻であって前記第２のフレームレートのフレーム時刻ではない補間対象フレーム時刻に対応する撮影画像と、に基づいて、前記補間対象フレーム時刻に対応する距離情報画像を生成する生成手段と、を有する。 An image processing device according to an aspect of the present disclosure includes: first acquisition means for acquiring a photographed image captured by a photographing device at a first frame rate; second acquisition means for acquiring a distance information image that is generated at a second frame rate slower than the first frame rate and is composed of pixel values based on a distance to a subject; and generation means for generating a distance information image corresponding to an interpolation target frame time, which is a frame time of the first frame rate but not a frame time of the second frame rate, based on the distance information image acquired by the second acquisition means and a photographed image corresponding to the interpolation target frame time.

本開示によれば、距離情報画像の補間精度が向上する。 According to the present disclosure, the interpolation accuracy of distance information images is improved.

（ａ）は第１実施形態による画像処理システムの構成例を示す図、（ｂ）は第１実施形態による画像処理装置のハードウェア構成例を示すブロック図。 FIG. 1A: (a) is a diagram showing a configuration example of the image processing system according to the first embodiment, and (b) is a block diagram showing a hardware configuration example of the image processing device according to the first embodiment.
第１実施形態に係る画像処理装置の機能構成例を示すブロック図。 FIG. 1B: A block diagram showing a functional configuration example of the image processing device according to the first embodiment.
第１実施形態に係る距離情報前景マスクのフレームレート補完を説明する図。 FIG. 2: A diagram illustrating frame rate complementation of the distance information foreground mask according to the first embodiment.
第１実施形態に係る補間画像の判定方法を説明する図。 FIG. 3: A diagram illustrating a method for determining an interpolated image according to the first embodiment.
第１実施形態に係るフレームレート変換手順を示すフローチャート。 A flowchart showing a frame rate conversion procedure according to the first embodiment.
第２実施形態に係る画像処理装置の機能構成例を示すブロック図。 A block diagram showing a functional configuration example of the image processing device according to the second embodiment.
第２実施形態に係る距離情報のフレームレート補完を説明する図。 A diagram illustrating frame rate complementation of distance information according to the second embodiment.
第２実施形態に係る動きベクトルの判定方法を説明する図。 A diagram illustrating a motion vector determination method according to the second embodiment.
第２実施形態に係るフレームレート変換手順を示すフローチャート。 A flowchart showing a frame rate conversion procedure according to the second embodiment.

以下、添付図面を参照して実施形態を詳しく説明する。なお、以下の実施形態は特許請求の範囲に係る発明を限定するものではない。実施形態には複数の特徴が記載されているが、これらの複数の特徴の全てが発明に必須のものとは限らず、また、複数の特徴は任意に組み合わせられてもよい。さらに、添付図面においては、同一若しくは同様の構成に同一の参照番号を付し、重複した説明は省略する。 Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. Note that the following embodiments do not limit the claimed invention. Although a plurality of features are described in the embodiments, not all of these features are essential to the invention, and the plurality of features may be arbitrarily combined. Furthermore, in the accompanying drawings, the same or similar components are designated by the same reference numerals, and redundant description will be omitted.

［第１実施形態］
以下、画像間で対応する位置をブロック単位で検出して移動先を推定する手法（ブロックマッチング法）を利用して、距離情報画像を高精度に補間する構成を説明する。また、距離情報画像の補間では、形状が似ている物体を区別しにくいため、補間のブロックマッチングの精度が低下する可能性がある。例えば、似た形状の複数の物体が周期的な間隔で移動しているような画像では、ブロックの移動先の推定を誤ることがあり、補間精度が低下してしまう。本実施形態では、そのような状況でも、距離情報画像を高い精度で補間することが可能な構成を説明する。なお、本開示において、距離情報画像とは、被写体までの距離に基づいた画素値で構成される画像を指す。そのような距離情報画像の例としては、各画素の値が被写体までの距離を表す深度画像、深度画像から抽出された前景領域を表す前景画像（距離情報に基づく前景画像）などがあげられる。第１実施形態では、距離情報に基づく前景画像の一例である、前景領域が１で、他の領域が０で表わされた前景画像（以下、距離情報前景マスクという）について補間精度を向上する技術について説明する。 [First embodiment]
Hereinafter, a configuration will be described in which distance information images are interpolated with high accuracy using a method (block matching method) that detects corresponding positions between images in units of blocks and estimates where each block has moved. In interpolation of distance information images, objects with similar shapes are difficult to distinguish, which may reduce the accuracy of the block matching used for interpolation. For example, in an image in which a plurality of objects with similar shapes move at periodic intervals, the destination of a block may be estimated incorrectly, resulting in a decrease in interpolation accuracy. This embodiment describes a configuration that can interpolate distance information images with high accuracy even in such a situation. Note that in the present disclosure, a distance information image refers to an image composed of pixel values based on the distance to a subject. Examples of such distance information images include a depth image in which each pixel value represents the distance to a subject, and a foreground image representing a foreground region extracted from the depth image (a foreground image based on distance information). The first embodiment describes a technique for improving the interpolation accuracy of a foreground image in which the foreground region is represented by 1 and the other regions by 0 (hereinafter referred to as a distance information foreground mask), which is an example of a foreground image based on distance information.

図１Ａの（ａ）は、第１実施形態による画像処理システムの構成例を示す図である。カメラ群１９０は、輝度情報に基づく画像である撮影画像を撮影により取得する複数のカメラで構成される。なお、単色の輝度情報を表す画像が撮影される場合、撮影画像はモノクロ画像であり、例えばＲＧＢの三色に対応する輝度情報を表す画像が撮影される場合、撮影画像はカラー画像である。距離測定装置群１９１は、深度画像を取得する複数の距離測定装置で構成される。画像処理装置１００は、カメラ群１９０により撮影された撮影画像と距離測定装置群１９１により取得された深度画像に基づいて、仮想視点画像を生成する。表示装置１９２は、例えば、液晶ディスプレイを含み、画像処理装置１００が生成した仮想視点画像を表示する。 FIG. 1A (a) is a diagram showing a configuration example of an image processing system according to the first embodiment. The camera group 190 is composed of a plurality of cameras that capture images based on brightness information. Note that when an image representing brightness information of a single color is photographed, the photographed image is a monochrome image. For example, when an image representing brightness information corresponding to three colors of RGB is photographed, the photographed image is a color image. The distance measuring device group 191 is composed of a plurality of distance measuring devices that acquire depth images. The image processing device 100 generates a virtual viewpoint image based on the photographed image taken by the camera group 190 and the depth image acquired by the distance measuring device group 191. The display device 192 includes, for example, a liquid crystal display, and displays the virtual viewpoint image generated by the image processing device 100.

画像処理装置１００のハードウェア構成について、図１Ａの（ｂ）を用いて説明する。画像処理装置１００は、ＣＰＵ１１１、ＲＯＭ１１２、ＲＡＭ１１３、補助記憶装置１１４、表示制御部１１５、操作部１１６、通信Ｉ／Ｆ１１７、及びバス１１８を有する。 The hardware configuration of the image processing device 100 will be explained using (b) of FIG. 1A. The image processing device 100 includes a CPU 111, a ROM 112, a RAM 113, an auxiliary storage device 114, a display control unit 115, an operation unit 116, a communication I/F 117, and a bus 118.

ＣＰＵ１１１は、ＲＯＭ１１２やＲＡＭ１１３に格納されているコンピュータプログラムやデータを用いて画像処理装置１００の全体を制御することで、画像処理装置１００の各機能を実現する。なお、画像処理装置１００がＣＰＵ１１１とは異なる１又は複数の専用のハードウェアを有し、ＣＰＵ１１１による処理の少なくとも一部を専用のハードウェアが実行してもよい。専用のハードウェアの例としては、ＡＳＩＣ（特定用途向け集積回路）、ＦＰＧＡ（フィールドプログラマブルゲートアレイ）、およびＤＳＰ（デジタルシグナルプロセッサ）などがある。ＲＯＭ１１２は、変更を必要としないプログラムなどを格納する。ＲＡＭ１１３は、補助記憶装置１１４から供給されるプログラムやデータ、及び通信Ｉ／Ｆ１１７を介して外部から供給されるデータなどを一時記憶する。補助記憶装置１１４は、例えばハードディスクドライブ等で構成され、画像データや音声データなどの種々のデータを記憶する。 The CPU 111 implements each function of the image processing apparatus 100 by controlling the entire image processing apparatus 100 using computer programs and data stored in the ROM 112 and RAM 113. Note that the image processing device 100 may include one or more dedicated hardware different from the CPU 111, and the dedicated hardware may execute at least part of the processing by the CPU 111. Examples of specialized hardware include ASICs (Application Specific Integrated Circuits), FPGAs (Field Programmable Gate Arrays), and DSPs (Digital Signal Processors). The ROM 112 stores programs that do not require modification. The RAM 113 temporarily stores programs and data supplied from the auxiliary storage device 114, data supplied from the outside via the communication I/F 117, and the like. The auxiliary storage device 114 is composed of, for example, a hard disk drive, and stores various data such as image data and audio data.

表示制御部１１５は、表示装置１９２と接続され、ユーザが画像処理装置１００を操作するためのＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）、仮想視点画像などを表示装置１９２に表示するよう制御する。操作部１１６は、例えばキーボードやマウス、ジョイスティック、タッチパネル等で構成され、ユーザによる操作を受けて各種の指示をＣＰＵ１１１に入力する。ＣＰＵ１１１は、表示制御部１１５を制御する表示制御部、及び操作部１１６を制御する操作制御部として動作する。 The display control unit 115 is connected to the display device 192 and controls the display device 192 to display a GUI (Graphical User Interface) for a user to operate the image processing device 100, a virtual viewpoint image, and the like. The operation unit 116 includes, for example, a keyboard, a mouse, a joystick, a touch panel, etc., and inputs various instructions to the CPU 111 in response to user operations. The CPU 111 operates as a display control unit that controls the display control unit 115 and an operation control unit that controls the operation unit 116.

通信Ｉ／Ｆ１１７は、画像処理装置１００の外部の装置との通信に用いられる。例えば、画像処理装置１００が外部の装置と有線で接続される場合には、通信用のケーブルが通信Ｉ／Ｆ１１７に接続される。画像処理装置１００が外部の装置と無線通信する機能を有する場合には、通信Ｉ／Ｆ１１７はアンテナを備える。本実施形態では、通信Ｉ／Ｆ１１７によりカメラ群１９０、距離測定装置群１９１の各カメラと画像処理装置１００が接続される。バス１１８は、画像処理装置１００の各部をつないで情報を伝達する。 The communication I/F 117 is used for communication with an external device of the image processing apparatus 100. For example, when the image processing apparatus 100 is connected to an external device by wire, a communication cable is connected to the communication I/F 117. When the image processing device 100 has a function of wirelessly communicating with an external device, the communication I/F 117 includes an antenna. In this embodiment, each camera of the camera group 190 and the distance measuring device group 191 and the image processing device 100 are connected through the communication I/F 117. A bus 118 connects each part of the image processing apparatus 100 and transmits information.

なお、本実施形態では操作部１１６が画像処理装置１００の内部に存在するものとするが、操作部１１６が画像処理装置１００の外部に別の装置として存在していてもよい。また、表示装置１９２が画像処理装置１００の内部に存在するようにしてもよい。 Note that in this embodiment, it is assumed that the operation unit 116 exists inside the image processing apparatus 100, but the operation unit 116 may exist outside the image processing apparatus 100 as a separate device. Further, the display device 192 may be located inside the image processing device 100.

＜機能の説明＞
図１Ｂは、第１実施形態に係る画像処理装置１００の機能構成例を示すブロック図である。各機能部は、プロセッサーとしてのＣＰＵ１１１が所定のプログラムを実行することにより実現されてもよいし、専用のハードウェアにより実現されてもよいし、プログラムを実行するプロセッサーと専用のハードウェアとの協働により実現されてもよい。 <Function description>
FIG. 1B is a block diagram showing an example of the functional configuration of the image processing apparatus 100 according to the first embodiment. Each functional unit may be realized by the CPU 111, acting as a processor, executing a predetermined program, may be realized by dedicated hardware, or may be realized by cooperation between a processor that executes a program and dedicated hardware.

画像処理装置１００は、記憶部１０１、背景生成部１０２、第１分離部１０３、映像生成部１０４、記憶部１０５、第２分離部１０６、補間部１０７、判定部１０８、補正部１０９、制御部１１０を備える。なお、画像処理装置１００には、カメラ群１９０からの撮影画像と距離測定装置群１９１からの深度画像が対になった画像データが入力される。撮影画像は、例えばＲＧＢの色情報（輝度情報）により構成される画像である。深度画像は、被写体までの距離を表す距離情報を画素値とする画像である。距離測定装置群１９１が用い得る深度画像の取得手法としては、ＬｉＤＡＲ（ＴＯＦ）、赤外線によるドットパターン照射、超音波等を用いた手法があげられるが、これに限られるものではない。なお、本実施形態では、撮影画像と深度画像を別々の装置から取得しているが、撮影画像と深度画像の両方を取得できるカメラが用いられてもよい。 The image processing device 100 includes a storage unit 101, a background generation unit 102, a first separation unit 103, a video generation unit 104, a storage unit 105, a second separation unit 106, an interpolation unit 107, a determination unit 108, a correction unit 109, and a control unit 110. Note that image data in which a captured image from the camera group 190 and a depth image from the distance measuring device group 191 are paired is input to the image processing device 100. The photographed image is an image composed of, for example, RGB color information (luminance information). A depth image is an image whose pixel values are distance information representing the distance to the subject. Depth image acquisition methods that can be used by the distance measuring device group 191 include, but are not limited to, methods using LiDAR (TOF), infrared dot pattern irradiation, ultrasound, and the like. Note that in this embodiment, the photographed image and the depth image are acquired from separate devices, but a camera that can acquire both the photographed image and the depth image may be used.

記憶部１０１は、カメラ群１９０から入力された、撮影画像をフレームメモリに記憶する。背景生成部１０２は、逐次背景更新法を用いて、フレームメモリに記憶された撮影画像から背景画像を生成する。具体的には、背景生成部１０２は、複数枚の撮影画像から動きのある被写体を前景、静止している領域を背景として識別し、背景のみを抽出することにより背景画像を生成する。第１分離部１０３は、撮影画像と背景生成部１０２が生成した背景画像とを用いて、前景領域を特定する為の二値画像を、背景差分法により生成する。以下、この二値画像を、色情報前景マスクという。このとき、第１分離部１０３は、例えば、撮影画像から背景画像を減算した差分画像に対して所定の閾値を用いて二値化を行うことにより色情報前景マスクを生成する。本実施形態に係る色情報前景マスクでは、「１」が前景領域を、「０」が背景領域を示すものとするが、この逆であっても構わない。 The storage unit 101 stores captured images input from the camera group 190 in a frame memory. The background generation unit 102 generates a background image from the photographed images stored in the frame memory using a sequential background update method. Specifically, the background generation unit 102 identifies a moving subject as a foreground and a stationary area as a background from a plurality of captured images, and generates a background image by extracting only the background. The first separation unit 103 uses the photographed image and the background image generated by the background generation unit 102 to generate a binary image for specifying the foreground area by a background subtraction method. Hereinafter, this binary image will be referred to as a color information foreground mask. At this time, the first separation unit 103 generates a color information foreground mask by, for example, binarizing the difference image obtained by subtracting the background image from the photographed image using a predetermined threshold. In the color information foreground mask according to this embodiment, "1" indicates a foreground region and "0" indicates a background region, but the reverse may be possible.
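As an illustration only, the background subtraction performed by the first separation unit 103 can be sketched in Python as follows. The function name `color_foreground_mask`, the use of single-channel pixel values, and the use of the absolute difference are assumptions made for this sketch and are not taken from the disclosure.

```python
def color_foreground_mask(image, background, threshold):
    """Return a binary mask: 1 where the pixel differs from the background
    by more than `threshold` (foreground), 0 otherwise (background).

    Assumption: `image` and `background` are equally sized lists of rows of
    single-channel intensity values.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


# Tiny example: only the top-right pixel differs strongly from the background.
image = [[10, 200], [12, 11]]
background = [[10, 10], [10, 10]]
mask = color_foreground_mask(image, background, threshold=20)
```

For a color image, one possible extension is to compute the difference per channel and combine the results before thresholding; the passage above does not specify how channels are combined.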

記憶部１０５は、距離測定装置群１９１から入力された深度画像をフレームメモリに記憶する。第２分離部１０６は、深度画像を用いて、前景領域を「１」、背景領域を「０」として識別した二値画像（以下、距離情報前景マスク）を生成する。第２分離部１０６は、例えば、予め背景の深度画像を取得しておき、入力される深度画像と背景の深度画像の差分を所定の閾値で二値化することで距離情報前景マスクを生成する。なお、背景の深度画像の取得方法は、上記に限定されるものではない。例えば、逐次背景更新法を深度画像に適用することにより取得されてもよい。尚、深度画像の解像度と撮影画像の解像度が異なる場合は、撮影画像の解像度に合わせるように深度画像に対して解像度変換が行われる。この解像度変換は、深度画像に対して行われてもよいし、距離情報前景マスクに対して行われてもよい。解像度変換に用いられる補間方法としては、ニアレストネイバー、バイリニア、バイキュービック等のフィルタ演算を用いることができる。 The storage unit 105 stores the depth image input from the distance measuring device group 191 in a frame memory. The second separation unit 106 uses the depth image to generate a binary image (hereinafter referred to as a distance information foreground mask) in which the foreground region is identified as "1" and the background region as "0". The second separation unit 106, for example, obtains a background depth image in advance, and generates a distance information foreground mask by binarizing the difference between the input depth image and the background depth image using a predetermined threshold. Note that the method for acquiring the background depth image is not limited to the above method. For example, it may be obtained by applying a sequential background update method to the depth image. Note that if the resolution of the depth image is different from the resolution of the captured image, resolution conversion is performed on the depth image to match the resolution of the captured image. This resolution conversion may be performed on the depth image or may be performed on the distance information foreground mask. As an interpolation method used for resolution conversion, filter calculations such as nearest neighbor, bilinear, and bicubic can be used.
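The following Python sketch illustrates, under stated assumptions, how a distance information foreground mask could be derived from a depth image and then matched to the resolution of the captured image with nearest-neighbor sampling. The function names, the absolute-difference test, and the integer index mapping are hypothetical choices for this example, not details given in the disclosure.

```python
def depth_foreground_mask(depth, background_depth, threshold):
    """Binarize the difference between a depth image and a background depth
    image: 1 = foreground, 0 = background."""
    return [
        [1 if abs(d - b) > threshold else 0 for d, b in zip(d_row, b_row)]
        for d_row, b_row in zip(depth, background_depth)
    ]


def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbor resolution conversion of a 2D list to out_h x out_w."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# A 1x2 depth image whose right pixel is much closer than the background.
low_res_mask = depth_foreground_mask([[500, 120]], [[500, 500]], threshold=50)
full_res_mask = resize_nearest(low_res_mask, out_h=2, out_w=4)
```

Here the conversion is applied to the mask rather than to the depth image; the text above allows either order.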

補間部１０７は、距離情報前景マスクを補間することにより、距離情報前景マスクのフレームレートを変換する。補間部１０７によるフレームレートの変換の詳細については図２を用いて後述する。判定部１０８は、撮影画像と背景画像を用いて補間部１０７による距離情報前景マスクのレート変換時に発生したエラーを判定する。補正部１０９は、距離情報前景マスクの、判定部１０８で判定されたエラーを補正する。判定部１０８による処理の詳細、および、補正部１０９による処理の詳細については図３を用いて後述する。制御部１１０は、撮影画像と等しいフレームレートを有する距離情報前景マスクを用いて、第１分離部１０３により実行される背景差分法で用いられる二値化閾値を最適化する。具体的には、制御部１１０は、距離情報前景マスクの領域における、背景差分法により前景領域を特定する為の二値化閾値を下げる。このように、距離情報から前景と推測された領域について二値化閾値を下げることにより、背景領域のノイズを検出することなく、前景領域の欠けを防ぐことができる。例えば、前景領域と背景領域の画素値が近い場合に、この最適化によって前景領域の欠けを防ぐことができる。 The interpolation unit 107 converts the frame rate of the distance information foreground mask by interpolating the distance information foreground mask. Details of frame rate conversion by the interpolation unit 107 will be described later using FIG. 2. The determination unit 108 determines an error that occurs when the interpolation unit 107 converts the rate of the distance information foreground mask using the photographed image and the background image. The correction unit 109 corrects the error determined by the determination unit 108 in the distance information foreground mask. The details of the processing by the determination unit 108 and the details of the processing by the correction unit 109 will be described later using FIG. 3. The control unit 110 optimizes the binarization threshold used in the background subtraction method executed by the first separation unit 103 using a distance information foreground mask having the same frame rate as the captured image. Specifically, the control unit 110 lowers the binarization threshold for identifying the foreground region using the background subtraction method in the region of the distance information foreground mask. In this way, by lowering the binarization threshold for the region estimated to be the foreground from the distance information, it is possible to prevent the foreground region from being missing without detecting noise in the background region. 
For example, when the pixel values of the foreground region and the background region are close, this optimization can prevent the foreground region from being missing.
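A minimal sketch of the threshold control performed by the control unit 110 might look like the following. The names `adaptive_foreground_mask`, `base_threshold`, and `lowered_threshold`, as well as the specific threshold values in the example, are assumptions introduced for illustration.

```python
def adaptive_foreground_mask(image, background, distance_mask,
                             base_threshold, lowered_threshold):
    """Background subtraction with a per-pixel threshold.

    Where the distance information foreground mask is 1, the lowered
    threshold is used so that low-contrast foreground pixels are kept;
    elsewhere the base threshold is used so that background noise is
    not picked up.
    """
    result = []
    for img_row, bg_row, dm_row in zip(image, background, distance_mask):
        row = []
        for p, b, in_depth_fg in zip(img_row, bg_row, dm_row):
            threshold = lowered_threshold if in_depth_fg else base_threshold
            row.append(1 if abs(p - b) > threshold else 0)
        result.append(row)
    return result


# Both pixels differ from the background by 20, but only the left one lies
# inside the region that the distance information marks as foreground.
color_mask = adaptive_foreground_mask(
    image=[[30, 30]],
    background=[[10, 10]],
    distance_mask=[[1, 0]],
    base_threshold=40,
    lowered_threshold=15,
)
```

In this example the left pixel is kept as foreground despite its low contrast, while the identical difference on the right is rejected as background.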

映像生成部１０４は、撮影画像と背景画像と色情報前景マスクを用いて仮想視点画像を生成する。映像生成部１０４は、例えば、背景生成部１０２で生成された背景画像を２次元平面上に投影することによって、ユーザにより指定された視点（仮想視点）に対応する仮想視点画像の生成に必要な背景を生成する。例えば、映像生成部１０４は、第１分離部１０３が生成した色情報前景マスクの前景領域（画素値が「１」の画素群）に対応する撮影画像の領域を前景領域として撮影画像から抽出する。そして、映像生成部１０４は、抽出した前景領域に基づいて前景の３次元形状データ（３次元モデル）を生成する。映像生成部１０４は、生成した３次元モデルにユーザから指定された視点に応じたテクスチャをマッピングして上記の２次元平面上に投影することで仮想視点画像を生成する。 The video generation unit 104 generates a virtual viewpoint image using the photographed image, the background image, and the color information foreground mask. For example, the video generation unit 104 projects the background image generated by the background generation unit 102 onto a two-dimensional plane, thereby generating the background needed to generate a virtual viewpoint image corresponding to a viewpoint (virtual viewpoint) specified by the user. For example, the video generation unit 104 extracts from the captured image, as a foreground region, the region of the captured image that corresponds to the foreground region (the group of pixels with a pixel value of "1") of the color information foreground mask generated by the first separation unit 103. Then, the video generation unit 104 generates three-dimensional shape data (a three-dimensional model) of the foreground based on the extracted foreground region. The video generation unit 104 generates a virtual viewpoint image by mapping a texture corresponding to the viewpoint specified by the user onto the generated three-dimensional model and projecting it onto the two-dimensional plane.

＜フレームレート変換の概要＞
本実施形態における距離情報のフレームレートの変換方法について図２を用いて説明する。尚、本実施形態では、撮影画像のフレームレートを３０Ｈｚ、深度画像のフレームレートを１５Ｈｚとして説明する。但し、本実施形態におけるフレームレートの比率はあくまで一例であって他の比率であっても構わない。 <Overview of frame rate conversion>
A method of converting the frame rate of distance information in this embodiment will be explained using FIG. 2. In this embodiment, the frame rate of the captured image is 30 Hz, and the frame rate of the depth image is 15 Hz. However, the frame rate ratio in this embodiment is just an example, and other ratios may be used.

図２において、２ａ、２ｂ、２ｃは、カメラ群１９０から入力された撮影画像であり、３０ＨＺで取得される３フレームの撮影画像である。撮影画像２ａ、２ｂ、２ｃの３フレームにわたって、被写体２０１が撮影画像内を向かって左から右に移動している。２ｄ、２ｅは、距離測定装置群１９１から入力された深度画像であり、１５Ｈｚで取得される２フレームが示されている。撮影画像と深度画像のフレームレートの比率が２：１である為、撮影画像２ｂのフレーム時刻Ｆ（ｎ＋１）に相当する深度画像が存在しない。なお、被写体２０２は、深度画像２ｄ，２ｅの２フレームにわたって、向かって左から右に画像内を移動している。 In FIG. 2, 2a, 2b, and 2c are photographed images input from the camera group 190, and are three frames of photographed images acquired at 30Hz. The subject 201 is moving from left to right within the captured images over three frames of captured images 2a, 2b, and 2c. 2d and 2e are depth images input from the distance measuring device group 191, and two frames acquired at 15 Hz are shown. Since the frame rate ratio of the photographed image and the depth image is 2:1, there is no depth image corresponding to frame time F(n+1) of the photographed image 2b. Note that the subject 202 is moving within the image from left to right over two frames of the depth images 2d and 2e.
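To make the relationship between the two frame rates concrete, the following sketch lists which frame times of the captured image have no depth image and therefore need interpolation. It assumes, for simplicity, that the frame rate ratio is an integer and that both streams share frame index 0; the function name is hypothetical.

```python
def interpolation_target_frames(num_frames, image_rate_hz, depth_rate_hz):
    """Return the indices of captured-image frames without a depth frame.

    Assumption: image_rate_hz is an integer multiple of depth_rate_hz and
    both streams are aligned at frame index 0.
    """
    if image_rate_hz % depth_rate_hz != 0:
        raise ValueError("this sketch only handles integer frame rate ratios")
    step = image_rate_hz // depth_rate_hz
    return [n for n in range(num_frames) if n % step != 0]


# With 30 Hz captured images and 15 Hz depth images, the frames F(n),
# F(n+1), F(n+2) map to indices 0, 1, 2, and only index 1 lacks a depth image.
targets = interpolation_target_frames(3, image_rate_hz=30, depth_rate_hz=15)
```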

２ｆ、２ｇは、前景領域を「１」背景領域を「０」とした距離情報前景マスクであり、第２分離部１０６により、深度画像２ｄ，２ｅから生成される。距離情報前景マスク２ｆは前景領域２０３を含み、距離情報前景マスク２ｇは前景領域２０４を含んでいる。制御部１１０は、これら距離情報前景マスク２ｆ、２ｇを用いて、二値化閾値の制御対象となる撮影画像の前景領域を特定する。しかし、フレーム時刻Ｆ（ｎ＋１）の撮影画像２ｂに対応する距離情報前景マスクが存在しない為、制御部１１０は、フレーム時刻Ｆ（ｎ＋１）の前景領域を距離情報から特定することができない。よって、フレーム時刻Ｆ（ｎ＋１）に対応する距離情報前景マスクを補間する必要がある。フレーム時刻（Ｆ＋１）は、距離情報前景マスクの補間を行うフレーム時刻であり、補間対象フレーム時刻の一つである。 2f and 2g are distance information foreground masks in which the foreground region is "1" and the background region is "0", and are generated by the second separation unit 106 from the depth images 2d and 2e. The distance information foreground mask 2f includes a foreground region 203, and the distance information foreground mask 2g includes a foreground region 204. The control unit 110 uses these distance information foreground masks 2f and 2g to specify the foreground region of the photographed image for which the binarization threshold is to be controlled. However, since there is no distance information foreground mask corresponding to the captured image 2b at frame time F(n+1), the control unit 110 cannot specify the foreground region at frame time F(n+1) from the distance information. Therefore, it is necessary to interpolate the distance information foreground mask corresponding to frame time F(n+1). The frame time F(n+1) is a frame time at which the distance information foreground mask is interpolated, and is one of the interpolation target frame times.

２ｈは、補間部１０７が、入力された複数の深度画像から得られる距離情報前景マスクを用いて補間することにより生成した、フレーム時刻Ｆ（ｎ＋１）の距離情報前景マスクに相当する補間画像である。補間画像２ｈを用いることにより距離情報前景マスクのフレームレートは撮影画像と同じ３０Ｈｚになる。補間部１０７は、前景領域がフレーム間でどの方向にどの程度動いているかを示す動きベクトルを検出して補間画像を生成する。一般的な動きベクトル検出方法として、例えば、ブロックマッチング法がある。ブロックマッチング法の適用による補間部１０７の処理の一例を説明する。まず処理の対象となるフレーム時刻Ｆ（ｎ）の距離情報前景マスク２ｆ（基準画像）を所定のサイズに分割して基準ブロックを生成する。次に、フレーム時刻Ｆ（ｎ）よりも時間的に後のフレーム時刻Ｆ（ｎ＋２）の距離情報前景マスク２ｇにおいて、基準ブロックに対応する位置を起点として、基準ブロックと同サイズの参照ブロックを抽出する。次に、補間部１０７は、基準ブロックと参照ブロックのそれぞれ対応する全ての画素について絶対差分和（ＳｕｍｏｆＡｂｓｏｌｕｔｅＤｉｆｆｅｒｅｎｃｅ：ＳＡＤ）を算出する。補間部１０７は、このＳＡＤ演算を、距離情報前景マスク２ｇに設定される所定の探索範囲内で、参照ブロックを１画素ずつ移動させながら繰り返して行う。そして、補間部１０７は、上記ＳＡＤが極小を示した点（極小ＳＡＤ）の起点に対する座標を動きベクトルとして検出する。補間部１０７は、検出された動きベクトルを用いてフレーム時刻Ｆ（ｎ＋１）における、補間された距離情報前景マスク（補間画像２ｈ）を得る。なお、不必要に多くの極小ＳＡＤが検出されないよう、極小を示すＳＡＤのうち、閾値より小さい値を有するＳＡＤが極小ＳＡＤとして用いられてもよい。 2h is an interpolated image corresponding to the distance information foreground mask at frame time F(n+1), which the interpolation unit 107 generates by interpolating with the distance information foreground masks obtained from the plurality of input depth images. By using the interpolated image 2h, the frame rate of the distance information foreground mask becomes 30 Hz, which is the same as that of the captured image. The interpolation unit 107 generates an interpolated image by detecting a motion vector indicating in which direction and by how much the foreground region moves between frames. A common motion vector detection method is, for example, the block matching method. An example of processing by the interpolation unit 107 using the block matching method will be described. First, the distance information foreground mask 2f (base image) at the frame time F(n) to be processed is divided into blocks of a predetermined size to generate base blocks. Next, in the distance information foreground mask 2g at frame time F(n+2), which is temporally later than frame time F(n), a reference block of the same size as the base block is extracted, with the position corresponding to the base block as the starting point. Next, the interpolation unit 107 calculates the sum of absolute differences (SAD) over all corresponding pixels of the base block and the reference block. The interpolation unit 107 repeats this SAD calculation while moving the reference block one pixel at a time within a predetermined search range set in the distance information foreground mask 2g. Then, the interpolation unit 107 detects, as a motion vector, the coordinates relative to the starting point of the point at which the SAD shows a local minimum (minimum SAD). The interpolation unit 107 uses the detected motion vector to obtain an interpolated distance information foreground mask (interpolated image 2h) at frame time F(n+1). Note that, so that an unnecessarily large number of minimum SADs is not detected, only those local minima whose SAD value is smaller than a threshold may be used as minimum SADs.
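The block matching search described above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the function names are hypothetical, the block taken from the mask at F(n) is called the base block, and the sketch returns only the single displacement with the smallest SAD.

```python
def sad(base, ref, by, bx, ry, rx, size):
    """Sum of absolute differences between the size x size block of `base`
    at (by, bx) and the block of `ref` at (ry, rx)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(base[by + dy][bx + dx] - ref[ry + dy][rx + dx])
    return total


def best_motion_vector(base, ref, y0, x0, size, search):
    """Search `ref` within +/- `search` pixels of (y0, x0) and return
    (sad, dy, dx) for the displacement with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y <= h - size and 0 <= x <= w - size:
                cost = sad(base, ref, y0, x0, y, x, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best


def make_mask(h, w, top, left, size):
    """Helper for the example: a mask with one size x size foreground square."""
    return [[1 if top <= y < top + size and left <= x < left + size else 0
             for x in range(w)] for y in range(h)]


mask_fn = make_mask(4, 8, top=1, left=1, size=2)    # mask at F(n)
mask_fn2 = make_mask(4, 8, top=1, left=5, size=2)   # mask at F(n+2)
vector = best_motion_vector(mask_fn, mask_fn2, y0=1, x0=1, size=2, search=4)
```

Since F(n+1) lies midway between F(n) and F(n+2), one plausible way to place the block in the interpolated mask is to shift it by half of the detected vector; the passage above does not spell out this step, so that choice is an assumption.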

補間画像２ｈは、距離情報前景マスク２ｆの前景領域２０３と距離情報前景マスク２ｇの前景領域２０４から、ブロックマッチング法を用いて補間された前景領域２０５を含む。しかしながら、ブロックマッチング法を用いた動きベクトル検出方法において、表示画像に周期的パターンが存在する場合、この周期的パターンよりも基準ブロックのサイズが小さいと、ＳＡＤが極小となる位置が複数検出されてしまう。このように複数の位置の検出が発生すると、正しく動きベクトルを検出することができず、正確な補間画像を生成することができなくなる。この課題の解決方法を次に説明する。 The interpolated image 2h includes a foreground region 205 interpolated from the foreground region 203 of the distance information foreground mask 2f and the foreground region 204 of the distance information foreground mask 2g using the block matching method. However, in motion vector detection using the block matching method, when a periodic pattern exists in the displayed image and the base block (the block taken from the mask at frame time F(n)) is smaller than the period of that pattern, multiple positions at which the SAD is a local minimum are detected. When multiple positions are detected in this way, the motion vector cannot be detected correctly, and an accurate interpolated image cannot be generated. A method for solving this problem is explained next.

＜補間画像の判定及び補正方法の概要＞
第１実施形態における、フレームレート変換で生成された補間画像における前景領域の妥当性の判定及び補正方法について図３を用いて説明する。 <Overview of interpolated image determination and correction method>
A method for determining validity and correcting a foreground region in an interpolated image generated by frame rate conversion in the first embodiment will be described with reference to FIG. 3.

図３において、３ａ、３ｂ、３ｃは、カメラ群１９０から入力された撮影画像であり、３０ＨＺで取得される３フレームの画像である。撮影画像３ａの被写体像３０１は、３フレームに渡って向かって左から右に画像内を移動する被写体Ａの画像である。撮影画像３ｂの被写体像３０２は、被写体Ａが撮影画像３ａから１フレームの期間で移動した画像である。撮影画像３ｃの被写体像３０３は、被写体Ａが撮影画像３ａから２フレームの期間で移動した画像である。また、撮影画像３ｃの被写体像３０４は、周期的パターンの様に被写体Ａから一定間隔離れて移動する被写体Ｂの画像である。 In FIG. 3, 3a, 3b, and 3c are captured images input from the camera group 190, and are three-frame images acquired at 30Hz. The subject image 301 of the photographed image 3a is an image of subject A moving from left to right within the image over three frames. The subject image 302 of the photographed image 3b is an image in which the subject A has moved from the photographed image 3a in one frame period. The subject image 303 of the photographed image 3c is an image in which the subject A has moved from the photographed image 3a in a period of two frames. Further, the subject image 304 of the photographed image 3c is an image of subject B moving at a constant distance from subject A in a periodic pattern.

３ｄ、３ｅは、第２分離部１０６によって生成された距離情報前景マスクである。距離情報前景マスク３ｄにおいて、前景領域３１１は、撮影画像３ａの被写体像３０１に対応する。フレーム時刻Ｆ（ｎ）の距離情報前景マスク３ｄにおいて、基準ブロック３１４は、フレーム時刻Ｆ（ｎ）の距離情報前景マスク３ｄを所定サイズに分割した基準ブロックの一つであり、被写体Ａを含む。フレーム時刻Ｆ（ｎ＋２）の距離情報前景マスク３ｅにおいて、前景領域３１２は撮影画像３ｃの被写体像３０３に対応し、前景領域３１３は撮影画像３ｃの被写体像３０４に対応する。距離情報前景マスク３ｅにおいて、参照ブロック３１５は、距離情報前景マスク３ｅにおいて基準ブロック３１４に関して探索された、ＳＡＤが極小となる参照ブロックの一つであり、被写体Ｂを含む。また、距離情報前景マスク３ｅの参照ブロック３１６は、基準ブロック３１４に関して探索された、ＳＡＤが極小となる参照ブロックの一つであり、被写体Ａを含む。このように、図３の例では、基準ブロック３１４と似たパターンを持つ参照ブロックが二つ出現している。即ち、動きベクトルとして、左方向に僅かに動くことを示す動きベクトルと右方向に大きく動くことを示す動きベクトルの２つが検出される。 3d and 3e are distance information foreground masks generated by the second separation unit 106. In the distance information foreground mask 3d, a foreground region 311 corresponds to the subject image 301 of the photographed image 3a. The base block 314 is one of the base blocks obtained by dividing the distance information foreground mask 3d at frame time F(n) into blocks of a predetermined size, and includes the subject A. In the distance information foreground mask 3e at frame time F(n+2), the foreground region 312 corresponds to the subject image 303 of the photographed image 3c, and the foreground region 313 corresponds to the subject image 304 of the photographed image 3c. The reference block 315 is one of the reference blocks found in the distance information foreground mask 3e by the search for the base block 314, at which the SAD reaches a local minimum, and includes the subject B. The reference block 316 of the distance information foreground mask 3e is another reference block found by the same search at which the SAD reaches a local minimum, and includes the subject A. Thus, in the example of FIG. 3, two reference blocks with patterns similar to the base block 314 appear. That is, two motion vectors are detected: one indicating a slight movement to the left and one indicating a large movement to the right.

３ｆは、補間部１０７において生成された、フレーム時刻Ｆ（ｎ＋１）に対応する距離情報前景マスクの補間画像である。フレーム時刻Ｆ（ｎ＋１）は、距離情報前景マスクが存在しないフレーム時刻であり、補間対象フレーム時刻である。本実施形態では、複数検出された動きベクトルの全てを用いて補間画像が生成される。したがって、補間画像３ｆにおいて、前景領域３２１は、検出された右方向の動きベクトルによって被写体Ａが右方向に動いた前景領域である。また、補間画像３ｆにおいて、前景領域３２２は、検出された左方向の動きベクトルによって被写体Ａが左方向に動いた前景領域である。 3f is an interpolated image of the distance information foreground mask corresponding to frame time F(n+1), generated by the interpolation unit 107. Frame time F(n+1) is a frame time when no distance information foreground mask exists, and is an interpolation target frame time. In this embodiment, an interpolated image is generated using all of the multiple detected motion vectors. Therefore, in the interpolated image 3f, the foreground region 321 is a foreground region where the subject A has moved to the right due to the detected rightward motion vector. Furthermore, in the interpolated image 3f, the foreground region 322 is a foreground region where the subject A has moved to the left due to the detected leftward motion vector.

３ｇ、３ｈ、３ｉは、それぞれ背景生成部１０２で生成される背景画像である。３ｊは、判定部１０８が、複数生成された前景マスクの妥当性を撮影画像と背景画像を用いて判定するための判定情報を示す判定画像である。判定部１０８は、補間対象フレーム時刻と同時刻であるフレーム時刻Ｆ（ｎ＋１）の撮影画像３ｂと、補間された距離情報前景マスクである補間画像３ｆとを重ね合わせ、補間画像３ｆの前景領域３２１，３２２に存在する、撮影画像３ｂの画素群を抽出する。判定画像３ｊの画素群３３１は、補間画像３ｆの前景領域３２１と重なる領域に存在する撮影画像３ｂの画素群である。同様に、判定画像３ｊの画素群３３２は、補間画像３ｆの前景領域３２２と重なる領域に存在する、撮影画像３ｂの画素群である。次に、判定部１０８は、フレーム時刻Ｆ（ｎ＋１）の背景画像３ｈと、補間画像３ｆを重ね合わせて、前景領域３２１，３２２に重なる背景画像３ｈの画素群を抽出する。画素群３３４は、補間画像３ｆの前景領域３２１と重なる領域に存在する背景画像３ｈの画素群である。また、画素群３３５は、補間画像３ｆの前景領域３２２と重なる領域に存在する背景画像３ｈの画素群である。 3g, 3h, and 3i are background images generated by the background generation unit 102. 3j is a determination image showing the determination information that the determination unit 108 uses to judge, from the captured image and the background image, the validity of the multiple generated foreground masks. The determination unit 108 superimposes the captured image 3b at frame time F(n+1), which is the interpolation target frame time, on the interpolated image 3f, which is the interpolated distance information foreground mask, and extracts the pixel groups of the captured image 3b that lie in the foreground regions 321 and 322 of the interpolated image 3f. A pixel group 331 of the determination image 3j is the pixel group of the captured image 3b that lies in the area overlapping the foreground region 321 of the interpolated image 3f. Similarly, a pixel group 332 of the determination image 3j is the pixel group of the captured image 3b that lies in the area overlapping the foreground region 322 of the interpolated image 3f. Next, the determination unit 108 superimposes the background image 3h at frame time F(n+1) on the interpolated image 3f and extracts the pixel groups of the background image 3h that overlap the foreground regions 321 and 322. The pixel group 334 is the pixel group of the background image 3h that lies in the area overlapping the foreground region 321 of the interpolated image 3f. The pixel group 335 is the pixel group of the background image 3h that lies in the area overlapping the foreground region 322 of the interpolated image 3f.

次に、判定部１０８は、判定画像３ｊの画素群３３１，３３２と、背景画像３ｈの画素群３３４，３３５の各画素の値を比較する。判定部１０８は、この比較の結果、画素値の差が一定値を超える画素を補正画像３ｈにおける前景領域の画素と判定し、画素値の差が一定値以下の画素は前景領域の画素ではないと判定する。図３の例では、右方向の動きベクトルによって補間された補間画像３ｆの前景領域３２１の画素群が残り、前景領域３２２を構成していた画素群はマスクではないと判定され、前景領域３２２が除外されている。 Next, the determination unit 108 compares, pixel by pixel, the values of the pixel groups 331 and 332 of the determination image 3j with those of the pixel groups 334 and 335 of the background image 3h. As a result of this comparison, the determination unit 108 determines that pixels whose pixel value difference exceeds a certain value are foreground-region pixels in the corrected image, and that pixels whose difference is at or below that value are not foreground-region pixels. In the example of FIG. 3, the pixel group of the foreground region 321 of the interpolated image 3f, which was interpolated with the rightward motion vector, remains, while the pixel group that made up the foreground region 322 is determined not to belong to the mask, and the foreground region 322 is excluded.

ここで、判定部１０８が、各画素を前景領域の画素として残すか否かを判定するために用いる式を以下の式（１）に示す。式（１）の不等式が偽の場合に「誤検出（前景領域を構成する画素ではない）」と判定され、真の場合に「正解（前景領域を構成する画素である）」と判定される。 Here, Equation (1) is the expression that the determination unit 108 uses to decide whether to keep each pixel as a foreground-region pixel. When the inequality in Equation (1) is false, the pixel is judged to be a "false detection" (not a pixel of the foreground region); when it is true, the pixel is judged to be "correct" (a pixel of the foreground region).

・Ｂｇ（ｘ，ｙ）は、座標（ｘ，ｙ）に対応した背景画像の画素値を示す。
・Ｉｎ（ｘ，ｙ）は、座標（ｘ，ｙ）に対応した撮影画像の画素値を示す。
・Ｘは、後述するＭ（ｘ，ｙ）の画素値が「１」即ち前景である時のｘ座標値。
・Ｙは、後述するＭ（ｘ，ｙ）の画素値が「１」即ち前景である時のｙ座標値。
・Ｍ（ｘ，ｙ）は、座標（ｘ，ｙ）に対応した補間画像の画素値（二値）。
・αは、背景画像と撮影画像との類似の程度を示すパラメータであって、例えば０～５の値が設定される。

- Bg(x, y) is the pixel value of the background image at coordinates (x, y).
- In(x, y) is the pixel value of the captured image at coordinates (x, y).
- X is an x coordinate at which the pixel value of M(x, y), described below, is "1", that is, foreground.
- Y is a y coordinate at which the pixel value of M(x, y), described below, is "1", that is, foreground.
- M(x, y) is the (binary) pixel value of the interpolated image at coordinates (x, y).
- α is a parameter indicating the degree of similarity between the background image and the captured image, and is set to a value of 0 to 5, for example.
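Equation (1) is a figure in the original publication and does not appear in this text. Judging from the definitions above and from the statement that pixels whose difference exceeds a certain value are treated as foreground, a form consistent with the description would be the following. The absolute-difference form and the use of α as the threshold are assumptions, not the published equation.

```latex
\left| \mathrm{In}(X, Y) - \mathrm{Bg}(X, Y) \right| > \alpha \tag{1}
```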

３ｋは、判定部１０８による判定結果に基づいて補正部１０９が補間画像３ｆを補正した結果得られる、補正後の補間画像である。補正部１０９は、判定部１０８で誤検出と判定された前景領域を削除する。こうして、補正後の補間画像３ｋは、前景領域３４１のみを有する画像となる。尚、上記補間処理において、被写体の動きが検出されなかった場合は、動きベクトルを用いた補間画像の生成を行わず、補間対象フレーム時刻の前後の深度画像のいずれかを補間画像として用いるようにしてもよい。被写体の動きが検出されなかった場合とは、例えば、ブロックの移動量が閾値より小さい場合である。 3k is the corrected interpolated image obtained when the correction unit 109 corrects the interpolated image 3f based on the determination result of the determination unit 108. The correction unit 109 deletes the foreground region that the determination unit 108 judged to be a false detection. The corrected interpolated image 3k thus contains only the foreground region 341. In the above interpolation process, if no movement of the subject is detected, generation of an interpolated image using motion vectors may be skipped, and one of the depth images before or after the interpolation target frame time may be used as the interpolated image instead. No movement of the subject is detected when, for example, the amount of movement of the block is smaller than a threshold value.

＜処理手順＞
続いて、図４のフローチャートを参照して、第１実施形態に係る画像処理装置１００の処理手順を説明する。画像処理装置１００は外部から撮影画像を受信することで、図４に示される処理を開始する。 <Processing procedure>
Next, the processing procedure of the image processing apparatus 100 according to the first embodiment will be described with reference to the flowchart in FIG. 4. The image processing device 100 starts the processing shown in FIG. 4 by receiving a captured image from the outside.

Ｓ１０１において、背景生成部１０２は、逐次背景更新法を用いて、撮影画像から背景画像を生成する。Ｓ１０２において、画像処理装置１００は、処理対象のフレーム時刻に対応する深度画像が受信されているかを判断する。処理対象のフレーム時刻に対応する深度画像が受信されている場合（Ｓ１０２でＹＥＳ）、Ｓ１０３において、第２分離部１０６は、受信された深度画像を用いて距離情報前景マスクを生成する。距離情報前景マスクは、前景領域を「１」、背景領域を「０」として識別する二値画像である。一方、処理対象のフレーム時刻に対応する深度画像が受信されていない場合（Ｓ１０２でＮＯ）、Ｓ１０４において、補間部１０７は、当該フレーム時刻（補間対象フレーム時刻）に対応する距離情報前景マスクの補間画像を生成する。これにより、距離情報前景マスクのフレームレートが撮影画像のフレームレートに変換される。補間画像が生成されると、判定部１０８と補正部１０９により補間画像の補正処理が行われる（Ｓ１０５～Ｓ１０７）。 In S101, the background generation unit 102 generates a background image from the captured image using a sequential background update method. In S102, the image processing apparatus 100 determines whether a depth image corresponding to the frame time to be processed has been received. If a depth image corresponding to the frame time to be processed has been received (YES in S102), in S103, the second separation unit 106 generates a distance information foreground mask using the received depth image. The distance information foreground mask is a binary image that identifies the foreground region as "1" and the background region as "0". On the other hand, if the depth image corresponding to the frame time to be processed has not been received (NO in S102), in S104, the interpolation unit 107 interpolates the distance information foreground mask corresponding to the frame time (frame time to be interpolated). Generate an image. As a result, the frame rate of the distance information foreground mask is converted to the frame rate of the photographed image. Once the interpolated image is generated, the determining unit 108 and the correcting unit 109 perform correction processing on the interpolated image (S105 to S107).

Ｓ１０５～Ｓ１０７の処理を行うループでは、Ｓ１０４で生成された補間画像を、補間対象フレーム時刻の撮影画像と背景画像に基づいて補正する処理が実行される。まず、Ｓ１０５において、判定部１０８は、補間画像の処理対象の画素値が「１」か否か、即ち前景領域の画素であるか否かを判定する。前景領域の画素ではないと判定された場合（Ｓ１０５でＮＯ）、次の画素の処理に進む。前景領域の画素であると判定された場合（Ｓ１０５でＥＹＳ）、処理はＳ１０６に進む。Ｓ１０６において、判定部１０８は、上述した式（１）を用いて、当該画素が前景領域として妥当か否かを判定する。式（１）が満たされる場合（Ｓ１０６でＹＥＳ）、当該画素は前景領域で妥当であるとして、処理は次の画素に進む。他方、式（１）が満たされない場合（Ｓ１０６でＮＯ）、当該画素は前景領域として不適当であるとして、処理はＳ１０７に進む。Ｓ１０７において、補正部１０９は、補間画像における現在処理対象の画素の値を「０」に設定する。補間画像に存在する前景領域の全画素についてＳ１０５～Ｓ１０７の処理が繰り返され、補正された補間画像が得られる。 In the loop that performs the processes of S105 to S107, a process of correcting the interpolated image generated in S104 based on the captured image at the interpolation target frame time and the background image is executed. First, in S105, the determination unit 108 determines whether the pixel value of the interpolated image to be processed is "1", that is, whether it is a pixel in the foreground area. If it is determined that the pixel is not in the foreground area (NO in S105), the process proceeds to the next pixel. If it is determined that the pixel is in the foreground area (YES in S105), the process proceeds to S106. In S106, the determination unit 108 determines whether the pixel is appropriate as a foreground region using the above-mentioned equation (1). If formula (1) is satisfied (YES in S106), the pixel is determined to be valid in the foreground region, and the process proceeds to the next pixel. On the other hand, if equation (1) is not satisfied (NO in S106), the pixel is determined to be inappropriate as a foreground region, and the process proceeds to S107. In S107, the correction unit 109 sets the value of the pixel currently being processed in the interpolated image to "0". The processes of S105 to S107 are repeated for all pixels in the foreground area existing in the interpolated image, and a corrected interpolated image is obtained.
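The per-pixel loop of S105 to S107 can be sketched as follows. This is a minimal illustration, not the implementation described in the publication: it assumes single-channel (grayscale) images stored as nested lists, and it assumes Equation (1) is an absolute-difference test against α. The function name `correct_interpolated_mask` is hypothetical.

```python
def correct_interpolated_mask(mask, captured, background, alpha):
    """Return a corrected copy of a binary interpolated mask.

    Assumption: Equation (1) is |In - Bg| > alpha on grayscale pixel values.
    """
    corrected = [row[:] for row in mask]
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value != 1:
                continue  # S105: not a foreground pixel, go to the next pixel
            if abs(captured[y][x] - background[y][x]) > alpha:
                continue  # S106: test satisfied, pixel stays foreground
            corrected[y][x] = 0  # S107: pixel looks like background, clear it
    return corrected


mask = [[1, 1, 0, 1]]
captured = [[200, 52, 50, 50]]
background = [[50, 50, 50, 50]]
print(correct_interpolated_mask(mask, captured, background, alpha=5))
```

In this toy input only the first pixel differs from the background by more than α, so it is the only foreground pixel that survives.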

Ｓ１０８において、制御部１１０は、撮影画像のフレーム時刻に対応する距離情報前景マスクまたは補正部１０９から得られる補正された補間画像である距離情報前景マスクを用いて、第１分離部１０３で用いられる、背景差分法の二値化閾値を最適化する。第１分離部１０３は、最適化された二値化閾値を用いて、撮影画像と背景画像から前景領域を特定する為の二値画像（色情報前景マスク）を背景差分法により生成する。Ｓ１０９において、映像生成部１０４は、撮影画像と背景画像と色情報前景マスクを用いて、任意に指定された仮想視点からの仮想視点画像を生成する。 In S108, the control unit 110 uses the distance information foreground mask corresponding to the frame time of the photographed image or the distance information foreground mask that is the corrected interpolated image obtained from the correction unit 109 to be used in the first separation unit 103. , optimize the binarization threshold of the background subtraction method. The first separation unit 103 uses the optimized binarization threshold to generate a binary image (color information foreground mask) for specifying a foreground region from the photographed image and the background image by a background subtraction method. In S109, the video generation unit 104 generates a virtual viewpoint image from an arbitrarily designated virtual viewpoint using the photographed image, the background image, and the color information foreground mask.

以上、述べたように、第１実施形態によれば、動きベクトルを検出して距離情報マスクを補間することにより、撮影画像の各フレーム時刻に対応した距離情報マスクを得ることができる。また、補間された距離情報前景マスクは、補間対象フレーム時刻に対応する撮影画像とその撮影画像から生成される背景画像とに基づいてその妥当性が判定され、必要に応じて補正される。そのため、ブロックマッチング法において距離の情報のみではブロックの移動先を正確に特定できないような場合でも、より正確に移動先を特定できる。従って、例えば、似た形状の複数の物体が周期的な間隔で移動している場合でも、正しく距離情報マスクの移動先を推定することが可能となる。 As described above, according to the first embodiment, a distance information mask corresponding to each frame time of a captured image can be obtained by detecting a motion vector and interpolating the distance information mask. Further, the validity of the interpolated distance information foreground mask is determined based on the photographed image corresponding to the interpolation target frame time and the background image generated from the photographed image, and is corrected as necessary. Therefore, even when the block matching method cannot accurately specify the destination of a block using distance information alone, the destination can be specified more accurately. Therefore, for example, even if a plurality of objects with similar shapes are moving at periodic intervals, it is possible to accurately estimate the destination of the distance information mask.

［第２実施形態］
第１実施形態では、距離情報前景マスクを補間することにより、距離情報前景マスクのフレームレート変換を実施した。しかし、フレームレート変換の対象は、距離情報前景マスクに限られるものではなく、距離測定装置から入力された距離情報（深度画像）であってもよい。距離情報前景マスクを構成する画素の値は二値であるのに対して、距離情報を構成する深度画像の画素の値は多ビット値である。このため、フレーレート変換処理の対象を距離情報にすることでブロックマッチング法による類似パターン探索時の探索精度が向上することが期待できる。また、第１実施形態では、補間された距離情報前景マスクに対して妥当性を判定したが、動きベクトルの算出結果に対して妥当性を判定してもよい。第２実施形態では、動きベクトルに対して判定を行って距離情報に対してフレームレート変換を実施する構成を説明する。 [Second embodiment]
In the first embodiment, the frame rate conversion of the distance information foreground mask is performed by interpolating the distance information foreground mask. However, the target of frame rate conversion is not limited to the distance information foreground mask, but may also be distance information (depth image) input from a distance measuring device. The values of the pixels forming the distance information foreground mask are binary values, whereas the values of the pixels of the depth image forming the distance information are multi-bit values. Therefore, by using distance information as the object of frame rate conversion processing, it can be expected that the search accuracy when searching for similar patterns using the block matching method will be improved. Further, in the first embodiment, the validity is determined for the interpolated distance information foreground mask, but the validity may be determined for the motion vector calculation result. In the second embodiment, a configuration will be described in which a determination is made on a motion vector and frame rate conversion is performed on distance information.

＜機能概略説明＞
図５は、第２実施形態に係る画像処理装置１００の概略構成図である。図５において、第１実施形態（図１Ｂ）の機能ブロックと類似した機能ブロックには、共通の参照番号が付されている。第２実施形態の画像処理装置１００では、補間部５００が記憶部１０５と第２分離部１０６との間に配置されている。補間部５００は、深度画像に対してフレームレート変換を行う。補間部５００が行うるフレームレート変換の詳細は図６を用いて後述する。補間部５００は、算出部５０１、動きベクトル生成部５０２、判定部５０３、補正部５０４、フレーム補間部５０５を備える。 <Summary explanation of functions>
FIG. 5 is a schematic configuration diagram of an image processing apparatus 100 according to the second embodiment. In FIG. 5, functional blocks similar to those of the first embodiment (FIG. 1B) are given common reference numbers. In the image processing device 100 of the second embodiment, the interpolation section 500 is arranged between the storage section 105 and the second separation section 106. The interpolation unit 500 performs frame rate conversion on the depth image. Details of the frame rate conversion performed by the interpolation unit 500 will be described later using FIG. 6. The interpolation unit 500 includes a calculation unit 501, a motion vector generation unit 502, a determination unit 503, a correction unit 504, and a frame interpolation unit 505.

算出部５０１は、ブロックマッチング法による以下の処理を行う。算出部５０１は、処理の対象となるフレーム時刻Ｆ（ｎ）の深度画像（基準画像）を所定のサイズに分割して基準ブロックを生成する。次に、算出部５０１は、基準画像よりも時間的に後のフレーム時刻Ｆ（ｎ＋２）の深度画像（参照画像）において、基準ブロックに対応する位置を起点として、基準ブロックと同サイズの参照ブロックを抽出する。次に、算出部５０１は、基準ブロックと参照ブロックのそれぞれ対応する全ての画素について絶対差分和（ＳＡＤ）を算出する。算出部５０１は、このＳＡＤ演算を、参照画像に設定される所定の探索範囲内で、参照ブロックを１画素ずつ移動させながら繰り返す。算出部５０１は、算出されたＳＡＤが極小（極小ＳＡＤ）を示した参照ブロックの起点に対する座標を取得する。表示画像に周期的なパターンが存在するなどの理由により、極小ＳＡＤを示す参照ブロックの起点の座標（以下、極小ＳＡＤの座標）が複数存在した場合は、算出部５０１は、それら複数の座標を取得する。なお、不必要に多くの極小ＳＡＤが検出されないよう、極小を示すＳＡＤのうち、閾値より小さい値を有するＳＡＤが極小ＳＡＤとして用いられてもよい。 The calculation unit 501 performs the following block-matching processing. It divides the depth image at the frame time F(n) to be processed (the base image) into blocks of a predetermined size to generate base blocks. Next, in the depth image at frame time F(n+2), which is temporally later than the base image (the reference image), it extracts a reference block of the same size as the base block, taking the position corresponding to the base block as the origin. It then calculates the sum of absolute differences (SAD) over all corresponding pixels of the base block and the reference block. The calculation unit 501 repeats this SAD calculation while moving the reference block one pixel at a time within a predetermined search range set in the reference image, and acquires the coordinates, relative to the origin, of each reference block whose SAD is a local minimum (minimal SAD). When there are multiple such coordinates (hereinafter, minimal-SAD coordinates), for example because the displayed image contains a periodic pattern, the calculation unit 501 acquires all of them. To avoid detecting an unnecessarily large number of minimal SADs, only those local-minimum SADs whose value is smaller than a threshold may be used as minimal SADs.
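The search described above can be sketched as below: a hedged, pure-Python illustration of SAD block matching that returns every thresholded local minimum rather than a single best match. The function names, the 8-neighbour definition of "local minimum", and the toy depth values are assumptions made for the example and are not taken from the publication.

```python
def sad(base_img, ref_img, bx, by, rx, ry, size):
    """Sum of absolute differences between a base block and a reference block."""
    return sum(
        abs(base_img[by + j][bx + i] - ref_img[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def minimal_sad_offsets(base_img, ref_img, bx, by, size, search, threshold):
    """Return all offsets (dx, dy) whose SAD is a local minimum below threshold."""
    height, width = len(ref_img), len(ref_img[0])
    scores = {}
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Keep only reference blocks that lie fully inside the reference image.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                scores[(dx, dy)] = sad(base_img, ref_img, bx, by, rx, ry, size)
    minima = []
    for (dx, dy), score in scores.items():
        neighbours = [
            scores[(dx + i, dy + j)]
            for j in (-1, 0, 1)
            for i in (-1, 0, 1)
            if (i, j) != (0, 0) and (dx + i, dy + j) in scores
        ]
        if score < threshold and all(score <= n for n in neighbours):
            minima.append((dx, dy))
    return minima


# Toy depth images: subject A at x=4..5 in F(n); in F(n+2) subject A has moved
# to x=8..9 and a similar subject B appears at x=2..3.
row_base = [0, 0, 0, 0, 9, 9, 0, 0, 0, 0, 0, 0]
row_ref = [0, 0, 9, 9, 0, 0, 0, 0, 9, 9, 0, 0]
base_img = [row_base, row_base]
ref_img = [row_ref, row_ref]
print(minimal_sad_offsets(base_img, ref_img, bx=4, by=0, size=2, search=5, threshold=10))
```

For this input the search yields two candidates, a small leftward offset `(-2, 0)` and a larger rightward offset `(4, 0)`, which is the ambiguity the later validity determination has to resolve.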

動きベクトル生成部５０２は、基準ブロックの座標を始点とし、取得された極小ＳＡＤの座標を終点とする場合の、フレーム時刻Ｆ（ｎ＋１）に対応する補間画像における該基準ブロックの動きベクトルを生成する。この動きベクトルは、基準ブロック毎に生成される。判定部５０３は、撮影画像とその撮影画像から得られる背景画像とを使って動きベクトルの妥当性を判定する。判定部５０３による妥当性の判定の詳細は図７を用いて後述する。補正部５０４は、判定部５０３の判定結果に基づいて、動きベクトル生成部５０２により生成された動きベクトルから正解のベクトルを選択する。フレーム補間部５０５は、フレーム時刻Ｆ（ｎ）の基準画像とフレーム時刻Ｆ（ｎ＋２）の参照画像と選択された動きベクトルを用いてフレーム時刻Ｆ（ｎ＋１）の補間画像を生成する。 The motion vector generation unit 502 generates the motion vector of the base block in the interpolated image corresponding to frame time F(n+1), taking the coordinates of the base block as the start point and the acquired minimal-SAD coordinates as the end point. A motion vector is generated for each base block. The determination unit 503 determines the validity of the motion vectors using a captured image and the background image obtained from that captured image. Details of the validity determination by the determination unit 503 will be described later with reference to FIG. 7. Based on the determination result of the determination unit 503, the correction unit 504 selects the correct vector from among the motion vectors generated by the motion vector generation unit 502. The frame interpolation unit 505 generates the interpolated image at frame time F(n+1) using the base image at frame time F(n), the reference image at frame time F(n+2), and the selected motion vector.

＜フレームレート変換の概要＞
第２実施形態における距離情報の画像のフレームレート変換方法について図６を用いて説明する。 <Overview of frame rate conversion>
A method of converting the frame rate of an image of distance information in the second embodiment will be described using FIG. 6.

図６において、６ａ、６ｂ、６ｃは、カメラ群１９０から得られる撮影画像であり、３０ＨＺで得られる３フレームの画像である。撮影画像６ａ～６ｃの３フレームにわたって、被写体が向かって左から右に画像内を移動している。６ｄ、６ｅは、距離測定装置群１９１から１５Ｈｚで入力さる、２フレームの深度画像である。深度画像が存在しないフレーム時刻Ｆ（ｎ＋１）が補間対象フレーム時刻である。６ｆは、補間部５００が２フレームの深度画像を用いて生成した補間画像である。補間画像６ｆは、補間対象フレーム時刻であるフレーム時刻Ｆ（ｎ＋１）に相当する補間画像である。取得さる深度画像と補間画像から距離情報前景マスクを生成することにより、距離情報前景マスクのフレームレートは撮影画像と同じ３０Ｈｚになる。６ｇ、６ｈ、６ｉは、第２分離部１０６により生成される距離情報前景マスクである。距離情報前景マスク６ｇ，６ｉは深度画像６ｄ、６ｅから生成され、距離情報前景マスク６ｈは、補間画像６ｆから生成される。距離情報前景マスクは、被写体即ち前景領域を「１」とし、背景領域を「０」とした二値画像である。尚、被写体の動きが検出されなかった場合、動きベクトルを用いた補間画像の生成を行わず、補間対象フレーム時刻の前後の深度画像のいずれかを補間画像として用いるようにしてもよい。被写体が動いたか否かは、例えば、２つの深度画像から検出される動きベクトルに基づいて判定され得る。例えば、動きベクトルにより表される移動量が閾値より小さい場合、被写体の動きが検出されていないと判定される。 In FIG. 6, 6a, 6b, and 6c are photographed images obtained from the camera group 190, and are three frame images obtained at 30Hz. The subject moves within the image from left to right over three frames of captured images 6a to 6c. 6d and 6e are two-frame depth images input at 15 Hz from the distance measuring device group 191. The frame time F(n+1) at which no depth image exists is the interpolation target frame time. 6f is an interpolated image generated by the interpolation unit 500 using two frames of depth images. The interpolated image 6f is an interpolated image corresponding to frame time F(n+1), which is the interpolation target frame time. By generating a distance information foreground mask from the acquired depth image and interpolated image, the frame rate of the distance information foreground mask becomes 30 Hz, which is the same as that of the captured image. 6g, 6h, and 6i are distance information foreground masks generated by the second separation unit 106. The distance information foreground masks 6g and 6i are generated from the depth images 6d and 6e, and the distance information foreground mask 6h is generated from the interpolated image 6f. 
The distance information foreground mask is a binary image in which the subject, that is, the foreground region, is set to "1" and the background region is set to "0". If no movement of the subject is detected, generation of the interpolated image using motion vectors may be skipped, and either of the depth images before and after the interpolation target frame time may be used as the interpolated image. Whether or not the subject has moved can be determined, for example, based on the motion vectors detected from the two depth images. For example, if the amount of movement represented by a motion vector is smaller than a threshold value, it is determined that no movement of the subject has been detected.
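The no-motion fallback and the vector for the intermediate frame might look like the sketch below. It assumes that F(n+1) lies midway between F(n) and F(n+2) and that the block moves uniformly, so the displacement is simply halved; the publication does not state this formula. The function name `interpolation_vector` and the parameter `min_motion` are hypothetical.

```python
import math


def interpolation_vector(start, end, min_motion):
    """Return the block displacement to apply at F(n+1), or None for "no motion".

    start: (x, y) of the base block at F(n); end: (x, y) of the match at F(n+2).
    Assumption: uniform motion, so F(n+1) sits at half the total displacement.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    if math.hypot(dx, dy) < min_motion:
        return None  # caller may reuse the depth image at F(n) or F(n+2) as-is
    return (dx / 2, dy / 2)


print(interpolation_vector((4, 0), (8, 0), min_motion=1.0))
print(interpolation_vector((4, 0), (4, 0), min_motion=1.0))
```

Returning `None` leaves the choice of which neighbouring depth image to reuse to the caller, since the text allows either one.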

＜動きベクトルの補正の概要＞
第２実施形態における、複数生成された動きベクトルの補正の概要を、図７を用いて説明する。 <Overview of motion vector correction>
An overview of correction of a plurality of generated motion vectors in the second embodiment will be described using FIG. 7.

７ａ、７ｂ、７ｃは、カメラ群１９０から３０Ｈｚで入力された、３フレームの撮影画像である。撮影画像７ａ～７ｃは、第１実施形態で例示された撮影画像（図３）と同じである。７ｄ、７ｅは、距離測定装置群１９１から１５Ｈｚで入力された、２フレームの深度画像である。深度画像７ｄにおける基準ブロック７０１は、動きベクトルの算出のために基準画像に設定される基準ブロックの例である。深度画像７ｅに示されている参照ブロック７０２と参照ブロック７０３は、基準ブロック７０１に対して極小ＳＡＤが検出された参照画像上の参照ブロックの例である。これらの基準ブロックと参照ブロックは、算出部５０１で算出、生成される。 7a, 7b, and 7c are three frames of captured images input from the camera group 190 at 30 Hz. The captured images 7a to 7c are the same as the captured images exemplified in the first embodiment (FIG. 3). 7d and 7e are two frames of depth images input from the distance measuring device group 191 at 15 Hz. The base block 701 in the depth image 7d is an example of a base block set in the base image for motion vector calculation. The reference blocks 702 and 703 shown in the depth image 7e are examples of reference blocks in the reference image at which a minimal SAD was detected for the base block 701. These base blocks and reference blocks are calculated and generated by the calculation unit 501.

動きベクトル情報７ｆは、動きベクトル生成部５０２で生成された動きベクトルを、補間画像の二次元座標に配置した情報である。位置７１１は、深度画像７ｄにおける基準ブロック７０１の位置を示す。動きベクトル７１２は、位置７１１を始点とした移動方向と移動量を表しており、基準ブロック７０１が深度画像７ｅの参照ブロック７０２の位置まで動く場合の、フレーム時刻Ｆ（ｎ＋１）における基準ブロック７０１の推定された移動先を示す。参照領域７１３は、動きベクトル７１２により基準ブロック７０１が移動した状態を示す。同様に、動きベクトル７１４は、位置７１１を始点とした移動方向と移動量を表しており、基準ブロック７０１が深度画像７ｅの参照ブロック７０３まで動く場合の、基準ブロック７０１のフレーム時刻Ｆ（ｎ＋１）における推定された移動先を示す。参照領域７１５は、動きベクトル７１４により基準ブロック７０１が移動した状態を示す。図示の例では、１つの基準ブロック７０１に２つの移動先（参照領域７１３，７１５）が得られている。 The motion vector information 7f is information in which the motion vectors generated by the motion vector generation unit 502 are placed at two-dimensional coordinates of the interpolated image. A position 711 indicates the position of the base block 701 in the depth image 7d. The motion vector 712 represents a direction and amount of movement starting from the position 711, and indicates the estimated destination of the base block 701 at frame time F(n+1) when the base block 701 moves to the position of the reference block 702 in the depth image 7e. The reference area 713 shows the base block 701 after it has been moved by the motion vector 712. Similarly, the motion vector 714 represents a direction and amount of movement starting from the position 711, and indicates the estimated destination of the base block 701 at frame time F(n+1) when the base block 701 moves to the reference block 703 in the depth image 7e. The reference area 715 shows the base block 701 after it has been moved by the motion vector 714. In the illustrated example, two destinations (reference areas 713 and 715) are obtained for the single base block 701.

７ｇ、７ｈ、７ｉは、背景生成部１０２で生成された背景画像である。背景画像７ｇ、７ｈ、７ｉは、それぞれ撮影画像７ａ、７ｂ、７ｃから得られる背景画像である。フレーム時刻Ｆ（ｎ＋１）に対応する背景画像７ｈにおいて、画素群７２１は、参照領域７１３に存在する画素群を示し、画素群７２２は、参照領域７１５に存在する画素群を示す。 7g, 7h, and 7i are background images generated by the background generation unit 102. Background images 7g, 7h, and 7i are background images obtained from photographed images 7a, 7b, and 7c, respectively. In the background image 7h corresponding to frame time F(n+1), a pixel group 721 indicates a pixel group existing in the reference area 713, and a pixel group 722 indicates a pixel group existing in the reference area 715.

判定情報７ｊは、判定部５０３が、動きベクトル生成部５０２が生成した動きベクトルについて妥当性を判定するのに用いる情報を図示したものである。判定部５０３は、補間対象フレーム時刻であるフレーム時刻Ｆ（ｎ＋１）の撮影画像７ｂと、フレーム時刻Ｆ（ｎ＋１）の動きベクトル情報７ｆを重ね合わせて、参照領域７１３，７１５の範囲内に存在する、撮影画像７ｂの画素群を特定する。判定情報７ｊの画素群７３１は、参照領域７１３の範囲内に存在する撮影画像７ｂの画素群である。判定情報７ｊの画素群７３２は、参照領域７１５の範囲内に存在する撮影画像７ｂの画素群である。次に、判定部５０３は、検出された領域において、撮影画像７ｂの画素群と背景画像７ｈの画素群との類似度を算出する。例えば、判定部５０３は、画素群７３１と画素群７２１の類似度、画素群７３２と画素群７２２の類似度を算出する。判定部５０３は、１つの基準ブロックについて複数の動きベクトルが存在する場合には、最も低い類似度が算出された参照領域に対応する動きベクトルを妥当な動きベクトル（正解の動きベクトル）と判定する。例えば、判定情報７ｊの画素群７３１と背景画像７ｈの画素群７２１は、ほぼ一致する為、高い類似度が算出される。一方、判定情報７ｊの画素群７３２と背景画像７ｈの画素群７２２は、ほとんど一致しない為、低い類似度が算出される。よって、判定部５０３は、右方向の動きベクトル７１４が正解であると判定する。 The determination information 7j is an illustration of information used by the determination unit 503 to determine the validity of the motion vector generated by the motion vector generation unit 502. The determination unit 503 superimposes the captured image 7b at frame time F(n+1), which is the interpolation target frame time, and the motion vector information 7f at frame time F(n+1), and determines whether the captured image 7b exists within the reference areas 713, 715. , specify the pixel group of the photographed image 7b. A pixel group 731 of the determination information 7j is a pixel group of the photographed image 7b that exists within the range of the reference area 713. A pixel group 732 of the determination information 7j is a pixel group of the photographed image 7b that exists within the reference area 715. Next, the determination unit 503 calculates the degree of similarity between the pixel group of the captured image 7b and the pixel group of the background image 7h in the detected area. For example, the determination unit 503 calculates the similarity between the pixel group 731 and the pixel group 721 and the similarity between the pixel group 732 and the pixel group 722. 
If a plurality of motion vectors exist for one base block, the determination unit 503 determines that the motion vector corresponding to the reference area with the lowest calculated similarity is the valid (correct) motion vector, because an area of the captured image that closely resembles the background does not contain the subject. For example, since the pixel group 731 of the determination information 7j and the pixel group 721 of the background image 7h almost match, a high degree of similarity is calculated. On the other hand, since the pixel group 732 of the determination information 7j and the pixel group 722 of the background image 7h hardly match, a low degree of similarity is calculated. Therefore, the determination unit 503 determines that the rightward motion vector 714 is correct.

ここで、判定部５０３による類似度の算出方法の例を式（２）に示す。式（２）により算出されるＳは、値が小さいほど類似度が高いことを表す。判定部５０３は、検出された複数の動きベクトルに対応した参照ブロックに対して、下記の類似度Ｓの算出を行い、もっとも算出値が大きい即ち類似度の低い参照領域に対応した動きベクトルを正解の動きベクトルと判定する。 Here, Equation (2) shows an example of how the determination unit 503 calculates the similarity. A smaller value of S calculated by Equation (2) indicates a higher similarity. The determination unit 503 calculates the similarity S below for the reference blocks corresponding to the multiple detected motion vectors, and determines that the motion vector corresponding to the reference area with the largest calculated value, that is, the lowest similarity, is the correct motion vector.

・Ｓは、背景画像と撮影画像に存在する双方の参照ブロックの類似度を示す。計算される値Ｓが大きい程、類似度が低い。
・Ｂｇ（Ｘｎ，Ｙｎ）は、座標（Ｘｎ，Ｙｎ）に対応した背景画像の画素値を示す。
・Ｉｎ（Ｘｎ，Ｙｎ）は、座標（Ｘｎ，Ｙｎ）に対応した撮影画像の画素値を示す。
・Ｘｎは、参照領域の範囲に存在するｎ番目の画素のＸ座標値である。
・Ｙｎは、参照領域の範囲に存在するｎ番目の画素のＹ座標値である。
・ｎは、参照領域の範囲に存在する画素の番号を示す。値は、０～Ｎｍａｘ－１（参照領域を構成する全画素数－１）の値をとる。例えば、参照領域が１６×１６のサイズの場合、Ｎｍａｘの値は２５６となり、ｎの値は０～２５５となる。

- S indicates the similarity between the corresponding reference blocks in the background image and the captured image. The larger the calculated value of S, the lower the similarity.
- Bg(Xn, Yn) is the pixel value of the background image at coordinates (Xn, Yn).
- In(Xn, Yn) is the pixel value of the captured image at coordinates (Xn, Yn).
- Xn is the X coordinate of the n-th pixel in the reference area.
- Yn is the Y coordinate of the n-th pixel in the reference area.
- n is the index of a pixel in the reference area. It takes values from 0 to Nmax-1 (the total number of pixels in the reference area minus 1). For example, if the reference area is 16×16 pixels, Nmax is 256 and n ranges from 0 to 255.
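Equation (2) is likewise a figure in the original publication and is missing from this text. Given the definitions above, and the statement that a smaller S means a higher similarity, one consistent reading is a sum of absolute differences over the reference area. This form is an assumption, not the published equation.

```latex
S = \sum_{n=0}^{N_{\max}-1} \left| \mathrm{Bg}(X_n, Y_n) - \mathrm{In}(X_n, Y_n) \right| \tag{2}
```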

The corrected motion vector information 7k indicates the motion vector corrected by the correction unit 504. Here, the motion vector is visualized by plotting it on a two-dimensional coordinate diagram. The correction unit 504 obtains the corrected motion vector by selecting, from among the motion vectors generated by the motion vector generation unit 502, the motion vector that the determination unit 503 has determined to be correct. The motion vector 741 of the corrected motion vector information 7k is the motion vector selected by the correction unit 504, and corresponds to the motion vector 714 of the motion vector information 7f.

<Processing procedure>
Next, the processing procedure of the image processing apparatus 100 according to the second embodiment will be described with reference to the flowchart in FIG. 8. The image processing apparatus 100 starts processing when it receives a captured image from the outside. S101, S102, S108, and S109 are the same processes as in the first embodiment (FIG. 4).

In S201, the interpolation unit 500 determines whether to generate an interpolated image of the distance information (depth image) for the frame time currently being processed. If a depth image exists at that frame time, it is determined that no interpolated image is to be generated (NO in S201), and the process proceeds to S103. If no depth image exists at that frame time, the frame time is determined to be an interpolation target frame time (YES in S201), and the process proceeds to S202. In S202, the calculation unit 501 detects the coordinates of the local-minimum SADs from the depth images before and after the frame time being processed. In S203, the motion vector generation unit 502 generates motion vectors for all reference blocks set in the reference image.
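The way S202 and S203 can yield more than one motion vector for a single reference block can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the calculation unit 501: it assumes an exhaustive search over a square window, treats an offset as a local minimum when its SAD is no larger than that of any neighbouring offset, and uses a hypothetical `max_sad` cut-off to ignore poor matches. All function and parameter names are hypothetical.

```python
import numpy as np

def sad_surface(base_block, target, top, left, search):
    """SAD between base_block and target at every offset within +/-search pixels."""
    h, w = base_block.shape
    base = base_block.astype(np.int64)
    size = 2 * search + 1
    surface = np.full((size, size), np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > target.shape[0] or x + w > target.shape[1]:
                continue  # the candidate window falls outside the image
            window = target[y:y + h, x:x + w].astype(np.int64)
            surface[dy + search, dx + search] = np.abs(window - base).sum()
    return surface

def local_minima_vectors(surface, search, max_sad):
    """Return (dx, dy) offsets whose SAD is <= max_sad and a local minimum."""
    vectors = []
    size = surface.shape[0]
    for iy in range(size):
        for ix in range(size):
            value = surface[iy, ix]
            if not np.isfinite(value) or value > max_sad:
                continue
            y0, y1 = max(iy - 1, 0), min(iy + 2, size)
            x0, x1 = max(ix - 1, 0), min(ix + 2, size)
            if value <= surface[y0:y1, x0:x1].min():
                vectors.append((ix - search, iy - search))
    return vectors

# Toy depth images: identical objects spaced 8 pixels apart, shifted 4 pixels
# to the right between the earlier and the later depth image.
prev_depth = np.zeros((12, 24), dtype=np.uint16)
next_depth = np.zeros((12, 24), dtype=np.uint16)
for x0 in (0, 8, 16):
    prev_depth[4:8, x0:x0 + 4] = 50
for x0 in (4, 12, 20):
    next_depth[4:8, x0:x0 + 4] = 50

base_block = prev_depth[4:8, 8:12]
surface = sad_surface(base_block, next_depth, top=4, left=8, search=4)
vectors = local_minima_vectors(surface, search=4, max_sad=0)
print(vectors)  # [(-4, 0), (4, 0)]
```

Because the objects are identical and evenly spaced, the depth images alone cannot tell a leftward move from a rightward one, so two candidate vectors survive.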

In S204, the determination unit 503 determines, for each reference block, whether a plurality of motion vectors have been generated. If it is determined that a plurality of motion vectors have been generated (YES in S204), the process proceeds to S205 in order to correct the generated motion vectors. If it is determined that only one motion vector has been generated (NO in S204), the process proceeds to the next reference block. The processing in S205 to S209 corrects the motion vector using the captured image at the interpolation target frame time and the background image. In S205, the determination unit 503 extracts the pixel group of the background image that lies within the range corresponding to the reference region, that is, the destination to which the reference block is moved according to the motion vector. In S206, the determination unit 503 extracts the pixel group of the captured image that lies within the range corresponding to that reference region. In S207, the determination unit 503 uses equation (2) to calculate the degree of similarity between the pixel group of the background image extracted in S205 and the pixel group of the captured image extracted in S206. The determination unit 503 repeats S205 to S207 for all reference regions corresponding to the motion vectors generated in S203. When the processing for all of these reference regions is completed, the process proceeds to S208.

In S208, the determination unit 503 identifies the reference region for which the lowest degree of similarity (that is, the largest value of S) was obtained among the degrees of similarity calculated by the above processing. In S209, the correction unit 504 selects the motion vector for which the lowest degree of similarity was obtained as the correct motion vector of the reference block being processed. S204 to S209 are repeated for all of the set reference blocks, and when the processing for all reference blocks is completed, the process proceeds to S210. In S210, the frame interpolation unit 505 generates an interpolated image representing the depth image at frame time F(n+1), using the depth image at frame time F(n), the depth image at frame time F(n+2), and the corrected motion vectors. In S103, the second separation unit 106 generates a distance information foreground mask from the depth image or the interpolated image.
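The selection performed in S204 to S209 for one reference block can be sketched as follows. This is a minimal sketch under assumptions: images are grayscale NumPy arrays, S is a sum of absolute differences, and each candidate vector is assumed to be expressed already as the displacement from the reference block to its reference region in the captured image at the interpolation target frame time. The names `dissimilarity` and `select_motion_vector` are hypothetical.

```python
import numpy as np

def dissimilarity(background, captured, top, left, size):
    """Assumed form of S: sum of absolute differences over a square region."""
    bg = background[top:top + size, left:left + size].astype(np.int64)
    im = captured[top:top + size, left:left + size].astype(np.int64)
    return int(np.abs(bg - im).sum())

def select_motion_vector(candidates, block_top, block_left, size, background, captured):
    """Choose the candidate whose reference region differs most from the background."""
    if len(candidates) == 1:
        return candidates[0]  # S204: NO, a single vector needs no correction

    def score(vector):
        dx, dy = vector
        # S205-S207: compare background and captured pixels at the moved block.
        return dissimilarity(background, captured, block_top + dy, block_left + dx, size)

    # S208-S209: the largest S (lowest similarity) marks the correct vector.
    return max(candidates, key=score)

# Toy data: the subject is actually to the right of the reference block
# in the captured image at the interpolation target frame time.
background = np.full((16, 32), 100, dtype=np.uint8)
captured = background.copy()
captured[4:8, 20:24] = 200

candidates = [(-8, 0), (8, 0)]  # leftward and rightward candidates
chosen = select_motion_vector(candidates, block_top=4, block_left=12, size=4,
                              background=background, captured=captured)
print(chosen)  # (8, 0)
```

The leftward candidate lands on pure background (S = 0), while the rightward candidate lands on the subject (S = 1600), so the rightward vector is selected.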

As described above, according to the second embodiment, when a depth image is interpolated, the validity of each motion vector is judged using the captured image at the same time as the interpolation target frame time, so the motion vector can be estimated more correctly. As a result, the depth image can be interpolated accurately even when a plurality of objects with similar shapes are moving at periodic intervals. Note that in the second embodiment the motion vector correction is applied to the interpolation of the depth image, but the same correction may be applied to the interpolation of the distance information foreground mask. That is, instead of the pixel-by-pixel correction of the distance information foreground mask described in the first embodiment, the motion vector correction described above may be applied to the interpolation of the distance information foreground mask.

(Other embodiments)
The present disclosure can also be realized by a process in which a program that implements one or more functions of the embodiments described above is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

101: Storage unit, 102: Background generation unit, 103: First separation unit, 104: Video generation unit, 105: Storage unit, 106: Second separation unit, 107: Interpolation unit, 108: Determination unit, 109: Correction unit, 110: Control unit, 500: Interpolation unit, 501: Calculation unit, 502: Motion vector generation unit, 503: Determination unit, 504: Correction unit, 505: Frame interpolation unit

Claims

An image processing apparatus comprising:
first acquisition means for acquiring a photographed image photographed at a first frame rate by a photographing device;
second acquisition means for acquiring a distance information image that is generated at a second frame rate lower than the first frame rate and is composed of pixel values based on the distance to a subject; and
generating means for generating a distance information image corresponding to an interpolation target frame time, which is a frame time of the first frame rate but not a frame time of the second frame rate, based on the distance information image acquired by the second acquisition means and a photographed image corresponding to the interpolation target frame time.

The image processing apparatus according to claim 1, wherein the distance information image is a foreground image representing a foreground region extracted from a depth image in which each pixel represents a distance to the subject.

The image processing apparatus according to claim 2, wherein the generating means generates a foreground image at the interpolation target frame time from foreground images before and after the interpolation target frame time, and corrects the generated foreground image based on the photographed image at the interpolation target frame time.

The image processing apparatus according to claim 3, further comprising background generation means for generating, based on the photographed image, a background image for separating a foreground region from the photographed image,
wherein the generating means corrects the foreground region represented by the foreground image based on a difference in pixel values between the photographed image corresponding to the interpolation target frame time and the background image in an area corresponding to the foreground region represented by the distance information image at the interpolation target frame time.

The image processing apparatus according to claim 4, wherein the generating means excludes a region of pixels in which the difference is smaller than a threshold value from the foreground region represented by the generated foreground image.

The image processing apparatus according to any one of claims 2 to 5, wherein the generating means generates a motion vector for each area based on foreground images before and after the interpolation target frame time, and uses the generated motion vectors to generate a foreground image at the interpolation target frame time.

The image processing apparatus according to claim 2, wherein the generating means
generates a motion vector for each area based on foreground images before and after the interpolation target frame time, and uses the generated motion vectors to generate a foreground image at the interpolation target frame time, and
when a plurality of motion vectors are generated for one area, selects one motion vector from the plurality of motion vectors based on a photographed image corresponding to the interpolation target frame time.

The image processing apparatus according to claim 6, wherein, when the amount of movement indicated by the motion vector of each area in the foreground images before and after the interpolation target frame time is smaller than a threshold value, the generating means uses one of the foreground images before and after the interpolation target frame time as the foreground image at the interpolation target frame time.

The image processing apparatus according to claim 1, wherein the distance information image is a depth image in which each pixel represents a distance to the subject.

The image processing apparatus according to claim 9, wherein the generating means
generates a motion vector for each area from depth images before and after the interpolation target frame time, and uses the generated motion vectors to generate a depth image at the interpolation target frame time, and
when a plurality of motion vectors are generated for one area, selects one motion vector from the plurality of motion vectors based on a photographed image corresponding to the interpolation target frame time.

The image processing apparatus according to claim 10, further comprising background generation means for generating, based on the photographed image, a background image for separating a foreground region from the photographed image,
wherein the generating means calculates, for each of the plurality of motion vectors, a difference in pixel values of a group of pixels in an area associated with the motion vector between the photographed image corresponding to the interpolation target frame time and the background image, and selects one motion vector from the plurality of motion vectors based on the calculated differences.

The image processing apparatus according to claim 10 or 11, wherein, when the amount of movement indicated by the motion vector of each area in the depth images before and after the interpolation target frame time is smaller than a threshold value, the generating means uses one of the depth images before and after the interpolation target frame time as the depth image at the interpolation target frame time.

The image processing apparatus according to any one of claims 9 to 12, further comprising third acquisition means for acquiring a foreground image corresponding to each frame time of the first frame rate by generating a foreground image representing a foreground region from each of the depth image acquired by the second acquisition means and the depth image generated by the generating means.

The image processing apparatus according to any one of claims 2 to 8 and 13, further comprising separating means for separating a foreground region from the photographed image by comparing a difference between the photographed image and a background image generated based on the photographed image with a threshold value,
wherein the separating means sets the threshold value differently between a foreground region indicated by a foreground image acquired at the frame time of the photographed image to be processed and the other region.

An image processing method comprising:
a first acquisition step of acquiring a photographed image photographed at a first frame rate by a photographing device;
a second acquisition step of acquiring a distance information image that is generated at a second frame rate lower than the first frame rate and is composed of pixel values based on the distance to a subject; and
a generation step of generating a distance information image corresponding to an interpolation target frame time, which is a frame time of the first frame rate but not a frame time of the second frame rate, based on the distance information image acquired in the second acquisition step and a photographed image corresponding to the interpolation target frame time.

A program for causing a computer to function as each means of the image processing apparatus according to claim 1.