JP6389452B2

JP6389452B2 - Image processing apparatus, image processing method, and computer program

Info

Publication number: JP6389452B2
Application number: JP2015232343A
Authority: JP
Inventors: 麻理子五十川; 弾三上; 康輔高橋; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-11-27
Filing date: 2015-11-27
Publication date: 2018-09-12
Anticipated expiration: 2035-11-27
Also published as: JP2017097800A

Description

The present invention relates to a technique for processing images captured by a plurality of imaging devices.

In recent years, cameras capable of capturing 360-degree panoramic images (hereinafter "omnidirectional cameras") have become increasingly common. A panoramic image captured by an omnidirectional camera (hereinafter an "omnidirectional image") can be obtained for a desired viewpoint by installing the omnidirectional camera at that position. However, an omnidirectional camera cannot be installed inside a competition court such as a soccer or basketball court, because it would obstruct the players during play. Consequently, an omnidirectional image of the competition cannot be captured from a desired viewpoint inside the court.

To address this, a technique has been proposed in which a virtual viewpoint is set at a location where an omnidirectional camera cannot be installed, and an omnidirectional image that looks as if it had been captured by an omnidirectional camera at that virtual viewpoint is obtained by combining images captured by a plurality of cameras installed outside the court (see, for example, Non-Patent Document 1). In the following description, an omnidirectional image at the virtual viewpoint is referred to as a virtual omnidirectional image.

A specific example of a system that obtains a virtual omnidirectional image by combining images captured by a plurality of cameras is described below.
FIG. 10 is a diagram showing a conventional system for obtaining a virtual omnidirectional image. As shown in FIG. 10, the image processing system 1 includes an omnidirectional camera 2, a plurality of cameras 3-1, 3-2, 3-3, ..., 3-N (hereinafter "camera group 3", where N is an integer of 4 or more), an image processing device 4, and a display device 5. When a virtual viewpoint 11 is set inside the competition court 10, the image processing system 1 obtains a virtual omnidirectional image at the virtual viewpoint 11 by combining images captured by the camera group 3 installed outside the competition court 10.

The omnidirectional camera 2 captures omnidirectional images. Before the competition starts, the omnidirectional camera 2 is installed at the position of the virtual viewpoint 11 inside the competition court 10 and, from that position, captures an image that serves as the background of the virtual omnidirectional image (hereinafter the "background image"). The background image captured by the omnidirectional camera 2 is input to and stored in the image processing device 4. In this way, the image processing device 4 stores the background image in advance.

The camera group 3 is installed around the competition court 10. Each camera 3-1, 3-2, 3-3, ..., 3-N of the camera group 3 is placed around the competition court 10 so that its angle of view includes the virtual viewpoint 11; the camera group 3 therefore captures a region that includes the virtual viewpoint 11. The image processing device 4 performs image processing on the images captured by the cameras 3-1, 3-2, 3-3, ..., 3-N and combines the processed images with the background image to generate a virtual omnidirectional image. The display device 5 is an image display device such as a liquid crystal display, an organic EL (Electro Luminescence) display, or a CRT (Cathode Ray Tube) display, and displays the virtual omnidirectional image generated by the image processing device 4.

Next, a specific example of the image processing in the image processing system 1 is described with reference to FIG. 11.
FIG. 11 is a diagram for explaining the flow of image processing in the image processing system 1. FIG. 11(A) shows a specific example of the background image 20. The background image 20 captures subjects in all directions (360 degrees) around the virtual viewpoint 11. Because the background image 20 is captured while nobody is on the competition court 10, no people appear on the court in it.

FIG. 11(B) shows images captured by the cameras 3-1, 3-2, and 3-3: from the left, an image 21 captured by the camera 3-1, an image 22 captured by the camera 3-2, and an image 23 captured by the camera 3-3. The image processing device 4 extracts regions 211, 221, and 231, each containing the virtual viewpoint 11, from the images 21 to 23, respectively. By applying image processing to the extracted regions 211, 221, and 231, the image processing device 4 generates partial images 211a, 221a, and 231a that can be combined with the background image 20.

The image processing device 4 generates a virtual omnidirectional image 24 by combining the partial images 211a, 221a, and 231a with the background image 20. FIG. 11(C) shows an example of the virtual omnidirectional image 24 generated by the image processing device 4. As shown in FIG. 11(C), the partial images 211a, 221a, and 231a are composited into predetermined regions of the virtual omnidirectional image 24. As a result, the virtual omnidirectional image 24 shows objects (for example, people) on the competition court 10.
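The compositing step can be illustrated with a minimal sketch. It assumes that each partial image has already been transformed into the coordinate system of the background panorama and that its "predetermined region" is given as a top-left pixel offset; the function name `composite_partials` and this data layout are hypothetical, not taken from the source.

```python
import numpy as np

def composite_partials(background, partials):
    """Paste pre-warped partial images into a copy of the background panorama.

    background: H x W x C array (the stored background image).
    partials: list of (partial_image, (top, left)) pairs; each partial image is
    assumed to be already transformed into background coordinates.
    """
    out = background.copy()  # leave the stored background image untouched
    for patch, (top, left) in partials:
        h, w = patch.shape[:2]
        out[top:top + h, left:left + w] = patch  # overwrite the predetermined region
    return out
```

Plain overwriting is the simplest choice; a real system might blend the seams between neighbouring partial images instead.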

Non-Patent Document 1: Kosuke Takahashi and three others, "Study on virtual omnidirectional image synthesis using multiple camera images" (複数カメラ映像を用いた仮想全天球映像合成に関する検討), IEICE Technical Report, The Institute of Electronics, Information and Communication Engineers, vol. 115, no. 76, pp. 43-48, June 2015.

The virtual omnidirectional image 24 is generated by combining images captured by the camera group 3 installed outside the court with an omnidirectional image that looks as if it had been captured at the virtual viewpoint 11. However, when a subject is located in front of the virtual viewpoint 11 as seen from a camera 3, that foreground subject, which should not be visible from the virtual viewpoint, appears in the result and occludes the background or other subjects that should be visible from the virtual viewpoint. When this happens, an appropriate virtual omnidirectional image cannot be generated. This problem is not specific to virtual viewpoints: it arises whenever a subject that should be removed appears in any of a plurality of images captured by a plurality of imaging devices.

In view of the above circumstances, an object of the present invention is to provide a technique capable of suppressing the deterioration in image quality caused by a subject to be removed that appears in any of a plurality of images captured by a plurality of imaging devices.

One aspect of the present invention is an image processing apparatus including: a determination unit that, for a specific input image in a specific video among the videos captured by each of a plurality of imaging devices, determines, on the basis of a complement target region in which an object to be removed is present in the specific input image, a difficulty level representing how difficult it is to complement the complement target region using other input images captured by the imaging device that captured the specific input image; and a complement processing unit that, when the determination unit determines that the complement target region is easy to complement using other input images captured by the imaging device that captured the specific input image, complements the complement target region by a first complementing method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image, and otherwise complements the complement target region by a second complementing method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image and the input images captured by the other imaging devices.

One aspect of the present invention is the image processing apparatus described above, in which, when using the second complementing method, the complement processing unit uses as a reference image an input image captured by an imaging device that is close to the imaging device that captured the specific input image, among all the imaging devices.
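Selecting a nearby imaging device for the second complementing method could look like the sketch below. The source does not say how "close" is measured; here it is assumed to be the Euclidean distance between known 2D installation positions, and `nearest_cameras` is a hypothetical helper.

```python
import math

def nearest_cameras(positions, target_id, k=1):
    """Return the IDs of the k cameras physically closest to target_id.

    positions: dict mapping camera ID -> (x, y) installation position.
    Closeness is taken here as Euclidean distance, which is an assumption.
    """
    tx, ty = positions[target_id]
    others = [cid for cid in positions if cid != target_id]
    others.sort(key=lambda cid: math.hypot(positions[cid][0] - tx,
                                           positions[cid][1] - ty))
    return others[:k]
```

For example, with cameras "C1" at (0, 0), "C2" at (1, 0), "C3" at (5, 0), and "C4" at (0, 2), `nearest_cameras(positions, "C1", 2)` returns `["C2", "C4"]`. A neighbouring camera views the scene from a similar direction, which presumably makes its image easier to align with the target image.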

One aspect of the present invention is the image processing apparatus described above, in which, when using the first complementing method, the complement processing unit takes the input image captured by another imaging device at the same time as the specific input image, searches the input images previously captured by that other imaging device for an input image similar to it, acquires, from the input images of the imaging device that captured the specific input image, an input image captured in the temporal vicinity of the time at which the retrieved input image was captured, and complements the complement target region using the acquired input image.
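The reference-frame search of the first complementing method can be sketched as follows. Several details are assumptions, not from the source: frames are stored per camera as a list indexed by time step, similarity is measured by mean squared difference, and "temporal vicinity" is simplified to the exact time step of the best match. The function `first_method_reference` is hypothetical.

```python
import numpy as np

def first_method_reference(frames, target_cam, other_cam, t):
    """Pick a reference frame for the first complementing method (sketch).

    frames: dict camera ID -> list of frames (numpy arrays), index = time step.
    1. Take the other camera's frame at time t.
    2. Search that camera's past frames (times < t) for the most similar one,
       using mean squared difference as an assumed similarity measure.
    3. Return the target camera's frame at the time of the best match.
    """
    if t == 0:
        raise ValueError("no past frames to search")
    query = frames[other_cam][t].astype(float)
    best_t = min(
        range(t),
        key=lambda s: np.mean((frames[other_cam][s].astype(float) - query) ** 2),
    )
    return best_t, frames[target_cam][best_t]
```

The idea is that a past moment when the other camera saw nearly the same scene is likely one when the target camera's view can serve as a reference for the occluded region.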

One aspect of the present invention is the image processing apparatus described above, in which the determination unit determines that the complement target region is easy to complement using other input images captured by the imaging device that captured the specific input image when the complement target region is background, or when an input image similar to the specific input image is searched for among the input images of another imaging device and the object is present in an input image captured, near the time at which the retrieved input image was captured, by the imaging device that captured the specific input image. In all other cases, it determines otherwise.

One aspect of the present invention is an image processing method including: a determination step of determining, for a specific input image in a specific video among the videos captured by each of a plurality of imaging devices, on the basis of a complement target region in which an object to be removed is present in the specific input image, a difficulty level representing how difficult it is to complement the complement target region using other input images captured by the imaging device that captured the specific input image; and a complement processing step of, when the determination step determines that the complement target region is easy to complement using other input images captured by the imaging device that captured the specific input image, complementing the complement target region by a first complementing method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image, and otherwise complementing the complement target region by a second complementing method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image and the input images captured by other imaging devices.

One aspect of the present invention is a computer program for causing a computer to execute: a determination step of determining, for a specific input image in a specific video among the videos captured by each of a plurality of imaging devices, on the basis of a complement target region in which an object to be removed is present in the specific input image, a difficulty level representing how difficult it is to complement the complement target region using other input images captured by the imaging device that captured the specific input image; and a complement processing step of, when the determination step determines that the complement target region is easy to complement using other input images captured by the imaging device that captured the specific input image, complementing the complement target region by a first complementing method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image, and otherwise complementing the complement target region by a second complementing method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image and the input images captured by other imaging devices.

According to the present invention, it is possible to suppress the deterioration in image quality caused by a subject to be removed that appears in any of a plurality of images captured by a plurality of imaging devices.

FIG. 1 is a diagram showing the system configuration of an image processing system 100 according to the present invention.
FIG. 2 is a diagram showing a specific example of a composite information table.
FIG. 3 is a flowchart showing the processing flow of the image processing apparatus 80.
FIG. 4 is a flowchart showing the flow of the determination process performed by the determination unit 803.
FIG. 5 is a flowchart showing the processing flow of the first complementing method.
FIG. 6 is a flowchart showing the processing flow of the second complementing method.
FIG. 7 is a diagram showing specific examples of an input image and a mask image.
FIG. 8 is a diagram showing the size of the mask region for each mask image.
FIG. 9 is a diagram showing a specific example of a mask image after affine transformation.
FIG. 10 is a diagram showing a conventional system for obtaining a virtual omnidirectional image.
FIG. 11 is a diagram for explaining the flow of image processing in the image processing system 1.

An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a diagram showing the system configuration of an image processing system 100 according to the present invention.
The image processing system 100 includes an omnidirectional camera 60, a plurality of cameras 70-1 to 70-M (M is an integer of 2 or more), and an image processing apparatus 80. In the following description, the cameras 70-1 to 70-M are referred to simply as cameras 70 when they need not be distinguished.

The omnidirectional camera 60 is installed at the position of a virtual viewpoint 82 inside an imaging target area 81. The imaging target area 81 is, for example, a competition court such as a soccer or basketball court. The virtual viewpoint 82 is a viewpoint set virtually within a predetermined area (in this embodiment, the imaging target area 81). The omnidirectional camera 60 captures an omnidirectional image at the position of the virtual viewpoint 82; in this embodiment, the omnidirectional image covers the entire imaging target area 81 centered on the virtual viewpoint 82. The capture by the omnidirectional camera 60 is performed before the image processing apparatus 80 starts processing. The omnidirectional camera 60 outputs the captured omnidirectional image to the image processing apparatus 80 as a background image.

The M cameras 70-1, 70-2, ..., 70-M are provided outside the imaging target area 81, capture moving images (video), and capture a region that includes the virtual viewpoint 82. The video captured by each of the M cameras 70-1, 70-2, ..., 70-M consists of a plurality of frames. As shown in FIG. 1, a light ray 71 passing through the position of the virtual viewpoint 82 enters the camera 70-1, and a light ray 72 passing through the position of the virtual viewpoint 82 enters the camera 70-2. Hereinafter, a light ray entering a camera 70 is referred to as a real light ray. Although not shown in FIG. 1, the cameras 70 are installed around the imaging target area 81; that is, they surround the imaging target area 81 so that each camera's angle of view includes the virtual viewpoint 82. In FIG. 1, M is an integer of 2 or more; to obtain a virtual omnidirectional image of comparable quality, M must be larger for a larger imaging target area 81. For an imaging target area 81 of a given size, a larger M yields a larger composite region (the region of the virtual omnidirectional image in which the images from the M cameras 70 are combined) or, for a composite region of a given size, higher image quality within that region. In short, the higher the desired quality of the virtual omnidirectional image, the larger M becomes.

The image processing apparatus 80 acquires input images in advance from the video captured by each of the M cameras 70-1, 70-2, ..., 70-M. Each video consists of a plurality of frames, and in this embodiment the image processing apparatus 80 acquires the frame to be processed as the input image. The image processing apparatus 80 generates a virtual omnidirectional image on the basis of the omnidirectional image captured by the omnidirectional camera 60 and the input images acquired from the videos of the M cameras 70-1, 70-2, ..., 70-M. The image processing apparatus 80 also determines whether a foreground object is present in each input image. Here, a foreground object is an object located closer to the camera 70 than the virtual viewpoint 82 is, and an object is a person, a thing (for example, a ball), or the like that is contained in the input image but not in the background image. When a foreground object is present in an input image, the image processing apparatus 80 removes it and complements the region where the removed foreground object was (hereinafter the "complement target region").
For example, the image processing apparatus 80 complements the complement target region using either a first complementing method or a second complementing method. The first complementing method complements the complement target region using, as a reference image, a past input image of the camera 70 that captured the input image containing the foreground object. The second complementing method complements the complement target region using, as a reference image, either a past input image of the camera 70 that captured the input image containing the foreground object or a past input image of another camera 70.
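Once a reference image has been chosen, filling the complement target region can be sketched as a masked pixel copy. This assumes that the region is given as a boolean mask and that the reference image is already aligned pixel-for-pixel with the input image (plausible for a past frame of the same fixed camera). A frame from another camera would first need a geometric transformation, which this sketch does not attempt. `complement_region` is a hypothetical name.

```python
import numpy as np

def complement_region(input_image, mask, reference_image):
    """Fill the complement target region of input_image from reference_image.

    mask: boolean H x W array, True where the removed foreground object was.
    reference_image: assumed to be already aligned pixel-for-pixel with
    input_image (e.g. a past frame of the same, static camera).
    """
    if input_image.shape != reference_image.shape:
        raise ValueError("reference image must be aligned to the input image")
    out = input_image.copy()
    out[mask] = reference_image[mask]  # copy only the masked pixels
    return out
```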

Next, the functional configuration of the image processing apparatus 80 is described. The image processing apparatus 80 includes a CPU (Central Processing Unit), a memory, an auxiliary storage device, and the like connected by a bus, and executes an image processing program. By executing the image processing program, the image processing apparatus 80 functions as an apparatus including an input image storage unit 801, an object analysis unit 802, a determination unit 803, a complement processing unit 804, a composite information storage unit 805, a partial region extraction unit 806, a background image storage unit 807, and an image composition unit 808. All or some of the functions of the image processing apparatus 80 may be implemented in hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The image processing program may be recorded on a computer-readable recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The image processing program may also be transmitted and received over a telecommunication line.

The input image storage unit 801 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. It stores the input images of each camera 70 in chronological order, in association with a camera ID identifying that camera 70. Each input image includes its capture time and the image data of the video frame.
The object analysis unit 802 receives as input an input image stored in the input image storage unit 801 and analyzes whether a foreground object is present in it. If a foreground object is present, the object analysis unit 802 outputs information indicating the region of that object in the input image (the complement target region) to the determination unit 803; if no foreground object is present, it outputs the input image to the partial region extraction unit 806. One possible analysis method is to cut out the object regions contained in the input image by segmentation. The object analysis unit 802 then determines, on the basis of each cut-out object region, whether the object is located in front of the virtual viewpoint 82. Any method may be used for this determination. For example, the object analysis unit 802 may acquire the distance from a given camera 70 to an object in the input image using a sensor or the like (not shown), or by geometric analysis of input images captured by a plurality of cameras 70. If the acquired distance is equal to or less than the distance from that camera 70 to the virtual viewpoint, the object analysis unit 802 determines that the corresponding object in the input image is a foreground object in front of the virtual viewpoint; otherwise, it determines that the object is a background object. The object analysis unit 802 then outputs to the determination unit 803 the input image, information indicating the complement target region in the input image, and information indicating whether a background object is present in the input image.
If object recognition shows that the same object also appears in the input image of another camera 70, the object analysis unit 802 treats that object as a foreground object as well.
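The distance rule above can be sketched as follows. It assumes that the camera and the virtual viewpoint have known 3D positions in a common world frame, and that a camera-to-object distance has already been measured for each segmented object; the source leaves the measurement method open. `classify_objects` is a hypothetical helper.

```python
import math

def classify_objects(camera_pos, viewpoint_pos, object_distances):
    """Split detected objects into foreground and background objects (sketch).

    camera_pos, viewpoint_pos: (x, y, z) positions in a common world frame (assumed known).
    object_distances: dict object ID -> measured camera-to-object distance
    (from a range sensor or multi-camera geometry; how it is measured is left open).
    An object whose distance is <= the camera-to-viewpoint distance is foreground.
    """
    threshold = math.dist(camera_pos, viewpoint_pos)  # Python 3.8+
    foreground = [oid for oid, d in object_distances.items() if d <= threshold]
    background = [oid for oid, d in object_distances.items() if d > threshold]
    return foreground, background
```

For example, with the camera at the origin and the virtual viewpoint 10 units away, objects at distances 4.0 and 10.0 are classified as foreground and one at 12.5 as background. Objects exactly at the viewpoint distance count as foreground, matching the "equal to or less than" wording.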

The determination unit 803 receives as input the information on the complement target region, the input image, and the information indicating whether a background object is present, all output from the object analysis unit 802. On the basis of these inputs, the determination unit 803 determines the difficulty level, that is, how difficult it is to complement the complement target region using other input images captured by the camera 70 that captured the input image. The determination unit 803 then outputs the determination result and the information on the complement target region to the complement processing unit 804. The specific determination method is described later. The determination result includes information indicating whether the complement target region is easy to complement using other input images captured by the camera 70 that captured the input image.

The complement processing unit 804 receives the determination result output from the determination unit 803, the information on the complement target region, and the input image stored in the input image storage unit 801. When the determination result indicates that the complement target region is easy to complement, the complement processing unit 804 complements the complement target region in the input image using a first complementing method. Otherwise, it complements the complement target region using a second complementing method. Specific processing will be described later. The complement processing unit 804 outputs the input image after complementation (hereinafter referred to as the “post-complement image”) to the partial region extraction unit 806.
The composite information storage unit 805 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The composite information storage unit 805 stores a composite information table. The composite information table consists of records (hereinafter “composite information records”) representing information for superimposing an image generated from an input image on the background image (hereinafter “composite information”).

FIG. 2 is a diagram illustrating a specific example of the composite information table.
The composite information table has a plurality of composite information records. Each composite information record has the values camera ID, extraction region information, and conversion information. The camera ID value is identification information for identifying a camera 70. For example, in FIG. 2 the camera 70 identified by camera ID “C1” is camera 70-1, and the camera 70 identified by camera ID “CM” is camera 70-M.

The value of the extraction region information represents information on the region (hereinafter “extraction region”) to be extracted from an image (input image) captured by the camera 70 identified by the camera ID of the same composite information record. Specific examples of extraction region information are the upper-left coordinates, the width, and the height. The upper-left coordinates are the coordinates of the upper-left corner of the extraction region, and the width and height are the width and height of the extraction region. The width and height are measured from the upper-left coordinates of the extraction region and are set so that the region includes the virtual viewpoint. The extraction region is desirably set so that no gap arises between it and the partial images of adjacent cameras 70 in the composited image.

The value of the conversion information represents information for converting the partial region image, extracted according to the extraction region information of the same composite information record, into a partial image. The partial image is generated by applying deformation processing such as enlargement, reduction, and rotation to the partial region image according to the conversion information, so that the partial region image can be superimposed seamlessly on the corresponding region of the background image. This deformation is performed, for example, by applying an affine transformation to the image, in which case the conversion information is an affine transformation matrix. The following description uses affine transformation as the deformation processing applied to the partial region image; however, the deformation is not limited to affine transformation, and any processing that transforms the image by enlargement, reduction, rotation, or the like according to the conversion information may be used. The affine transformation matrix also includes information indicating the region of the background image on which the partial image is superimposed.

The affine transformation matrix is acquired in advance by the following method and stored in the composite information storage unit 805. For example, a checkered chessboard is placed at several different distances (depths) from the virtual viewpoint 82, and at each depth an image of the chessboard captured by the omnidirectional camera 60 placed at the virtual viewpoint 82 is compared with an image of the chessboard captured by the camera 70. An affine transformation matrix is then obtained that transforms the image so that each grid cell of the chessboard in the image captured by the camera 70 corresponds to the same grid cell in the image captured by the omnidirectional camera 60. In this way, an affine transformation matrix is obtained for each depth at which the chessboard was placed.
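The calibration step above amounts to fitting an affine map from point correspondences. The sketch below is one way to do that, assuming the chessboard corners have already been detected and matched in both images (corner detection is not shown) and using an ordinary least-squares fit; the text does not specify the fitting method, and `estimate_affine` is a hypothetical helper. It returns a 2x3 matrix mapping camera 70 image coordinates to omnidirectional image coordinates for one chessboard depth.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(a: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ x = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate correspondences (points are collinear)")
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]  # replace one column with the right-hand side
        out.append(_det3(mc) / d)
    return out


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares 2x3 affine matrix M with dst ~= M @ [x, y, 1]."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 matching point pairs")
    # Normal equations A^T A p = A^T b, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3  # A^T times the output x-coordinates
    atv = [0.0] * 3  # A^T times the output y-coordinates
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return [_solve3(ata, atu), _solve3(ata, atv)]
```

Repeating the fit once per chessboard depth yields the set of depth-dependent matrices the text describes.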

In the example shown in FIG. 2, a plurality of composite information records are registered in the composite information table. The record in the top row has camera ID “C1”, upper-left coordinates “(A, B)”, width “C”, height “D”, and conversion information “A1_i”. This means that the region with upper-left coordinates (A, B), width C, and height D is extracted from the input image of the camera 70-1 identified by camera ID “C1”, and that the deformation processing “A1_i” is applied to the extracted image.
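A composite information record and the extraction it drives can be sketched as below. The field names, the reading of (A, B) as (x, y), and the list-of-rows grayscale image representation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # grayscale image stored as rows of pixel values


@dataclass
class CompositeInfoRecord:
    # One row of the composite information table; field names are hypothetical.
    camera_id: str
    left: int      # x of the upper-left coordinates (A)
    top: int       # y of the upper-left coordinates (B)
    width: int     # C
    height: int    # D
    affine: List[List[float]]  # 2x3 conversion information, e.g. A1_i


def extract_partial_region(image: Image, rec: CompositeInfoRecord) -> Image:
    """Crop the extraction region described by rec out of an input image."""
    h, w = len(image), len(image[0])
    if (rec.left < 0 or rec.top < 0
            or rec.left + rec.width > w or rec.top + rec.height > h):
        raise ValueError("extraction region lies outside the image")
    return [row[rec.left:rec.left + rec.width]
            for row in image[rec.top:rec.top + rec.height]]
```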

Returning to FIG. 1, the description of the image processing apparatus 80 continues.
The partial region extraction unit 806 receives either the input image output from the object analysis unit 802 or the post-complement image output from the complement processing unit 804, together with the composite information table stored in the composite information storage unit 805. Based on the composite information table, the partial region extraction unit 806 generates a partial region image by extracting a partial region from the image (input image or post-complement image), and outputs the generated partial region image to the image composition unit 808.
The background image storage unit 807 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The background image storage unit 807 stores the omnidirectional image captured by the omnidirectional camera 60 as the background image.

The image composition unit 808 receives the partial region image output from the partial region extraction unit 806, the composite information table stored in the composite information storage unit 805, and the background image stored in the background image storage unit 807. The image composition unit 808 generates a partial image by deforming the partial region image based on the affine transformation matrix in the conversion information of the composite information table, and generates a virtual omnidirectional image by superimposing the generated partial image on the background image based on the affine transformation matrix. More specifically, the image composition unit 808 replaces the pixel values of the region of the background image on which the partial image is superimposed with the pixel values of the partial image.
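The deform-and-replace composition can be sketched with inverse mapping: for each background pixel covered by the warped patch, map it back into the partial region image and copy the nearest source pixel. This is a minimal sketch assuming single-channel images stored as lists of rows and a 2x3 matrix that maps partial-region coordinates directly to background coordinates; nearest-neighbour sampling is a simplification chosen here, not something the text prescribes.

```python
import math
from typing import List

Image = List[List[int]]
Affine = List[List[float]]  # [[a, b, tx], [c, d, ty]]


def invert_affine(m: Affine) -> Affine:
    """Invert a 2x3 affine matrix: [A | t] -> [A^-1 | -A^-1 t]."""
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)]]


def overlay_partial_image(background: Image, patch: Image, m: Affine) -> Image:
    """Warp patch with m and replace (not blend) the background pixels it covers."""
    out = [row[:] for row in background]
    ph, pw = len(patch), len(patch[0])
    bh, bw = len(out), len(out[0])
    inv = invert_affine(m)
    # Bounding box of the warped patch footprint (pixel centres at integers).
    corners = [(-0.5, -0.5), (pw - 0.5, -0.5), (-0.5, ph - 0.5), (pw - 0.5, ph - 0.5)]
    xs = [m[0][0] * x + m[0][1] * y + m[0][2] for x, y in corners]
    ys = [m[1][0] * x + m[1][1] * y + m[1][2] for x, y in corners]
    x0, x1 = max(0, math.floor(min(xs))), min(bw - 1, math.ceil(max(xs)))
    y0, y1 = max(0, math.floor(min(ys))), min(bh - 1, math.ceil(max(ys)))
    for v in range(y0, y1 + 1):
        for u in range(x0, x1 + 1):
            # Map the background pixel back into patch coordinates.
            sx = inv[0][0] * u + inv[0][1] * v + inv[0][2]
            sy = inv[1][0] * u + inv[1][1] * v + inv[1][2]
            ix, iy = math.floor(sx + 0.5), math.floor(sy + 0.5)  # nearest pixel
            if 0 <= ix < pw and 0 <= iy < ph:
                out[v][u] = patch[iy][ix]
    return out
```

Inverse mapping is used so that every covered background pixel receives exactly one value, which avoids the holes a forward (patch-to-background) mapping would leave when the matrix enlarges the patch.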

FIG. 3 is a flowchart showing the flow of processing of the image processing apparatus 80.
The object analysis unit 802 selects a camera x (step S101). Any method may be used to select camera x; for example, a specific camera may be designated as camera x in advance. Next, the object analysis unit 802 reads the input image of the selected camera x from the input image storage unit 801 (step S102); for example, it reads one input image of a video captured by the selected camera x. The object analysis unit 802 analyzes the read input image to determine whether a foreground object exists in it (step S103). At this time, the object analysis unit 802 also analyzes whether a background object exists in the input image. When a foreground object exists in the input image (YES in step S103), the object analysis unit 802 outputs the input image, information indicating the complement target region, and information indicating the presence or absence of a background object to the determination unit 803. The determination unit 803 determines the difficulty based on these inputs, that is, determines whether the complement target region is easy to complement (step S104).

If the determination unit 803 determines that complementation is easy (YES in step S104), it outputs a determination result indicating so, together with the information on the complement target region, to the complement processing unit 804. The complement processing unit 804 receives the determination result, the information on the complement target region, and the input image stored in the input image storage unit 801; because the determination result indicates that complementation is easy, it complements the complement target region in the input image using the first complementing method (step S105). Specific processing of the first complementing method will be described later. The complement processing unit 804 outputs the post-complement image to the partial region extraction unit 806.
On the other hand, if the determination unit 803 does not determine that complementation is easy (NO in step S104), it outputs a determination result to that effect, together with the information on the complement target region, to the complement processing unit 804. Examples of such a determination result are one indicating that complementation is not easy and one indicating that complementation is difficult. The complement processing unit 804 receives the determination result, the information on the complement target region, and the input image stored in the input image storage unit 801; because the determination result does not indicate that complementation is easy, it complements the complement target region in the input image using the second complementing method (step S106). Specific processing of the second complementing method will be described later. The complement processing unit 804 outputs the post-complement image to the partial region extraction unit 806.

The partial region extraction unit 806 receives either the input image output from the object analysis unit 802 or the post-complement image output from the complement processing unit 804, together with the composite information table stored in the composite information storage unit 805, and generates a partial region image by extracting from the image the partial region indicated by the composite information table (step S107). Specifically, when the partial region extraction unit 806 receives the input image from the object analysis unit 802, that is, when it was determined in step S103 that no foreground object exists in the input image (NO in step S103), it generates the partial region image by extracting the partial region indicated by the composite information table from the input image. When it receives the post-complement image from the complement processing unit 804, that is, after the processing of step S105 or S106, it generates the partial region image by extracting the partial region indicated by the composite information table from the post-complement image. The partial region extraction unit 806 then outputs the generated partial region image to the image composition unit 808. For example, the partial region extraction unit 806 generates the partial region image of the input image of camera 70-1 by extracting, from that input image, the region represented by the extraction region information of the composite information record corresponding to camera 70-1 in the composite information table.

The image composition unit 808 receives the partial region image output from the partial region extraction unit 806, the composite information table stored in the composite information storage unit 805, and the background image stored in the background image storage unit 807. It generates a partial image by deforming the partial region image based on the conversion information in the composite information table, and superimposes the generated partial image at the corresponding position on the background image (step S108). For example, the image composition unit 808 generates the partial image of camera 70-1 by deforming the partial region image generated from the input image of camera 70-1 according to the conversion information of the composite information record corresponding to camera 70-1, and superimposes that partial image on the background image based on the information, included in the conversion information, that indicates the region of the background image on which the partial image is superimposed.

Thereafter, the determination unit 803 determines whether processing has been performed on the input images of all cameras 70 (step S109). More specifically, the determination unit 803 determines whether the complementing processing has been performed on the input images of all cameras 70 in which a foreground object was determined to exist. If so (YES in step S109), the image processing apparatus 80 ends the processing.
Otherwise (NO in step S109), the processing from step S101 onward is executed again.
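The control flow of FIG. 3 can be summarized as a driver that receives each step as a function. This is a hypothetical sketch: all parameter names are placeholders, and for simplicity it visits every camera once instead of re-entering step S101 until the completion check of step S109 succeeds.

```python
from typing import Any, Callable, Iterable, Optional


def generate_virtual_image(
    camera_ids: Iterable[str],
    read_image: Callable[[str], Any],                     # S102
    find_foreground: Callable[[Any], Optional[Any]],      # S103: target region or None
    easy_to_complement: Callable[[str, Any, Any], bool],  # S104
    complement_first: Callable[[Any, Any], Any],          # S105
    complement_second: Callable[[Any, Any], Any],         # S106
    extract_region: Callable[[str, Any], Any],            # S107
    overlay: Callable[[Any, str, Any], Any],              # S108
    background: Any,
) -> Any:
    """Run the FIG. 3 steps for each camera and accumulate the composite."""
    result = background
    for cam in camera_ids:                                # S101 / S109 loop
        image = read_image(cam)
        target = find_foreground(image)
        if target is not None:                            # S103: YES
            if easy_to_complement(cam, image, target):
                image = complement_first(image, target)
            else:
                image = complement_second(image, target)
        partial = extract_region(cam, image)
        result = overlay(result, cam, partial)
    return result
```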

FIG. 4 is a flowchart showing the flow of the determination processing by the determination unit 803.
The determination unit 803 determines the complement target region based on the analysis result of the object analysis unit 802 (step S201). The determination unit 803 has received the input image output from the object analysis unit 802, the information indicating the complement target region, and the information indicating the presence or absence of a background object, and sets the region indicated by the complement target region information in the input image as the complement target region. Next, the determination unit 803 determines whether the complement target region is background (step S202). This is decided by whether a background object exists behind the foreground object. First, the determination unit 803 determines from the received information whether a background object is present. If no background object exists, the determination unit 803 determines that the complement target region is background. If a background object exists, the determination unit 803 determines whether it lies behind the complement target region. If the background object lies behind the complement target region, the determination unit 803 determines that the complement target region is not background; if it does not, the determination unit 803 determines that the complement target region is background.

If the complement target region is background (YES in step S202), the determination unit 803 determines that the complement target region is easy to complement (step S203). The determination unit 803 then outputs a determination result including information indicating that the complement target region is easy to complement, together with the information on the complement target region, to the complement processing unit 804.
If the complement target region is not background (NO in step S202), the determination unit 803 searches the input images of the other cameras 70 (including past input images) for similar images (step S204). For example, for a camera 70 other than the camera x selected in step S101 (for example, camera y), the determination unit 803 finds the times t at which S(Vy(T+1), Vy(t)) < Th. Here, S(a, b) represents the similarity between a and b, Th represents a threshold, Vy(t) denotes the input image of camera y at time t, T+1 represents the current time, and t represents a time before the current time. In other words, a similar image is an image that contains the background object of the input image of the camera x selected in step S101 and that was captured at a time t for which S falls below Th.
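Step S204 can be sketched as a scan over past frames of camera y. Although S(a, b) is called a similarity, the test S < Th reads naturally only if S behaves like a dissimilarity, so this sketch assumes S is the mean absolute pixel difference; that choice and the function names are assumptions, not the patent's definition.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame as rows of pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Assumed S(a, b): mean absolute difference of two equal-sized frames."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def find_similar_times(frames_y: Sequence[Frame], th: float) -> List[int]:
    """Return past times t with S(V_y(T+1), V_y(t)) < th.

    frames_y[t] is camera y's frame at time t; the last element is the
    current frame V_y(T+1)."""
    current = frames_y[-1]
    return [t for t, frame in enumerate(frames_y[:-1])
            if mean_abs_diff(current, frame) < th]
```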

For each time t found by the search, the determination unit 803 checks whether a foreground object exists within the complement target region in the input image of camera x at time t (step S205), and determines whether there is a time t at which no foreground object exists within the complement target region in the input image of camera x (step S206). If there is such a time t (YES in step S206), the determination unit 803 determines that the complement target region is easy to complement (step S203), and outputs a determination result including information indicating so, together with the information on the complement target region, to the complement processing unit 804.
Otherwise, that is, if there is no time t at which the foreground object is absent (NO in step S206), the determination unit 803 determines that the complement target region is not easy to complement, or is difficult to complement (step S207). The determination unit 803 then outputs a determination result including information indicating this, together with the information on the complement target region, to the complement processing unit 804.
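The decision logic of FIG. 4 (steps S202 to S207) can be sketched as below, assuming the foreground regions of camera x over time and the complement target region are available as boolean masks of equal size. The function names are hypothetical, and because the text does not say which reference image is used when the region is plain background, the sketch returns no reference time in that branch.

```python
from typing import Optional, Sequence, Tuple

Mask = Sequence[Sequence[bool]]


def overlaps(a: Mask, b: Mask) -> bool:
    """True if two equal-sized boolean masks share any set pixel."""
    return any(pa and pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def judge_difficulty(background_behind_target: bool,
                     candidate_times: Sequence[int],
                     foreground_masks_x: Sequence[Mask],
                     target_mask: Mask) -> Tuple[bool, Optional[int]]:
    """Return (easy_to_complement, reference_time)."""
    if not background_behind_target:
        return True, None                 # S202: YES -> S203
    for t in candidate_times:             # times found in S204
        # S205/S206: is camera x free of foreground inside the target at t?
        if not overlaps(foreground_masks_x[t], target_mask):
            return True, t                # S206: YES -> S203
    return False, None                    # S206: NO -> S207
```

The returned time, when present, identifies the frame of camera x that the first complementing method can use as its reference image.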

FIG. 5 is a flowchart showing the flow of processing in the first complementing method.
The complement processing unit 804 sets, as the reference image, the input image of camera x at the time t at which no foreground object exists (step S301). Next, the complement processing unit 804 generates a mask image whose mask region indicates that image processing is to be performed on the complement target region, and complements the mask region shown in the mask image using the reference image (step S302). Specifically, the complement processing unit 804 first selects, from the input image read in step S102, a small region (patch) that contains both part of the complement target region and part of the region outside it. One way to do this is to select, from the patches lying on the contour of the complement target region, a patch whose pixels outside the complement target region have strong edge strength. Next, the complement processing unit 804 detects a similar patch in the reference image based on the pixels of the selected patch that lie outside the complement target region. Here, a similar patch is a patch whose pixel values are similar to those of the selected patch. The complement processing unit 804 then superimposes the detected similar patch on the selected patch, that is, replaces the pixel values of the selected patch with those of the similar patch. The complement processing unit 804 repeats the above processing until no complement target region remains.

In the above example, a patch-based method was described as the complementing (completion) method, but the complementing method is not limited to this, and other methods may be used. Examples include complementing each pixel of the complement target region with the pixel at the corresponding position in a similar patch, and complementing it with a weighted average of the pixels at the corresponding position in a plurality of similar patches.
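The weighted-average variant mentioned above could look like this. The exponential weights exp(-cost / h²) and the bandwidth `h` are assumptions; the text only says that a weighted average of several similar patches is used.

```python
import math
from typing import List, Sequence


def weighted_patch_average(candidates: Sequence[Sequence[float]],
                           costs: Sequence[float], h: float = 10.0) -> List[float]:
    """Blend k candidate patches (flattened pixel lists) so that patches with a
    lower matching cost contribute more to each output pixel."""
    weights = [math.exp(-c / (h * h)) for c in costs]
    total = sum(weights)
    n = len(candidates[0])
    return [sum(wt * cand[i] for wt, cand in zip(weights, candidates)) / total
            for i in range(n)]
```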

FIG. 6 is a flowchart showing the flow of processing in the second complementing method.
The complement processing unit 804 performs first preprocessing (step S401). Specifically, as the first preprocessing, the complement processing unit 804 reads the input images of all cameras 70 captured at the same time as the input image read in step S102, and generates, for each input image, a mask image whose mask region indicates that image processing is to be performed on its complement target region. The result is shown in FIG. 7.

FIG. 7 is a diagram illustrating specific examples of the input images and mask images.
FIG. 7(A) shows input images of a plurality of cameras 70 captured at the same time, namely the input images captured by the five cameras 70-1 to 70-5 identified by camera IDs “1” to “5”. For example, input image 83-1 is the image captured by camera 70-1, identified by camera ID “1”. Each of the input images 83-1 to 83-5 contains objects 84 and 85, where object 84 is a background object and object 85 is a foreground object. That is, the region of object 85 is the complement target region.

FIG. 7(B) shows the mask images 87-1 to 87-5 of the input images 83-1 to 83-5. In the mask images 87-1 to 87-5 of FIG. 7(B), region 88 indicates where image processing is performed, and region 89 indicates where it is not performed. Because the first preprocessing must prepare input images and mask images for every camera 70 and every time t, the data form a multidimensional array. For a mask image, for example, the array has dimensions (2-D image × 1 channel × number of cameras × number of times t).
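The (2-D image × 1 channel × cameras × time) layout can be illustrated with nested lists. The axis order [y][x][channel][camera][time] is an assumption taken from the order in which the text lists the dimensions; with NumPy the same array would be `np.zeros((height, width, 1, n_cameras, n_times), dtype=bool)`.

```python
from typing import List, Tuple


def allocate_mask_stack(height: int, width: int, n_cameras: int,
                        n_times: int) -> List:
    """Mask stack indexed [y][x][channel][camera][time] with one channel.
    False = no processing, True = mask region to be complemented."""
    return [[[[[False] * n_times for _ in range(n_cameras)]
              for _ in range(1)]
             for _ in range(width)]
            for _ in range(height)]


def shape(a) -> Tuple[int, ...]:
    """Shape of a regular nested list."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        if not a:
            break
        a = a[0]
    return tuple(dims)
```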

Next, the complement processing unit 804 performs the second preprocessing (step S402). Specifically, as the second preprocessing, the complement processing unit 804 obtains the size of the mask area in each mask image. The result is shown in FIG. 8.
FIG. 8 is a diagram illustrating the size of the mask area for each mask image.
In FIG. 8, the vertical axis represents the size of the mask area and the horizontal axis represents the camera ID. The graph plots the sizes (numbers of pixels) of the mask areas 88-1 to 88-5 of the mask images 87-1 to 87-5. FIG. 8 shows that the mask images for the input images of the cameras 70-2 to 70-4, identified by camera IDs “2” to “4”, have the largest mask areas.
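The second preprocessing amounts to counting mask pixels per camera. A minimal sketch, assuming each mask image is a 2D list of 0/1 values keyed by camera ID (a hypothetical data layout):

```python
# Hypothetical sketch of the second preprocessing (step S402): mask-area size per camera.

def mask_area(mask):
    """Number of pixels in the mask area (non-zero entries) of one 2D binary mask."""
    return sum(1 for row in mask for value in row if value)


def mask_areas(masks_by_camera):
    """Map each camera ID to its mask-area size, i.e. the values plotted in FIG. 8."""
    return {cam_id: mask_area(mask) for cam_id, mask in masks_by_camera.items()}


if __name__ == "__main__":
    masks = {
        1: [[0, 0, 0], [0, 1, 0]],
        2: [[1, 1, 0], [1, 1, 0]],
        3: [[0, 1, 1], [0, 1, 1]],
    }
    print(mask_areas(masks))  # {1: 1, 2: 4, 3: 4}
```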

Next, the complement processing unit 804 determines whether a mask area exists in the input image (step S403). If no mask area exists (step S403-NO), the image processing apparatus 80 ends the processing of the second complement method.
If a mask area exists (step S403-YES), the complement processing unit 804 determines a base camera (step S404). Here, the base camera is the camera 70 that captured the input image whose mask area is to be processed by the second complement method, that is, the input image containing the complement target area. The base camera can be determined as follows.

(Base camera determination method)
The camera 70 that captured the input image with the largest mask area obtained in step S402 (that is, the camera 70 whose input image shows the foreground object largest) is determined as the base camera.
When several cameras 70 captured input images with the largest mask area (camera IDs 2, 3, and 4 in FIG. 8), the camera 70 closest to the center of those cameras is determined as the base camera.
When several cameras 70 are equally close to the center (for example, when four cameras 70 captured input images with the largest mask area), one of the two inner cameras 70 is determined as the base camera. Either of them may be chosen.
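The three rules above can be sketched as follows. This assumes camera IDs are numbered in the order the cameras are physically arranged, so that "closest to the center" can be read as the median ID among the tied cameras; that numbering is an assumption, not something the text states.

```python
# Hypothetical sketch of the base camera determination rules.
# Assumption: camera IDs follow the physical arrangement of the cameras 70,
# so the camera "closest to the center" of a tied group is its median ID.

def choose_base_camera(areas):
    """areas: dict camera_id -> mask-area size.

    Returns the base camera ID, or None when no input image has a
    mask area (the S403-NO case).
    """
    if not areas:
        return None
    largest = max(areas.values())
    if largest == 0:
        return None
    tied = sorted(cam_id for cam_id, area in areas.items() if area == largest)
    # Odd count: the middle camera. Even count: the lower of the two inner
    # cameras (the text allows either of them).
    return tied[(len(tied) - 1) // 2]


if __name__ == "__main__":
    fig8_like = {1: 120, 2: 300, 3: 300, 4: 300, 5: 150}  # illustrative values only
    print(choose_base_camera(fig8_like))  # 3
```

Picking the lower inner camera for an even-sized tie is an arbitrary but deterministic choice; the text permits either inner camera.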

In the example of FIG. 8, the camera 70 with camera ID 3 is determined as the base camera. The complement processing unit 804 then applies an affine transformation to the input images of the cameras 70 other than the base camera and to all the mask images (step S405). This affine transformation converts those input images and mask images into images as seen from the base camera. The mask images after the affine transformation are shown in FIG. 9.

FIG. 9 is a diagram illustrating a specific example of mask images after the affine transformation.
In FIG. 9, each of the mask images 87-1, 87-2, 87-4, and 87-5 contains a region 90. The region 90 consists of pixels for which no correspondence could be obtained during the affine transformation; that is, it is the region where the view from the base camera could not be reconstructed. Hereinafter, the region 90 is referred to as a non-mask region (a region not included in the mask area). The mask image 87-3 has no region 90 because it is the mask image of the base camera's own input image.
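The text does not specify how the affine transformation between each camera 70 and the base camera is obtained. The sketch below assumes a 2×3 inverse affine matrix (base-camera pixel to source pixel) is already known, for example from calibration, and uses nearest-neighbour sampling. Pixels whose source location falls outside the source image have no correspondence; they form the region 90, and their count gives a value for S1. In practice a library routine such as OpenCV's `cv2.warpAffine` would do the resampling; the hand-written loop only makes the correspondence test explicit.

```python
# Hypothetical sketch: warp one image into the base camera's view and record
# which output pixels have no correspondence (region 90 / non-mask region).

def warp_affine_nearest(image, inv_matrix, out_h, out_w, fill=0):
    """image: 2D list. inv_matrix: [[a, b, c], [d, e, f]] mapping an output
    (base-view) pixel (x, y) to source coordinates (a*x + b*y + c, d*x + e*y + f).

    Returns (warped, unmatched); unmatched[y][x] is True where the source
    location lies outside the source image.
    """
    src_h = len(image)
    src_w = len(image[0]) if src_h else 0
    (a, b, c), (d, e, f) = inv_matrix
    warped = [[fill] * out_w for _ in range(out_h)]
    unmatched = [[True] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sx = round(a * x + b * y + c)   # nearest-neighbour sampling
            sy = round(d * x + e * y + f)
            if 0 <= sx < src_w and 0 <= sy < src_h:
                warped[y][x] = image[sy][sx]
                unmatched[y][x] = False
    return warped, unmatched


def unmatched_area(unmatched):
    """S1-style count: number of pixels without a correspondence."""
    return sum(1 for row in unmatched for v in row if v)


if __name__ == "__main__":
    img = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    shift_x = [[1, 0, 1], [0, 1, 0]]          # output (x, y) samples source (x + 1, y)
    warped, unmatched = warp_affine_nearest(img, shift_x, 3, 3)
    print(warped)                    # [[1, 2, 0], [4, 5, 0], [7, 8, 0]]
    print(unmatched_area(unmatched)) # 3
```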

Thereafter, the complement processing unit 804 evaluates the energy function E below and determines, among all the cameras 70, the camera that minimizes E as the reference camera (step S406). Here, the reference camera is the camera that captured the input image used to complement the complement target area in the base camera's input image. The search range for input images covers T time steps before and after.

(Energy function E)
E = α1 × d1 + α2 × S + α3 × S1 + α4 × (1 / d2) + α5 × a
Here, α1 to α5 are weighting factors. d1 is the distance from the base camera to the other camera. S is the size of the mask area. S1 is the size of the region of pixels for which no correspondence could be obtained during the affine transformation, that is, the size of the non-mask region. d2 is the distance between the foreground object and the background object, calculated as follows. First, the complement processing unit 804 extracts each object from the input image by segmentation. Next, it calculates the distance between the center positions of the foreground object and the background object. When there are multiple background objects, d2 is the distance between the center of the foreground object and the center of the background object closest to it. a is a binary value indicating whether the foreground object and the background object are completely separated, obtained as follows. First, the complement processing unit 804 extracts object regions by background subtraction. Next, it counts the objects by clustering or a similar method. It then sets a to 1 if the number of objects exceeds the number of objects in the base camera's image, and to 0 otherwise.
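A minimal sketch of the energy function E and its d2 and a terms. Assumptions not stated in the text: objects are given as lists of (x, y) pixel coordinates, an object's "center" is its centroid, the nearest background object is the one whose centroid is closest, and d2 = 0 is treated as the worst case (E = infinity) to avoid dividing by zero. All names are hypothetical.

```python
import math

# Hypothetical sketch of E = α1·d1 + α2·S + α3·S1 + α4·(1/d2) + α5·a.

def centroid(pixels):
    """Center position of an object given as a list of (x, y) pixels."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def foreground_background_distance(fg_pixels, bg_objects):
    """d2: distance from the foreground centroid to the nearest background centroid."""
    fg_center = centroid(fg_pixels)
    return min(math.dist(fg_center, centroid(obj)) for obj in bg_objects)


def separation_flag(num_objects, base_num_objects):
    """a: 1 if more objects are counted than in the base camera's image, else 0."""
    return 1 if num_objects > base_num_objects else 0


def energy(d1, S, S1, d2, a, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the five terms; larger d2 (objects farther apart) lowers E."""
    a1, a2, a3, a4, a5 = weights
    if d2 <= 0:
        return math.inf  # assumption: touching centers are the worst case
    return a1 * d1 + a2 * S + a3 * S1 + a4 / d2 + a5 * a


if __name__ == "__main__":
    d2 = foreground_background_distance([(0, 0), (2, 0)], [[(4, 0)], [(1, 3), (1, 5)]])
    print(d2)                                              # 3.0
    print(energy(d1=2.0, S=10, S1=4, d2=0.5, a=1))         # 19.0
```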

The complement processing unit 804 calculates the energy function E for the input images of all the cameras 70 over T time steps before and after. It then determines, as the reference camera, the camera 70 that captured the input image minimizing E. The complement processing unit 804 may exclude the base camera from the candidates for the reference camera. Thereafter, the complement processing unit 804 complements the complement target area using, as the reference image, the input image of the reference camera for which E was minimum (step S407). The processing from step S401 onward is then repeated.
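The search in steps S406 and S407 is an argmin over (camera, time) pairs. A minimal sketch, assuming E has already been computed for each candidate and stored in a dictionary keyed by (camera ID, time); the `exclude_base` switch mirrors the optional exclusion of the base camera. The data layout and tie-breaking rule are assumptions.

```python
# Hypothetical sketch of step S406: pick the (camera, time) pair minimising E
# within T time steps of the current time t0.

def choose_reference(energies, base_camera, t0, T, exclude_base=True):
    """energies: dict (camera_id, t) -> E value.

    Returns (camera_id, t) of the minimum-E input image in [t0 - T, t0 + T],
    or None if there is no candidate. Ties are broken by the smaller camera ID,
    then the earlier time (an arbitrary choice).
    """
    candidates = [
        (e, cam, t)
        for (cam, t), e in energies.items()
        if abs(t - t0) <= T and not (exclude_base and cam == base_camera)
    ]
    if not candidates:
        return None
    _, cam, t = min(candidates)
    return cam, t


if __name__ == "__main__":
    energies = {(1, 9): 5.0, (2, 10): 3.0, (3, 10): 1.0, (4, 20): 0.5, (2, 11): 2.5}
    print(choose_reference(energies, base_camera=3, t0=10, T=1))  # (2, 11)
```

The returned time identifies which of the reference camera's input images serves as the reference image in step S407.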

The image processing apparatus 80 configured as described above can suppress the degradation in image quality caused by a subject to be removed that appears in any of the images captured by the plurality of imaging devices. This effect is described in detail below.
The image processing apparatus 80 determines, based on the complement target area contained in an input image, whether that area is easy to complement, and then complements it with a different complement method depending on the result. For example, when the complement target area is easy to complement, the image processing apparatus 80 uses the first complement method, which takes as the reference image a past input image of the camera 70 that captured the input image containing the foreground object. When the area is not determined to be easy to complement, the image processing apparatus 80 uses the second complement method, which takes as the reference image one of the past input images of either that camera 70 or another camera 70. Because one of the two methods is always applied, the subject to be removed can be removed, and the area it occupied can be complemented, whichever of the images captured by the plurality of imaging devices contains it. As a result, the degradation in image quality caused by such a subject can be suppressed.

<Modification>
The present invention is not limited to the embodiment described above. For example, the present invention is also applicable to obtaining an image in which, for a specific image in a specific video among the videos captured by the plurality of cameras 70, the area from which the subject to be removed has been removed (the complement target area) is complemented.
In the present embodiment, the width value registered in the composite information table is the same for all camera IDs, but it may differ for each camera ID or for only some camera IDs.
In the present embodiment, object recognition or segmentation is used to separate foreground objects from background objects; however, either of the following two methods may be used instead.
(First method)
The first method uses a range sensor. With a range sensor, an object located behind the virtual viewpoint is treated as a background object, and an object located in front of the virtual viewpoint is treated as a foreground object.
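A minimal sketch of the range-sensor split, assuming the sensor yields one distance per detected object measured from the camera 70, and that the distance from that camera to the virtual viewpoint is known. Both inputs, and the choice to treat an object exactly at the viewpoint distance as foreground, are assumptions; the text only states that objects behind the virtual viewpoint are background and objects in front of it are foreground.

```python
# Hypothetical sketch: split objects into foreground/background by range-sensor depth.

def split_by_depth(object_depths, viewpoint_depth):
    """object_depths: dict object_id -> distance from the camera (range sensor).
    viewpoint_depth: distance from the same camera to the virtual viewpoint.

    Returns (foreground_ids, background_ids), each sorted.
    """
    foreground, background = [], []
    for object_id, depth in object_depths.items():
        if depth > viewpoint_depth:      # behind the virtual viewpoint
            background.append(object_id)
        else:                            # in front of (or at) the viewpoint
            foreground.append(object_id)
    return sorted(foreground), sorted(background)


if __name__ == "__main__":
    depths = {"p1": 5.0, "p2": 12.0, "ball": 8.0}
    print(split_by_depth(depths, viewpoint_depth=8.0))  # (['ball', 'p1'], ['p2'])
```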
(Second method)
The second method uses video from a camera above the competition court 10. If a camera can be installed above the competition court 10, for example on the ceiling, that camera and the cameras 70 installed outside the competition court 10 are calibrated in advance. Objects such as people and the ball are then tracked in the video from the overhead camera, and how each object appears from each camera 70 is estimated.
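The estimation step can be sketched with a standard pinhole projection, assuming the overhead tracker outputs 3D world positions and that the prior calibration produced a 3×4 projection matrix for each camera 70. These inputs and the helper names are hypothetical, and occlusion between objects is ignored, so the sketch only decides whether each tracked object falls inside a camera's image and where.

```python
# Hypothetical sketch: project overhead-tracked 3D positions into one camera 70.

def project(P, X):
    """P: 3x4 projection matrix (list of rows). X: world point (x, y, z).

    Returns image coordinates (u, v), or None if the point is behind the camera.
    """
    x, y, z = X
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def visible_objects(P, width, height, tracked):
    """tracked: dict object_id -> world position.

    Returns object_id -> (u, v) for objects projecting inside the image.
    """
    visible = {}
    for object_id, X in tracked.items():
        uv = project(P, X)
        if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
            visible[object_id] = uv
    return visible


if __name__ == "__main__":
    P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   # toy camera, focal length 1
    tracked = {"player": (2, 4, 2), "far": (30, 0, 1), "behind": (0, 0, -5)}
    print(visible_objects(P, 10, 10, tracked))        # {'player': (1.0, 2.0)}
```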

The embodiment of the present invention has been described in detail with reference to the drawings. However, the specific configuration is not limited to this embodiment, and includes designs and the like that do not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS 60 ... omnidirectional camera, 70 (70-1 to 70-M) ... camera, 4, 80 ... image processing apparatus, 801 ... input image storage unit, 802 ... object analysis unit, 803 ... determination unit, 804 ... complement processing unit, 805 ... composite information storage unit, 806 ... partial region extraction unit, 807 ... background image storage unit, 808 ... image composition unit

Claims

An image processing apparatus comprising:
a determination unit that, for a specific input image in a specific video among the videos captured by each of a plurality of imaging devices, determines, based on a complement target area of the specific input image in which a subject to be removed exists, a difficulty indicating how hard it is to complement the complement target area using another input image captured by the imaging device that captured the specific input image; and
a complement processing unit that, when the determination unit determines that the complement target area is easy to complement using another input image captured by the imaging device that captured the specific input image, complements the complement target area by a first complement method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image, and otherwise complements the complement target area by a second complement method that uses, as a reference image, either an input image captured by the imaging device that captured the specific input image or an input image captured by another imaging device.

The image processing apparatus according to claim 1, wherein, when using the second complement method, the complement processing unit uses, as the reference image, an input image captured by an imaging device that is close, among all the imaging devices, to the imaging device that captured the specific input image.

The image processing apparatus according to claim 1, wherein, when using the first complement method, the complement processing unit searches the input images captured in the past by another imaging device for an input image similar to the input image captured by that other imaging device at the time the specific input image was captured, acquires, from the input images of the imaging device that captured the specific input image, an input image captured around the time at which the retrieved input image was captured, and complements the complement target area using the acquired input image.

The image processing apparatus according to any one of claims 1 to 3, wherein, when the complement target area is background, the determination unit searches the input images of another imaging device for an input image similar to the specific input image, and, when the subject is present in an input image captured by the imaging device that captured the specific input image around the time at which the retrieved input image was captured, determines that the complement target area is easy to complement using another input image captured by the imaging device that captured the specific input image, and otherwise determines that it is not easy to complement.

An image processing method comprising:
a determination step of, for a specific input image in a specific video among the videos captured by each of a plurality of imaging devices, determining, based on a complement target area of the specific input image in which a subject to be removed exists, a difficulty indicating how hard it is to complement the complement target area using another input image captured by the imaging device that captured the specific input image; and
a complement processing step of, when it is determined in the determination step that the complement target area is easy to complement using another input image captured by the imaging device that captured the specific input image, complementing the complement target area by a first complement method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image, and otherwise complementing the complement target area by a second complement method that uses, as a reference image, either an input image captured by the imaging device that captured the specific input image or an input image captured by another imaging device.

A computer program for causing a computer to execute:
a determination step of, for a specific input image in a specific video among the videos captured by each of a plurality of imaging devices, determining, based on a complement target area of the specific input image in which a subject to be removed exists, a difficulty indicating how hard it is to complement the complement target area using another input image captured by the imaging device that captured the specific input image; and
a complement processing step of, when it is determined in the determination step that the complement target area is easy to complement using another input image captured by the imaging device that captured the specific input image, complementing the complement target area by a first complement method that uses, as a reference image, one of the input images captured by the imaging device that captured the specific input image, and otherwise complementing the complement target area by a second complement method that uses, as a reference image, either an input image captured by the imaging device that captured the specific input image or an input image captured by another imaging device.