JP2022070746A

JP2022070746A - Virtual viewpoint image generation system, image processing device, image generation device, control methods therefor, and program

Info

Publication number: JP2022070746A
Application number: JP2020179982A
Authority: JP
Inventors: 剛史古川; Takashi Furukawa
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2022-05-13

Abstract

To provide a technique for generating a virtual viewpoint image that appropriately reflects the image quality degradation caused by a change in lighting conditions.
SOLUTION: A virtual viewpoint image generation system generates a virtual viewpoint image on the basis of five inputs:
- three-dimensional shape information of a foreground (a subject), based on a plurality of captured images acquired by a plurality of imaging devices;
- color information of the foreground;
- three-dimensional shape information of a background, distinct from the foreground, in the imaging space of the imaging devices;
- color information of the background;
- a designated virtual viewpoint.
The system determines, on the basis of at least one of the captured images, whether the lighting conditions have changed beyond a predetermined criterion. When it determines that they have, it adjusts the color information of at least one of the foreground and the background. The adjustment brings the lighting conditions of the foreground and background color information used to generate the virtual viewpoint image closer together.
SELECTED DRAWING: Figure 1A

Description

The present invention relates to a virtual viewpoint image generation system, an image processing device, an image generation device, control methods therefor, and a program.

Recently, a technique called the virtual viewpoint image has attracted attention because it allows images to be viewed from a variety of viewpoints and directions. With this technology, highlight scenes of soccer or basketball, for example, can be viewed from various angles, which gives the user a greater sense of presence than ordinary images. A virtual viewpoint image is generated from a plurality of captured images (multi-viewpoint captured images). These images are obtained by capturing the subject from multiple directions at the same timing with a plurality of cameras installed so as to surround the subject.

Patent Document 1 describes a method of generating such a virtual viewpoint image, which proceeds as follows.
- Each of the captured images obtained from a plurality of cameras with different positions and orientations (viewpoints) is separated into a foreground image representing the subject and a background image of everything else.
- A silhouette image of the subject is extracted from each foreground image.
- Because the extracted silhouette images are taken from different viewpoints, the three-dimensional shape of the foreground (subject) can be derived from them by the visual hull (volume intersection) method or a similar method.
- An image from an arbitrary viewpoint (virtual viewpoint image) is generated by rendering color information obtained from the captured images onto the three-dimensional shape as observed from the viewpoint specified by an operator.

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-194985

In the technique disclosed in Patent Document 1, a three-dimensional shape is derived from silhouette images and a virtual viewpoint image is generated. Rendering of a virtual viewpoint image usually takes two approaches to reduce the processing load on the system:
- the background uses color information and a background model captured in advance;
- only the foreground subject is rendered from the captured data.
When the lighting conditions change abruptly, for example because of a flash, the texture image representing the color information of the foreground subject may be captured in an overexposed (blown-out) state. The overexposed, flash-affected texture image is then used only for the foreground subject. The background still uses the texture image captured before the lighting change, so the resulting virtual viewpoint image looks unnatural. Because only the foreground subject appears affected by the flash, the virtual viewpoint image loses its sense of presence and looks wrong to the viewer.

The present invention provides a technique for generating an appropriate virtual viewpoint image in response to changes in lighting conditions.

A virtual viewpoint image generation system according to one aspect of the present invention has the following configuration. That is, the system comprises:
- a generation means for generating a virtual viewpoint image on the basis of:
  - three-dimensional shape information of a foreground (a subject), based on a plurality of captured images acquired by a plurality of imaging devices;
  - color information of the foreground;
  - three-dimensional shape information of a background, distinct from the foreground, in the imaging space of the plurality of imaging devices;
  - color information of the background;
  - a designated virtual viewpoint;
- a determination means for determining, on the basis of at least one of the plurality of captured images, whether a change exceeding a predetermined criterion has occurred in the lighting conditions;
- an adjustment means for adjusting at least one of the foreground color information and the background color information when the determination means determines that the lighting conditions have changed. The adjustment brings the lighting conditions of the foreground and background color information used by the generation means closer to each other.

According to the present invention, an appropriate virtual viewpoint image can be generated in response to a change in lighting conditions.

FIG. 1A is a block diagram showing an example functional configuration of the image processing apparatus.
FIG. 1B is a block diagram showing an example hardware configuration of the image processing apparatus.
FIG. 2 is a diagram showing an example of a virtual viewpoint image.
FIG. 3 is a diagram explaining the processing of the flash information acquisition unit.
FIG. 4 is a flowchart showing the processing of the flash information acquisition unit.
FIG. 5 is a diagram showing an example of the luminance of image data.
FIG. 6 is a flowchart showing the processing of the flash information processing unit according to the first embodiment.
FIG. 7 is a diagram showing an example of three-dimensional coordinates indicating the position of a flash.
FIG. 8 is a flowchart showing the processing of the flash information processing unit according to the second embodiment.

Embodiments are described in detail below with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. The embodiments describe a plurality of features, but not all of them are necessarily essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, the same or similar components are given the same reference numerals, and redundant descriptions are omitted.

<First Embodiment>
FIG. 1A shows a configuration example of the virtual viewpoint image generation system 1 according to the first embodiment. The virtual viewpoint image generation system 1 includes an image processing device 100, an operation device 110, a display device 120, and imaging devices 130. The operation device 110, the display device 120, and the imaging devices 130 are connected to the image processing device 100. FIG. 1A also shows an example functional configuration of the image processing device 100.

The operation device 110 receives instructions from an operator, such as the position/orientation of the virtual viewpoint camera. It supplies them to the image processing device 100 (virtual viewpoint image generation unit 105). The operation device 110 includes, for example, levers and switches that the user operates to specify the position/orientation of the virtual viewpoint camera.

The display device 120 has a display such as a liquid crystal display. It displays the virtual viewpoint image generated by the image processing device 100 (virtual viewpoint image generation unit 105). The operator can specify the position/orientation of the virtual camera with the operation device 110 while watching the virtual viewpoint image on the display device 120.

The image processing device 100 generates a virtual viewpoint image based on two inputs: the images captured by the plurality of imaging devices 130, and the position of the virtual viewpoint camera input from the operation device 110. The image processing device 100 has two kinds of component:
- captured image processing devices 10, one provided for each imaging device 130;
- an image generation device 20 connected to the captured image processing devices 10.

Each captured image processing device 10 generates a silhouette image and flash information from the captured image acquired from its imaging device 130, and outputs them to the image generation device 20. The image generation device 20 uses the silhouette images and flash information from the captured image processing devices 10 to generate a virtual viewpoint image. That image is rendered at the position/orientation of the virtual viewpoint camera specified via the operation device 110.

The imaging device 130 is a digital video camera with an image signal interface, such as a serial digital interface (SDI). It supplies image data to the captured image processing device 10 through this interface. A plurality of imaging devices 130 are installed so as to surround the subject (object), and they capture images substantially simultaneously. Each imaging device 130 is connected to a captured image processing device 10. The captured image processing device 10 may also be built into the imaging device 130.

The captured image processing device 10 has three functional units: an image input unit 101, a silhouette image derivation unit 102, and a flash information acquisition unit 103. The image input unit 101 receives image data (captured images) from the imaging device 130. It supplies the data to the silhouette image derivation unit 102 and the flash information acquisition unit 103.

The silhouette image derivation unit 102 derives a silhouette image, which is shape information of the subject (foreground), by a method such as background subtraction. A silhouette image is a black-and-white image in which the inside of the subject's outline is filled in. Each pixel holds a binary value indicating whether it lies inside or outside that outline. The silhouette image derivation unit 102 outputs the derived silhouette image to the image generation device 20 (three-dimensional shape derivation unit 104). It also outputs a texture image, which is the image data corresponding to the silhouette.
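As a rough illustration of background subtraction, the sketch below marks a pixel as foreground when its color differs from a prestored background image of the same camera by more than a threshold. It then copies the captured colors inside the resulting silhouette as the texture. This is a minimal sketch under several assumptions:
- the camera is static;
- the background plate is an RGB `uint8` image;
- the Euclidean-distance threshold is a hypothetical value.

The function names `derive_silhouette` and `extract_texture` are illustrative and not taken from the patent. Practical systems add noise filtering and shadow handling.

```python
import numpy as np


def derive_silhouette(frame: np.ndarray, background: np.ndarray,
                      threshold: float = 30.0) -> np.ndarray:
    """Return a binary silhouette: True where the pixel differs from the background.

    frame, background: H x W x 3 uint8 images from the same static camera.
    threshold: hypothetical color-distance threshold in 8-bit units.
    """
    # Convert to float so the subtraction can go negative without wrapping.
    diff = frame.astype(np.float32) - background.astype(np.float32)
    # Euclidean color distance per pixel.
    distance = np.sqrt((diff ** 2).sum(axis=2))
    return distance > threshold


def extract_texture(frame: np.ndarray, silhouette: np.ndarray) -> np.ndarray:
    """Keep the captured colors inside the silhouette and zero everything else."""
    texture = np.zeros_like(frame)
    texture[silhouette] = frame[silhouette]
    return texture
```

For example, take a 4 x 4 gray background and a frame that adds a brighter 2 x 2 patch. The result is a silhouette with exactly four `True` pixels, and a texture containing only that patch.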

The flash information acquisition unit 103 determines whether a flash has fired, using the image data (captured image) acquired from the image input unit 101. It is an example of a component that determines, based on at least one of the plurality of captured images, whether a change exceeding a predetermined criterion has occurred in the lighting conditions. When such a change has occurred, it determines that a flash has fired.

The flash information acquisition unit 103 also generates information representing the flash intensity (for example, a luminance histogram). It outputs flash information to the image generation device 20 (flash information processing unit 106). The flash information comprises whether a flash fired and the flash intensity. The processing of the flash information acquisition unit 103 is described later with reference to the flowchart of FIG. 4.

In the present embodiment, there is one captured image processing device 10 per imaging device 130, but this is not a limitation. Other arrangements are possible:
- one captured image processing device 10 may serve a predetermined number (two or more) of imaging devices 130;
- the functions of the captured image processing device 10 (silhouette image derivation unit 102 and flash information acquisition unit 103) may be provided in the imaging device 130;
- those functions may be included in the image generation device 20.

In the last arrangement, however, the imaging devices 130 and the image processing device 100 are connected by a network that carries the captured images themselves. This increases the network load.

Next, the functional configuration of the image generation device 20 is described. The three-dimensional shape derivation unit 104 takes the silhouette images input from the silhouette image derivation units 102. From them it derives three-dimensional shape information (three-dimensional shape data) representing the three-dimensional shape of the subject. It supplies this information to the virtual viewpoint image generation unit 105.

To derive the shape, the three-dimensional shape derivation unit 104 can use a commonly used visual hull (volume intersection) method, such as shape from silhouette. In the visual hull method, the silhouette images from a plurality of imaging units are back-projected into three-dimensional space. The three-dimensional shape information is the intersection of the resulting visual volumes.

The three-dimensional shape derivation unit 104 also generates a texture corresponding to the three-dimensional shape information. It bases this texture on the texture (color information) received from the silhouette image derivation units 102, and supplies it to the virtual viewpoint image generation unit 105.
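The visual hull idea can be sketched as voxel carving. Each candidate voxel center is projected into every camera, and the voxel is kept only if it lands inside that camera's silhouette. This is a simplified sketch with two hypothetical inputs: calibrated 3 x 4 projection matrices and a regular grid of candidate points. It is not the patent's implementation, and it omits refinements such as surface extraction and texture assignment.

```python
import numpy as np


def carve_visual_hull(silhouettes, projections, grid_points):
    """Keep voxels whose projection falls inside every silhouette.

    silhouettes: list of H x W boolean arrays, one per camera.
    projections: list of 3 x 4 projection matrices (assumed known from calibration).
    grid_points: N x 3 array of candidate voxel centers in world coordinates.
    Returns a boolean mask of length N (True = voxel belongs to the hull).
    """
    n = grid_points.shape[0]
    homogeneous = np.hstack([grid_points, np.ones((n, 1))])  # N x 4
    occupied = np.ones(n, dtype=bool)
    for sil, P in zip(silhouettes, projections):
        uvw = homogeneous @ P.T                   # N x 3 homogeneous image coordinates
        depth = uvw[:, 2]
        in_front = depth > 0                      # ignore points behind the camera
        u = np.full(n, -1)
        v = np.full(n, -1)
        # Perspective divide, then round to the nearest pixel.
        u[in_front] = np.round(uvw[in_front, 0] / depth[in_front]).astype(int)
        v[in_front] = np.round(uvw[in_front, 1] / depth[in_front]).astype(int)
        height, width = sil.shape
        in_image = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        hit = np.zeros(n, dtype=bool)
        hit[in_image] = sil[v[in_image], u[in_image]]
        # A voxel outside any camera's silhouette (or image) is carved away.
        occupied &= hit
    return occupied
```

Voxel carving trades memory for simplicity:
- resolution is fixed by the grid spacing;
- a voxel that projects outside any camera's image is discarded, so every camera must see the whole capture volume.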

The virtual viewpoint image generation unit 105 generates a virtual viewpoint image as observed from the position/orientation of the virtual viewpoint camera specified with the operation device 110. It bases the image on the three-dimensional shape information and texture of the subject (foreground) and of the background. In the present embodiment, however, the background texture is provided by the flash information processing unit 106. The unit gathers its inputs from three sources:
- the three-dimensional shape information and texture of the subject (foreground), from the three-dimensional shape derivation unit 104;
- the background model (three-dimensional shape information of the background), stored in advance in the background data storage unit 107;
- the corrected background texture, provided by the flash information processing unit 106 described later.

The virtual viewpoint image generation unit 105 renders the acquired foreground and background shape and texture information to generate a virtual viewpoint image, and outputs it to the display device 120.

The flash information processing unit 106 adjusts (hereinafter also "corrects") the background texture acquired from the background data storage unit 107. The adjustment is based on the flash information from the flash information acquisition unit 103. This adjustment is hereinafter also called flash reproduction processing. The unit behaves as follows:
- when the flash information indicates that a flash has fired, it adjusts the stored background texture based on the information representing the flash intensity, and provides the adjusted (corrected) texture to the virtual viewpoint image generation unit 105;
- when the flash information indicates that no flash has fired, it provides the stored background texture to the virtual viewpoint image generation unit 105 unchanged.

The processing of the flash information processing unit 106 is described in more detail later with reference to the flowchart of FIG. 6.

The background data storage unit 107 stores the three-dimensional shape information and the texture of the background. Both are obtained in advance from images of the imaging space captured by the plurality of imaging devices 130 while no subject (in this example, a sumo wrestler) is present.
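The actual adjustment rule is given in the flowchart of FIG. 6, which is outside this excerpt. As a purely illustrative stand-in, the sketch below brightens the flash-free background texture by the ratio of two luminance-histogram modes: one from a flash frame and one from a flash-free frame. It clips to the 8-bit range so that bright areas saturate as the overexposed foreground does. The gain model and the function name `reproduce_flash` are assumptions, not the patent's method.

```python
import numpy as np


def reproduce_flash(background_texture: np.ndarray,
                    reference_mode: int, flash_mode: int) -> np.ndarray:
    """Brighten a stored (flash-free) background texture to mimic a flash frame.

    reference_mode: luminance-histogram mode of a frame without flash.
    flash_mode: luminance-histogram mode of the frame in which the flash fired.
    The ratio-of-modes gain is an illustrative assumption.
    """
    gain = flash_mode / max(reference_mode, 1)  # avoid division by zero
    brightened = background_texture.astype(np.float32) * gain
    # Clip to 8 bits so bright texels blow out like the captured foreground.
    return np.clip(np.rint(brightened), 0, 255).astype(np.uint8)
```

Take the modes described for FIG. 3: 120 without flash and 240 with flash. The gain is then 2, so a texel of value 100 becomes 200 and a texel of value 200 saturates at 255.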

FIG. 1B shows an example hardware configuration of the captured image processing device 10 and the image generation device 20, which implement the functions above. The configuration of the image generation device 20 is described below, and the captured image processing device 10 has a similar configuration. The image generation device 20 has the following components:
- a CPU 161;
- a ROM 162;
- a RAM 163;
- an auxiliary storage device 164;
- a display unit 165;
- an operation unit 166;
- a communication I/F 167;
- a bus 168.

The display unit 165 and the operation unit 166 may be omitted when the display device 120 and the operation device 110 described above are connected. In the captured image processing device 10, the auxiliary storage device 164, the display unit 165, and the operation unit 166 can be omitted.

The CPU 161 controls the entire image generation device 20 using computer programs and data stored in the ROM 162 and the RAM 163. In this way it implements each function of the image generation device 20 shown in FIG. 1A. The image generation device 20 may also have one or more items of dedicated hardware other than the CPU 161. That hardware may execute at least part of the processing performed by the CPU 161. Examples of dedicated hardware include ASICs (application-specific integrated circuits), FPGAs (field-programmable gate arrays), and DSPs (digital signal processors).

The ROM 162 stores programs and the like that do not need to be changed. The RAM 163 temporarily stores programs and data supplied from the auxiliary storage device 164, and data supplied from outside via the communication I/F 167. The auxiliary storage device 164 is, for example, a hard disk drive. It stores various data such as image data and audio data.

The display unit 165 is, for example, a liquid crystal display or LEDs. It displays a GUI (Graphical User Interface) through which the user operates the image generation device 20. The operation unit 166 is, for example, a keyboard, mouse, joystick, or touch panel. It inputs various instructions to the CPU 161 in response to user operations. The CPU 161 operates as a display control unit that controls the display unit 165 and as an operation control unit that controls the operation unit 166.

The communication I/F 167 is used for communication between the image generation device 20 and external devices. For a wired connection to an external device, a communication cable is connected to the communication I/F 167. If the image generation device 20 can communicate wirelessly with external devices, the communication I/F 167 includes an antenna. The bus 168 connects the units of the image generation device 20 and transfers information between them.

In the present embodiment, the display unit 165 and the operation unit 166 are inside the image generation device 20. However, at least one of them may exist as a separate device outside the image generation device 20.

Next, the virtual viewpoint image generation processing by the image processing device 100 is described with reference to FIG. 2. FIG. 2(g) shows a virtual viewpoint image 270 that the image processing device 100 (image generation device 20) of the present embodiment generates and outputs. The virtual viewpoint image 270 shows a sumo bout. The processing by which the image processing device 100 generates this image is described below.

First, in FIG. 2(a), the captured image 210 shows the wrestlers 211 and 212 (the subjects), captured by an imaging device 130 from around the ring 213. FIG. 2(a) also shows a photographer 215 in the spectator seats outside the ring 213, photographing the bout with a flash. The imaging device 130 is set to give correct exposure when no flash is fired. As a result, the images of the wrestlers 211 and 212 become overexposed (blown out) when the flash fires.

As shown in FIG. 2(b), the silhouette image derivation unit 102 derives silhouette images 221 and 222 of the wrestlers 211 and 212 from the captured image 210. As shown in FIG. 2(c), it also derives texture images 231 and 232 corresponding to those silhouettes from the captured image 210. A texture image is an image representing the color information of the subject. The texture images 231 and 232 come from the flash-affected image data of FIG. 2(a), so they are also affected by the flash; for example, at least parts of them are overexposed.

Next, as shown in FIG. 2(d), the three-dimensional shape derivation unit 104 derives three-dimensional shape information 241 and 242 from silhouette images such as 221 and 222 in FIG. 2(b). The imaging devices 130 arranged around the subject capture images substantially simultaneously, and a silhouette image is derived from each captured image. The three-dimensional shape derivation unit 104 receives all of these silhouette images. From them, using the visual hull method or a similar method, it derives the three-dimensional shape information 241 and 242 shown in FIG. 2(d).

Next, the virtual viewpoint image generation unit 105 generates the virtual viewpoint image. In the present embodiment, the background model and background texture are stored in advance in the background data storage unit 107. FIG. 2(e) shows this prestored background texture 253 of the ring 213, which is not affected by the flash. Suppose rendering uses the flash-affected texture images 231 and 232 of the wrestlers together with the flash-free background texture 253. The result is an unnatural virtual viewpoint image in which only the foreground is affected by the flash. To avoid this, in the image processing device 100 (image generation device 20) of the present embodiment, the flash information processing unit 106 corrects the prestored background texture 253 so that it reproduces the effect of the flash.

The flash information acquisition unit 103 detects from the captured image whether a flash has fired, and notifies the flash information processing unit 106 of the result. For each frame in which a flash is detected, the flash information processing unit 106 corrects the background texture by reproducing the flash. The background texture 263 in FIG. 2(f) is the background texture 253 with the effect of the flash reproduced.

The virtual viewpoint image generation unit 105 then renders the virtual viewpoint image 270 shown in FIG. 2(g) from four inputs:
- the foreground texture images 231 and 232;
- the three-dimensional shape information 241 and 242;
- the corrected background texture 263;
- the background model.

Because the foreground and the background are both rendered as affected by the flash, the resulting virtual viewpoint image is more natural.
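The per-frame decision described above can be sketched as follows. Only frames flagged as containing a flash receive a corrected background texture; all other frames reuse the stored one. The `FlashInfo` container and `background_texture_for_frame` are hypothetical names. The correction itself is passed in as a callable because its exact form is not specified in this section.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class FlashInfo:
    """Hypothetical container for one frame's flash information."""
    fired: bool               # whether a flash was detected in this frame
    histogram: Sequence[int]  # luminance histogram used as intensity information


def background_texture_for_frame(info: FlashInfo, stored_texture: Any,
                                 reproduce: Callable[[Any, Sequence[int]], Any]) -> Any:
    """Return the background texture the renderer should use for one frame.

    Flash frames get a corrected texture; other frames reuse the stored
    flash-free texture unchanged.
    """
    if info.fired:
        return reproduce(stored_texture, info.histogram)
    return stored_texture
```

Passing the correction as a callable keeps the selection logic independent of whichever flash reproduction rule is used.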

Next, the process by which the flash information acquisition unit 103 acquires flash information, and the process by which the flash information processing unit 106 corrects the background texture based on that information, will be described with reference to FIG. 3.

FIGS. 3(a), 3(c), and 3(e) are images captured by the image pickup device 130 in consecutive frames, at times T = 100, T = 101, and T = 102, respectively. FIGS. 3(b), 3(d), and 3(f) show the histograms derived by the flash information acquisition unit 103 for each of these frames. In each histogram, the horizontal axis represents pixel luminance and the vertical axis represents the number of pixels. In the present embodiment, luminance values are 8-bit, ranging from 0 to 255.

The captured image 310 in FIG. 3(a) shows a scene in which the wrestlers 311 and 312, who are the subjects, are competing in a bout on the ring 313. Assume that a photographer 315 in the audience seating outside the ring 313 is photographing the bout. At time T = 100, the flash of the photographer 315's camera has not fired. FIG. 3(b) shows the histogram 340 of the captured image 310 of FIG. 3(a). In the histogram 340, the mode lies at 120, near the center of the range, indicating appropriate exposure.

Next, assume that the flash of the photographer 315's camera fires at time T = 101. In the frame of FIG. 3(c) (captured image 320), the flash causes blown-out highlights on the wrestlers 311 and 312 and on the ring 313. As shown in FIG. 3(d), the mode of the histogram 360 of the captured image 320 lies at 240, close to the upper limit of the luminance range, indicating overexposure.

At time T = 102, the flash of the photographer 315's camera has finished firing. The captured image 330 shown in FIG. 3(e) is obtained, and, as shown in FIG. 3(f), the mode of its histogram 380 returns to 100, near the center of the range, indicating appropriate exposure again.
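The exposure readings above (mode near the middle of the range means appropriate exposure, mode near 255 means overexposure) rest on a 256-bin luminance histogram and its mode. The sketch below shows how those two quantities can be computed, assuming the luminance of a frame is already available as a flat list of 8-bit integers; the function names are hypothetical and not taken from the source.

```python
def luminance_histogram(pixels: list[int]) -> list[int]:
    """Count how many pixels take each 8-bit luminance value (0 to 255)."""
    hist = [0] * 256
    for v in pixels:
        if not 0 <= v <= 255:
            raise ValueError(f"luminance {v} is outside the 8-bit range")
        hist[v] += 1
    return hist


def histogram_mode(hist: list[int]) -> int:
    """Return the most frequent luminance value (the lowest one on ties)."""
    return max(range(len(hist)), key=lambda v: (hist[v], -v))
```

For example, a frame whose pixels are mostly at 120 has mode 120, while a flash-lit frame dominated by values near 240 has its mode near the top of the range.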

The flash information acquisition unit 103 derives the average luminance of the pixels of each captured image obtained from the image pickup device 130, and determines from this average whether the flash has fired. For example, the flash information acquisition unit 103 determines that the flash has fired when the difference between the luminance of the current frame (the frame being judged) and the average luminance of the 10 frames preceding it is equal to or greater than a threshold. When flash emission is detected, the flash information acquisition unit 103 sends the flash information processing unit 106 flash information that indicates that the flash fired and also includes the histogram of luminance values of that captured image. This luminance histogram is one example of information representing the intensity of the flash. Although the flash information acquisition unit 103 of this embodiment compares the current frame with the average of the preceding 10 frames, the number of frames is not limited to 10; any other number of frames may be used. When no flash emission is detected, the flash information acquisition unit 103 sends the flash information processing unit 106 flash information indicating that the flash did not fire.
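The detection rule above can be sketched as a small sliding-window detector. This is an illustration under stated assumptions, not the patented implementation: the class name is hypothetical, the window of 10 frames and threshold of 50 are the example values from the text, and because the text does not say whether the difference is signed, the sketch counts only a rise in brightness as a flash. As in the worked example later in this section, flash frames are also added to the history.

```python
from collections import deque


class FlashDetector:
    """Sliding-window flash detector (hypothetical sketch of the rule above).

    window:    number of preceding frames whose mean luminance forms the
               baseline (10 in the embodiment).
    threshold: minimum rise in mean luminance treated as a flash
               (50 in the example).
    """

    def __init__(self, window: int = 10, threshold: float = 50.0):
        self.history = deque(maxlen=window)  # in-frame means of earlier frames
        self.threshold = threshold

    def update(self, frame_mean: float) -> bool:
        """Return True if the frame with this mean luminance is judged a flash frame."""
        if self.history:
            baseline = sum(self.history) / len(self.history)
        else:
            baseline = frame_mean  # first frame: nothing earlier to compare with
        fired = frame_mean - baseline >= self.threshold
        # Flash frames also enter the history, as in the worked example.
        self.history.append(frame_mean)
        return fired


# Ten steady frames, then the T = 100, 101, 102 values from the example.
detector = FlashDetector()
flags = [detector.update(m) for m in [100.0] * 10 + [101.0, 200.0, 100.0]]
```

With these inputs only the frame at mean luminance 200 is flagged. Using a `deque` with `maxlen` keeps the window bounded without manual trimming.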

FIG. 5 shows an example of the luminance values (average luminance values) calculated by the flash information acquisition unit 103 for each frame. Starting at time T = 0, the flash information acquisition unit 103 calculates the average luminance within each frame (the in-frame average luminance value); for T = 0 it obtains 101. The flash information acquisition unit 103 also calculates the average luminance over a number of frames preceding the current frame (hereinafter, the inter-frame average luminance value). In the present embodiment, the inter-frame average luminance value is the average luminance of the 10 frames T = n-1 to n-10 preceding the current frame T = n. At T = 0 there is no preceding frame, so the average luminance of the current frame is used as the inter-frame average luminance value.
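The two averages defined above can be written as two small functions. This is a minimal sketch with hypothetical names; two details are assumptions not stated in the text: luminance is computed from RGB with the ITU-R BT.601 luma weights, and for 0 < n < 10 the average is taken over however many earlier frames exist.

```python
def in_frame_average(rgb_pixels: list[tuple[int, int, int]]) -> float:
    """Mean luma of one frame, using ITU-R BT.601 weights (an assumption)."""
    total = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels)
    return total / len(rgb_pixels)


def inter_frame_average(frame_means: list[float], n: int, window: int = 10) -> float:
    """Mean of the in-frame averages of frames n-1 .. n-window.

    At n = 0 there is no earlier frame, so the current frame's mean is used.
    For 0 < n < window, only the frames that exist are averaged (assumption).
    """
    if n == 0:
        return frame_means[0]
    previous = frame_means[max(0, n - window):n]
    return sum(previous) / len(previous)
```

With the FIG. 5 values (101 at T = 0, 100 afterwards, then 101, 200, 100 at T = 100 to 102), the inter-frame average is 100 at T = 100 and 110.1 (about 110) at T = 102.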

The description of times T = 1 to 99 is omitted; for each of these frames, the flash information acquisition unit 103 likewise calculates the in-frame average luminance value of the current frame and the inter-frame average luminance value of the 10 preceding frames. At time T = 100, it obtains an in-frame average luminance value of 101 and an inter-frame average luminance value of 100. At time T = 101 the flash fires, as shown in FIG. 3(c), and, as shown in FIG. 3(d), the mode of the histogram 360 lies at 240, close to the upper limit of the luminance range, indicating overexposure. In this state, the flash information acquisition unit 103 obtains an in-frame average luminance value of 200 and an inter-frame average luminance value of 100. Thus, in a captured image affected by the flash, the difference between the in-frame and inter-frame average luminance values becomes large, and the flash information acquisition unit 103 uses this phenomenon to determine whether the flash has fired.

FIG. 4 is a flowchart showing the processing of the flash information acquisition unit 103. This processing is described below with reference to the flowchart of FIG. 4 and the example images and luminance values of FIGS. 3 and 5. For simplicity, the description of times T = 0 to 99 is omitted and begins with the processing at time T = 100; the flash information acquisition unit 103 performs the processing of FIG. 4 at times T = 0 to 99 as well.

When the image processing device 100 starts up and the image pickup device 130 begins capturing, in step S401 the flash information acquisition unit 103 acquires a captured image from the image pickup device 130. At time T = 100, the captured image 310 shown in FIG. 3(a) is acquired. Next, in step S402, the flash information acquisition unit 103 derives the in-frame average luminance value from the captured image acquired in step S401; at time T = 100 this is 101, as shown in FIGS. 3(b) and 5. Next, in step S403, the flash information acquisition unit 103 derives the inter-frame average luminance value. As described above, in the present embodiment this is the average of the in-frame average luminance values of the 10 frames from the frame immediately before the current frame back to the frame 10 frames before it. Here, assume that frames similar to the one at T = 100 continue from T = 90 to T = 99, so that, as shown in FIG. 5, the flash information acquisition unit 103 derives an inter-frame average luminance value of 100 over those 10 frames.

Next, in step S404, the flash information acquisition unit 103 determines whether the lighting conditions have changed beyond a predetermined criterion. In this example, it determines whether the change in luminance in the captured image is equal to or greater than a threshold. If it is, the flash is judged to have fired and the process proceeds to step S405; if it is less than the threshold, the flash is judged not to have fired and the process proceeds to step S407. More specifically, the flash information acquisition unit 103 compares the in-frame average luminance value of the current frame calculated in step S402 with the inter-frame average luminance value up to the previous frame calculated in step S403, and determines that the change in luminance in the captured image is equal to or greater than the threshold when the difference between the two values is equal to or greater than a predetermined threshold. For example, when the difference is 50 or more, the change in luminance is judged to be large and the flash is judged to have fired. Of course, the threshold is not limited to this value. At time T = 100, the in-frame average luminance value of the current frame is 101 and the inter-frame average luminance value up to the previous frame is 100, so the difference is 1. Since this is less than 50, the flash is judged not to have fired and the process proceeds to step S407, in which the flash information acquisition unit 103 sends flash information indicating "no flash emission" to the flash information processing unit 106.

Next, in step S406, the flash information acquisition unit 103 determines whether all frames have been processed. If all frames have been processed, or if an instruction to end image capture by the image pickup device 130 has been given, the processing ends (YES in step S406). In this example, not all frames have been processed at time T = 100 (NO in step S406), so the process returns to step S401.

Next, in step S401, the flash information acquisition unit 103 acquires the captured image 320 (FIG. 3(c)) at time T = 101 from the image pickup device 130 via the image input unit 101. In step S402, it derives the in-frame average luminance value from the captured image 320. As described above, the flash has fired in FIG. 3(c), and the mode of its histogram 360 (FIG. 3(d)) lies at 240, close to the upper limit of the luminance range, indicating overexposure. In this example, an in-frame average luminance value of 200 is derived for the captured image 320 at time T = 101. Next, in step S403, the flash information acquisition unit 103 derives the inter-frame average luminance value, that is, the average luminance of the frames from the one immediately before the current frame back to the one 10 frames before it. In this example, an inter-frame average luminance value of 100 is derived over T = 91 to T = 100.

Next, in step S404, the flash information acquisition unit 103 compares the in-frame average luminance value acquired in step S402 with the inter-frame average luminance value acquired in step S403. At time T = 101, these are 200 and 100, respectively, so the difference is 100. Since this exceeds the threshold (50), the process proceeds to step S405, in which the flash information acquisition unit 103 determines that the flash has fired and sends the flash information processing unit 106 flash information containing an indication that the flash fired and information representing the flash intensity (the histogram).

Next, in step S406, the flash information acquisition unit 103 determines whether all frames have been processed. At time T = 101 they have not, so the process returns to step S401.

In step S401, the flash information acquisition unit 103 acquires the captured image 330 at time T = 102, shown in FIG. 3(e), from the image pickup device 130. In step S402, it derives the in-frame average luminance value from the captured image 330. As described above, at T = 102, when the captured image 330 was taken, the flash that fired in the previous frame is no longer emitting, and the in-frame average luminance value is 100, as shown in FIG. 3(f). Next, in step S403, the flash information acquisition unit 103 derives the inter-frame average luminance value: the average luminance of the frames from T = 92 (10 frames before T = 102) to T = 101, which is 110.

Next, in step S404, the flash information acquisition unit 103 compares the in-frame average luminance value of the current frame derived in step S402 with the inter-frame average luminance value up to the previous frame derived in step S403. For the captured image 330 at time T = 102, these are 100 and 110, respectively, so they differ by 10, which is below the threshold of 50. The process therefore proceeds to step S407: the flash information acquisition unit 103 determines that the flash has not fired and sends flash information indicating "no flash emission" to the flash information processing unit 106.
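The walk-through of steps S401 to S407 can be replayed on the FIG. 5 numbers. The sketch below is illustrative only: `FlashInfo` and `acquire_flash_info` are hypothetical names, the per-frame in-frame averages and histograms are assumed to be precomputed, and, as before, only a rise in brightness counts as a flash. Step numbers are marked in the comments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlashInfo:
    """Message sent to the flash information processing unit (hypothetical shape)."""
    t: int
    flash: bool
    histogram: Optional[list[int]] = None  # attached only when a flash is detected


def acquire_flash_info(frame_means, frame_histograms, window=10, threshold=50.0):
    """Replay steps S401-S407 over precomputed per-frame data."""
    messages = []
    for t, in_frame in enumerate(frame_means):       # S401/S402: next frame and its mean
        previous = frame_means[max(0, t - window):t]  # S403: up to `window` earlier frames
        inter_frame = sum(previous) / len(previous) if previous else in_frame
        if in_frame - inter_frame >= threshold:       # S404: change beyond the criterion?
            messages.append(FlashInfo(t, True, frame_histograms[t]))  # S405
        else:
            messages.append(FlashInfo(t, False))      # S407
    return messages                                   # S406: stop after the last frame
```

Feeding it 101 for T = 0, 100 for T = 1 to 99, and then 101, 200, 100 for T = 100 to 102 yields "no flash", "flash" (with its histogram), "no flash" for the last three frames, matching the text.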

Through the processing described above, the flash information acquisition unit 103 can determine from the image data whether the flash has fired, derive a histogram as flash information, and send it to the flash information processing unit 106. In the above example, the flash lasts for a single frame (T = 101), but the flash emission may span several consecutive frames. Because the difference between the in-frame and inter-frame average luminance values is used, multiple consecutive frames containing flash emission can still be detected. For example, in FIG. 5, even if the in-frame average luminance value were 200 at both T = 101 and T = 102, the inter-frame average luminance values calculated at T = 101 and T = 102 would be 100 and 110, respectively. At T = 102, the difference between the in-frame and inter-frame average luminance values would therefore be 90, which exceeds the threshold (50), so the flash emission would still be detected.
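The multi-frame case can be checked with a compact version of the same rule. This is a sketch under the same assumptions as before (hypothetical function name, window 10, threshold 50, only increases counted); it returns the indices of the frames judged to contain flash emission.

```python
def detect_flash_frames(frame_means, window=10, threshold=50.0):
    """Indices of frames whose in-frame mean exceeds the mean of the
    previous `window` frames by at least `threshold`."""
    flagged = []
    for t, current in enumerate(frame_means):
        previous = frame_means[max(0, t - window):t]
        baseline = sum(previous) / len(previous) if previous else current
        if current - baseline >= threshold:
            flagged.append(t)
    return flagged


# Flash at T = 101 and T = 102 (in-frame mean 200), then back to 100.
two_frame_flash = [100.0] * 101 + [200.0, 200.0, 100.0]
```

Both T = 101 and T = 102 are flagged, as the text states. One consequence of including flash frames in the baseline is worth noting: with these parameters, a sustained jump from 100 to 200 is flagged for only six consecutive frames (the baseline reaches 150 at the sixth), after which the brighter level counts as the new normal. Long-lasting lighting changes therefore call for a different criterion.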

Next, the operation of the flash information processing unit 106, which receives flash information from the flash information acquisition unit 103, will be described. FIG. 6 is a flowchart showing the processing of the flash information processing unit 106. For simplicity, the processing of the captured images at times T = 0 to 99 is not described, and the description covers the captured images from time T = 100 onward; the processing described below is also performed on the captured images at times T = 0 to 99.

In step S601, the flash information processing unit 106 acquires flash information from the flash information acquisition unit 103. Next, in step S602, it checks the flash information received in step S601 to see whether the flash fired. For the captured image at time T = 100, the flash information reports "no flash emission", so the process proceeds to step S604, in which the flash information processing unit 106 sends the background texture stored in the background data storage unit 107 to the virtual viewpoint image generation unit 105 unchanged. Thus, when the flash is judged not to have fired, the flash information processing unit 106 skips the flash reproduction process and provides the background texture to the virtual viewpoint image generation unit 105 as is.
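The branch in steps S602 to S604 amounts to a pass-through with one conditional correction. The sketch below is a hypothetical illustration: textures are modelled as flat lists of 8-bit luminance values, and `reproduce_flash` stands in for whatever correction step S603 applies (for example, the histogram matching described next).

```python
from typing import Callable, Optional

Texture = list[int]  # hypothetical: a texture as flattened 8-bit luminance values


def background_for_frame(
    flash: bool,
    histogram: Optional[list[int]],
    stored_texture: Texture,
    reproduce_flash: Callable[[Texture, list[int]], Texture],
) -> Texture:
    """Steps S602-S604 for one frame.

    No flash: hand the stored background texture on untouched (S604).
    Flash: return a copy corrected toward the flash-frame histogram (S603).
    """
    if not flash:
        return stored_texture
    if histogram is None:
        raise ValueError("flash reported without an intensity histogram")
    return reproduce_flash(stored_texture, histogram)
```

Keeping the stored texture untouched and returning a new corrected copy means the unaffected background remains available for the next non-flash frame.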

As a result, when the flash has not fired, the virtual viewpoint image generation unit 105 renders a virtual viewpoint image 280 (FIG. 2(h)) without flash reproduction. The ring 283 in FIG. 2(h) is rendered by the virtual viewpoint image generation unit 105 using the prestored texture of the ring (the background texture).

In step S605, the flash information processing unit 106 determines whether all frames have been processed. At time T = 100 they have not, so the process returns to step S601.

Next, the flash information processing unit 106 processes the captured image 320 at time T = 101. In step S601, it acquires flash information from the flash information acquisition unit 103. Next, in step S602, it checks the received flash information to see whether the flash fired. The flash information for time T = 101 reports that the flash fired, so the process proceeds to step S603.

In step S603, the flash information processing unit 106 adjusts (corrects) the background texture so that it approaches the luminance characteristics of the captured image at the time the change was detected (that is, when flash emission was detected). The background texture need not be adjusted to match the luminance characteristics of the captured image exactly; it suffices to reduce the visual mismatch that the flash causes between the foreground and the background in the virtual viewpoint image. In the present embodiment, the flash information processing unit 106 performs flash reproduction processing on the background texture based on the information representing the flash intensity (the luminance histogram) included in the flash information. As described above, when the flash fires, the flash information acquisition unit 103 adds a histogram to the flash information it sends to the flash information processing unit 106. The flash information processing unit 106 corrects the luminance of the background texture 253 of the ring stored in the background data storage unit 107 based on the histogram included in the flash information. Histogram matching, for example, can be applied for this correction. Histogram matching is an image processing technique that makes the brightness and contrast of one image match those of another by matching the shapes of their histograms. By correcting the luminance of the prestored background texture 253 of the ring based on the received flash information (histogram), the flash information processing unit 106 derives the corrected background texture 263 shown in FIG. 2(f). Because the background texture 263 in FIG. 2(f) has been histogram-matched to the histogram of the image captured while the flash fired, it can reproduce the appearance of the ring under the flash.
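The text names histogram matching but not a particular algorithm, so the sketch below uses the classic CDF-based form as a plausible instance: each source luminance level is mapped to the smallest reference level whose cumulative share is at least as large. It works on a single 8-bit luminance channel given as a flat list; handling colour textures (for example by converting to a luma/chroma space first) is left out, and the function names are hypothetical.

```python
def cumulative(hist: list[int]) -> list[float]:
    """Normalised cumulative distribution of a 256-bin histogram (non-empty)."""
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / total)
    return cdf


def match_histogram(pixels: list[int], reference_hist: list[int]) -> list[int]:
    """Remap 8-bit luminance values so their histogram approximates reference_hist."""
    source_hist = [0] * 256
    for v in pixels:
        source_hist[v] += 1
    src_cdf = cumulative(source_hist)
    ref_cdf = cumulative(reference_hist)
    # Build a lookup table; src_cdf is non-decreasing, so one forward pass suffices.
    lut, j = [], 0
    for v in range(256):
        while j < 255 and ref_cdf[j] < src_cdf[v]:
            j += 1
        lut.append(j)
    return [lut[v] for v in pixels]


# A dim background texture and the luminance histogram of a flash-lit frame.
background = [100] * 50 + [120] * 50
flash_hist = [0] * 256
flash_hist[230] = 50
flash_hist[250] = 50
```

Here the background's two luminance levels are pushed up to 230 and 250, imitating the brightened ring, while their relative order is preserved.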

The virtual viewpoint image generation unit 105 renders a virtual viewpoint image using the ring's background texture 263 (FIG. 2(f)) corrected by the flash reproduction process, and generates the virtual viewpoint image 270 (FIG. 2(g)). As described above, this processing suppresses the generation of an unnatural virtual viewpoint image in which only the foreground is affected by the flash. The process then proceeds to step S605, in which the flash information processing unit 106 determines whether all frames have been processed. At time T = 101 they have not, so the process returns to step S601. The processing at time T = 102 is not described in detail; because the flash does not fire at that time, no flash reproduction process is performed, just as at time T = 100.

The processing described above suppresses the generation of unnatural virtual viewpoint images in which only the foreground is affected by the flash, making it possible to provide virtual viewpoint images that look natural to the viewer without impairing the sense of presence.

Although this embodiment describes an example of correcting the background texture, the present invention is not limited to this. To provide a virtual viewpoint image that does not look unnatural, it suffices, when the lighting conditions of the imaging space change beyond a predetermined criterion, to correct at least one of the foreground and background textures used to generate the virtual viewpoint image so as to bring their lighting conditions closer together. Therefore, for example, the foreground texture may instead be corrected to reduce the effect of the flash. For example, the foreground texture of a frame in which the flash fired may be corrected based on the histogram of the foreground texture of an earlier frame in which the flash did not fire. In this case, the flash information acquisition unit 103 sends the flash information processing unit 106 the histogram of the foreground texture obtained when no flash emission is detected. This histogram may be sent, for example, every time no flash emission is detected, or at any chosen timing during a series of captures (for example, when no flash emission is first detected). For example, the flash information indicating no flash emission that is sent in step S407 of FIG. 4 includes the histogram of luminance values of the captured image. In this case, the histogram included in the flash information sent in step S405 can be omitted.
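The foreground-side variant can be sketched as a small stateful helper that remembers a reference from the most recent no-flash frame and pulls flash-frame foreground luminance back toward it. To keep the example distinct and short, it deliberately swaps in a simpler technique than histogram matching: a single gain that matches mean foreground luminance to the stored reference mean. The class name and this mean-gain substitution are assumptions for illustration only.

```python
class ForegroundFlashCompensator:
    """Hypothetical helper: dims flash-frame foregrounds toward the latest no-flash frame.

    Uses a mean-luminance gain as a simpler stand-in for histogram matching.
    """

    def __init__(self):
        self.reference_mean = None  # mean foreground luminance of latest no-flash frame

    def process(self, flash: bool, foreground: list[int]) -> list[int]:
        current = sum(foreground) / len(foreground)
        if not flash:
            self.reference_mean = current  # refresh the reference on every no-flash frame
            return foreground
        if self.reference_mean is None or current == 0:
            return foreground  # no usable reference yet: leave the texture as captured
        gain = self.reference_mean / current
        return [min(255, max(0, round(v * gain))) for v in foreground]
```

Refreshing the reference on every no-flash frame corresponds to the "every time no flash emission is detected" option above; storing it only once would correspond to the "first detection" option.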

In the configuration shown in FIG. 1A, a flash information acquisition unit 103 is provided for each of the plurality of image pickup devices 130, so multiple pieces of flash information are sent to the flash information processing unit 106. It is therefore necessary to select which flash information to use in steps S602 and S603. For example, the flash information processing unit 106 selects the flash information to use from among the multiple pieces based on the relationship between the position of the virtual camera used to generate the virtual viewpoint image and the positions of the image pickup devices 130. For instance, the flash information processing unit 106 executes steps S602 and S603 using the flash information from the flash information acquisition unit 103 corresponding to the image pickup device 130 closest to the virtual camera. Alternatively, flash information may be sent only from a selected flash information acquisition unit 103, or the system may have a single flash information acquisition unit 103 that generates flash information from the images captured by a specific image pickup device.
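Selecting the flash information of the image pickup device nearest the virtual camera reduces to a nearest-neighbour lookup. A minimal sketch, assuming camera positions are 3-D coordinates in a common world frame and that "closest" means Euclidean distance; the function name and data layout are hypothetical.

```python
import math


def select_flash_info(virtual_cam_pos, camera_positions, flash_infos):
    """Return the flash information produced for the camera nearest the virtual camera."""
    nearest = min(
        range(len(camera_positions)),
        key=lambda i: math.dist(virtual_cam_pos, camera_positions[i]),
    )
    return flash_infos[nearest]


cameras = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
infos = ["camera 0 info", "camera 1 info", "camera 2 info"]
chosen = select_flash_info((9.0, 1.0, 0.0), cameras, infos)
```

`math.dist` (Python 3.8 and later) computes the Euclidean distance between two points of equal dimension.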

The selection of flash information described above is an example of correction based on the determination of a single flash information acquisition unit 103, but the flash-emission determinations of a plurality of flash information acquisition units 103 may also be used. For example, whether to execute the flash reproduction process may be decided by a majority vote over the flash-emission results of the plurality of flash information acquisition units 103. When taking the majority vote, the results may be weighted based on the positional relationship between the virtual camera and each imaging device. For example, the flash information of imaging devices closer to the virtual camera may be given larger weights in the vote. Alternatively, when a predetermined number (one or more) of the acquired pieces of flash information indicate flash emission, it may be determined that the flash fired for that capture. Further, the flash information may include the in-frame average luminance value, and the flash information with the highest in-frame average luminance value may be used in the adjustment process of step S603.
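The voting variants described above could be sketched as follows. This is an illustration under assumptions: the inverse-distance weight `1 / (1 + d)` is an arbitrary choice not taken from the source, and camera positions are assumed to be expressed in the same world coordinates as the virtual camera.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def weighted_flash_vote(
    detections: Sequence[bool],
    camera_positions: Sequence[Vec3],
    virtual_cam_pos: Vec3,
) -> bool:
    """Weighted majority vote in which cameras closer to the virtual camera count more.

    The weight 1 / (1 + distance) is an illustrative assumption.
    """
    yes = no = 0.0
    for detected, pos in zip(detections, camera_positions):
        weight = 1.0 / (1.0 + math.dist(pos, virtual_cam_pos))
        if detected:
            yes += weight
        else:
            no += weight
    return yes > no


def at_least_n_vote(detections: Sequence[bool], n: int = 1) -> bool:
    """Judge the flash as fired if at least n imaging devices report it."""
    return sum(detections) >= n
```

With one nearby camera reporting a flash and two distant ones not, the weighted vote can still report a flash even though a plain majority would not.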

<Second Embodiment>
In the first embodiment, an example was described in which a change in lighting conditions exceeding a criterion (for example, a flash emission) is detected and the background texture is corrected. In the second embodiment, a configuration is described in which the position of the light source causing the change in lighting conditions is identified, and that light source is placed in the scene when rendering the virtual viewpoint image so as to correct the background texture. The following describes an example in which the emission position of a flash, serving as the light source, is derived in three-dimensional space and the flash emission is reproduced when rendering the virtual viewpoint image.

In the second embodiment, the flash is reproduced by deriving the emission position of the flash in three-dimensional space and adding a light source corresponding to the flash at rendering time. Therefore, the processing described below should preferably use a foreground texture that is not affected by the flash. In the second embodiment, the foreground texture is first corrected by the histogram matching described in the first embodiment so that the influence of the flash on it is reduced or eliminated, and the processing described below is then performed. The configuration of the virtual viewpoint image generation system 1 of the second embodiment is the same as that of the first embodiment (FIG. 1A). However, in the second embodiment, the foreground texture transmitted by the silhouette image derivation unit 102 is also provided to the flash information processing unit 106 so that the influence of the flash on the foreground texture can be reduced.

In the second embodiment, the flash information acquisition unit 103 identifies the emission position (image coordinates) of the flash in addition to detecting whether the flash fired. The flash information processing unit 106 receives the image coordinates (flash emission positions) derived by each of the plurality of flash information acquisition units 103, derives the three-dimensional position of the flash emission from these image coordinates, and notifies the virtual viewpoint image generation unit 105 of that three-dimensional position. The virtual viewpoint image generation unit 105 places a new light source at the derived three-dimensional position of the flash and renders the virtual viewpoint image. By adding a light source that reproduces the flash when rendering frames in which the flash fired, the effect of the flash can be applied to the foreground and/or the background.

FIG. 7 shows the image coordinates of the flash derived by the flash information acquisition unit 103 of the second embodiment and the three-dimensional position of the flash emission derived by the flash information processing unit 106.

FIG. 7A shows an example of captured images (image data) acquired by the imaging devices 130 of the second embodiment. FIG. 7A depicts only a photographer holding a camera with a flash 701, but, as in the first embodiment, a plurality of imaging devices 130 are arranged around the subject and capture the sumo wrestlers and the ring (dohyo). FIG. 7A also shows captured images 711 to 715 taken by five of the arranged imaging devices 130. The captured images 711 to 715 were captured while the flash 701 was firing, and the flash light appears in each of them.

Next, FIG. 7B shows an example of the image coordinates of the flash that the flash information acquisition unit 103 obtains from an image captured by the imaging device 130. When the flash fires, the luminance at the flash position rises sharply compared with the previous frame. The flash information acquisition unit 103 therefore calculates the luminance change of each pixel between the previous frame and the current frame. For example, the binary image 720 of FIG. 7B is derived by setting pixels whose luminance change is equal to or greater than a threshold to white and all other pixels to black. In the binary image 720, pixels whose luminance changed abruptly because of the flash are represented by the white pixels 721, which constitute data indicating the image coordinates of the flash emission position. The flash information acquisition unit 103 sends this binary image 720 to the flash information processing unit 106 as flash emission position information.
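The per-pixel thresholding that produces a binary image such as 720 can be sketched as below. The sketch assumes grayscale frames represented as nested lists of 8-bit luminance values and counts only luminance increases, since a flash brightens pixels; the function names and the threshold value are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed: rows of 8-bit luminance values


def flash_mask(prev: Image, curr: Image, threshold: int) -> List[List[int]]:
    """Binary image: 1 (white) where luminance rose by at least threshold, else 0 (black)."""
    return [
        [1 if c - p >= threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def white_pixel_coords(mask: List[List[int]]) -> List[Tuple[int, int]]:
    """(row, column) image coordinates of the white pixels, i.e. candidate flash positions."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
```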

The flash information processing unit 106 uses the plurality of binary images 720 indicating the flash emission position, provided by the plurality of flash information acquisition units 103, and obtains the three-dimensional position of the flash emission by the visual hull (volume intersection) method or the like. FIG. 7C shows an example of the three-dimensional position 730 of the flash derived by the flash information processing unit 106. Although a binary image is used here as the data indicating the flash position in image coordinates, the present invention is not limited to this; for example, the image coordinates of the pixels may be used instead. In this case, the flash information processing unit 106 derives the three-dimensional position 730 of the flash emission from the plurality of image coordinates indicating the emission position by a stereo method. Three-dimensional position information indicating the derived position 730 of the flash emission is provided to the virtual viewpoint image generation unit 105.
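When the flash is given as image coordinates rather than binary images, one common way to recover its three-dimensional position is least-squares triangulation of the viewing rays: the point p minimizing the sum of squared distances to all rays satisfies the normal equations Σ(I − uuᵀ)p = Σ(I − uuᵀ)o, where o is a ray origin and u its unit direction. The sketch assumes each calibrated imaging device already provides such a ray (camera center plus direction through the flash pixel); computing the rays from camera parameters is omitted, and this technique is a stand-in, not necessarily the specific stereo method intended by the source.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_rays(origins: Sequence[Vec3], directions: Sequence[Vec3]) -> Vec3:
    """Least-squares point closest to all rays (camera center + direction to the flash pixel)."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(x * x for x in d))
        u = [x / norm for x in d]
        for i in range(3):
            for j in range(3):
                # Entry of the projector (I - u u^T) onto the plane orthogonal to the ray.
                m = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += m
                b[i] += m * o[j]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; position is not determined")
    # Solve the 3x3 normal equations a @ p = b with Cramer's rule.
    result = []
    for k in range(3):
        ak = [row[:] for row in a]
        for i in range(3):
            ak[i][k] = b[i]
        result.append(_det3(ak) / det)
    return (result[0], result[1], result[2])
```

With at least two non-parallel rays the system has a unique solution; extra imaging devices simply add terms to the sums and average out pixel noise.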

The virtual viewpoint image generation unit 105 adds a light source based on the three-dimensional position information of the flash provided by the flash information processing unit 106 and renders the virtual viewpoint image. As the light source, the virtual viewpoint image generation unit 105 may use, for example, a point light that radiates light in all directions from a single point, or a spotlight whose beam direction (angle) from a single point can be specified. In the case of a point light source, the light source is placed according to the three-dimensional position information of the flash emission output by the flash information processing unit 106. When the flash light is to be reproduced more faithfully, a spotlight source is used. However, when a spotlight source is used, an angle indicating the direction of the light beam from the flash must be derived in addition to the emission position (three-dimensional position information) of the flash.

FIG. 7D shows an example of the angle of the spotlight source. The three-dimensional position 730 in FIG. 7D indicates the flash emission position derived by the processing described above. The direction (angle) of the spotlight source can be set to the direction from the position of the light source toward a predetermined position in the imaging space. For example, assuming that the photographer's camera is aimed at the subject (the wrestlers or the ring), three-dimensional coordinates indicating the subject region may be determined in advance. For example, the vector connecting the three-dimensional position 730 (three-dimensional coordinates) of the flash emission and the three-dimensional position 740 (three-dimensional coordinates) at the center of the ring, which is the subject, is used as the angle of the spotlight source. In this way, the flash information processing unit 106 can derive the angle of the spotlight source so that the spotlight source can be added.
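The spotlight direction of FIG. 7D is the normalized vector from the flash position 730 to the predetermined target 740. The sketch below computes it and adds a cone-membership test for illustration; the half-angle parameter and the function names are assumptions, since the source does not specify a beam width.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def spotlight_direction(light_pos: Vec3, target_pos: Vec3) -> Vec3:
    """Unit vector from the flash position (e.g. 730) toward the target (e.g. 740)."""
    d = [t - l for l, t in zip(light_pos, target_pos)]
    n = math.sqrt(sum(c * c for c in d))
    if n == 0:
        raise ValueError("light position and target coincide")
    return (d[0] / n, d[1] / n, d[2] / n)


def in_spot_cone(light_pos: Vec3, direction: Vec3, point: Vec3, half_angle_deg: float) -> bool:
    """True if point lies inside the spotlight cone with the given (assumed) half-angle."""
    v = [p - l for l, p in zip(light_pos, point)]
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        return True  # the apex itself is treated as lit
    cos_angle = sum(a * b for a, b in zip(direction, v)) / n
    return cos_angle >= math.cos(math.radians(half_angle_deg))
```

A point light needs only `light_pos`; the spotlight additionally needs the direction, which is why the source derives an angle only in the spotlight case.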

The above processing will now be described with reference to flowcharts. First, the operation of the flash information acquisition unit 103 of the second embodiment will be described using the flowchart of FIG. 4. The operation of the flash information acquisition unit 103 of the second embodiment is the same as in the first embodiment except for the contents of the flash information transmitted in steps S405 and S407. In the second embodiment, in step S407, the flash information acquisition unit 103 transmits flash information containing information indicating no flash emission and a histogram of the luminance values of the captured image. In step S405, the flash information acquisition unit 103 transmits flash information containing information indicating that the flash fired and information indicating the emission position of the flash.

FIG. 8 is a flowchart illustrating the operation of the flash information processing unit 106 according to the second embodiment. In step S801, the flash information processing unit 106 receives flash information from the flash information acquisition unit 103 and a foreground texture from the silhouette image derivation unit 102. Next, in step S802, the flash information processing unit 106 checks from the flash information received in step S801 whether the flash fired. If it is determined that there was no flash emission, the process proceeds to step S806, where the flash information processing unit 106 stores the histogram included in the flash information as the non-emission histogram. Then, in step S807, the flash information processing unit 106 provides the foreground texture received from the silhouette image derivation unit 102 to the virtual viewpoint image generation unit 105 unchanged.

On the other hand, if it is determined in step S802 that the flash fired, the process proceeds to step S803. In step S803, the flash information processing unit 106 corrects the foreground texture received in step S801 using the non-emission histogram stored in step S806, generating a foreground texture in which the influence of the flash is reduced or eliminated. In step S804, the flash information processing unit 106 calculates the three-dimensional position of the flash from the emission position information contained in the flash information. In step S805, the flash information processing unit 106 provides the three-dimensional position calculated in step S804 to the virtual viewpoint image generation unit 105. Then, in step S807, the foreground texture corrected in step S803 is provided to the virtual viewpoint image generation unit 105. In step S808, the flash information processing unit 106 determines whether all frames have been processed. If so, this processing ends; otherwise, the process returns to step S801.
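The control flow of FIG. 8 can be summarized as a loop over frames. This is a minimal sketch with hypothetical types: the correction (S803) and the three-dimensional localization (S804) are passed in as callables, and skipping the correction when no non-emission histogram has been stored yet is an assumption, since the source does not address that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class FrameInput:
    flash_detected: bool
    histogram: Optional[List[int]]       # sent with "no flash" information (S407)
    flash_observations: Optional[list]   # emission positions sent with "flash" information (S405)
    foreground_texture: List[int]


@dataclass
class FrameOutput:
    foreground_texture: List[int]
    flash_position: Optional[Vec3]       # None when the flash did not fire


def process_frames(
    frames: Sequence[FrameInput],
    correct: Callable[[List[int], List[int]], List[int]],
    locate: Callable[[list], Vec3],
) -> List[FrameOutput]:
    reference_hist: Optional[List[int]] = None
    outputs: List[FrameOutput] = []
    for frame in frames:                                   # S801: receive one frame's data
        if not frame.flash_detected:                       # S802: no flash emission
            reference_hist = frame.histogram               # S806: store non-emission histogram
            outputs.append(FrameOutput(frame.foreground_texture, None))  # S807: texture as-is
            continue
        texture = frame.foreground_texture
        if reference_hist is not None:                     # assumption: skip if no reference yet
            texture = correct(texture, reference_hist)     # S803: reduce the flash influence
        position = locate(frame.flash_observations or [])  # S804: 3D flash position
        outputs.append(FrameOutput(texture, position))     # S805 and S807: hand to unit 105
    return outputs                                         # S808: all frames processed
```

In practice `correct` could be a histogram-matching function and `locate` a triangulation routine; both are left abstract here.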

The virtual viewpoint image generation unit 105 generates a virtual viewpoint image based on the emission position and foreground texture provided by the flash information processing unit 106, the background data stored in the background data storage unit 107, and the position of the virtual camera designated by the operation device 110. The luminance of the flash (light source) may be predetermined with reference to the luminance of a typical flash, or it may be set using the luminance histogram that the flash information acquisition unit 103 obtained from the image captured during flash emission.

As described above, according to the second embodiment, the emission position of the flash in three-dimensional space can be derived and the flash can be reproduced at rendering time. This suppresses the generation of unnatural virtual viewpoint images and makes it possible to provide virtual viewpoint images that do not look unnatural to viewers while preserving the sense of presence.

According to each of the embodiments described above, a natural virtual viewpoint image is generated even when the imaging environment (lighting conditions) changes abruptly because of a flash or the like. In addition, because the generation of unnatural virtual viewpoint images in which only the foreground subject flashes white under the influence of the flash is suppressed, it is possible to provide virtual viewpoint images that do not look unnatural to viewers while preserving the sense of presence.

(Other Embodiments)
The present invention can also be implemented by supplying a program that realizes one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be implemented by a circuit (for example, an ASIC) that realizes one or more functions.

The present invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the present invention. Accordingly, the following claims are appended to make the scope of the present invention public.

1: virtual viewpoint image generation system, 100: image processing device, 101: image input unit, 102: silhouette image derivation unit, 103: flash information acquisition unit, 104: three-dimensional shape derivation unit, 105: virtual viewpoint image generation unit, 106: flash information processing unit, 110: operation device, 120: display device, 130: imaging device

Claims

A virtual viewpoint image generation system comprising:
a generation means for generating a virtual viewpoint image based on three-dimensional shape information of a foreground, which is a subject, obtained from a plurality of captured images acquired by a plurality of imaging devices; color information of the foreground; three-dimensional shape information of a background, different from the foreground, in an imaging space of the plurality of imaging devices; color information of the background; and a designated virtual viewpoint;
a determination means for determining, based on at least one of the plurality of captured images, whether a change exceeding a predetermined criterion has occurred in lighting conditions; and
an adjusting means for adjusting, when the determination means determines that the lighting conditions have changed, at least one of the color information of the foreground and the color information of the background so as to bring the lighting conditions of the foreground color information and the background color information used by the generation means closer to each other.

The virtual viewpoint image generation system according to claim 1, wherein the determination means determines whether a change has occurred in the lighting conditions based on a change in the average luminance value of the captured images acquired by the plurality of imaging devices.

The virtual viewpoint image generation system according to claim 2, wherein the determination means determines that a change has occurred in the lighting conditions when the difference between the average luminance value of the captured image of a current frame to be determined and the average luminance value of the captured images of a predetermined number of frames preceding the current frame exceeds a threshold.

The virtual viewpoint image generation system according to any one of claims 1 to 3, wherein the determination means determines that the lighting conditions have changed based on the number of captured images, among the plurality of captured images acquired by the plurality of imaging devices, for which it has been determined that the lighting conditions have changed.

The virtual viewpoint image generation system according to any one of claims 1 to 3, wherein the determination means determines whether the lighting conditions have changed by using an image captured by the imaging device closest to the virtual viewpoint among the plurality of imaging devices.

The virtual viewpoint image generation system according to any one of claims 1 to 5, wherein the adjusting means adjusts the color information of the background so that it approaches the characteristics of the captured image in which the change is detected.

The virtual viewpoint image generation system according to claim 6, wherein the adjusting means uses, as the characteristics, a histogram of the luminance of the captured image in which the change is detected by the determination means.

The virtual viewpoint image generation system according to any one of claims 1 to 7, wherein the adjusting means adjusts the color information of the foreground so that the characteristics of the color information of the foreground approach the characteristics of a captured image in which the change is not detected.

The virtual viewpoint image generation system according to claim 8, wherein the adjusting means uses, as the characteristics, a histogram of the luminance of the captured image in which the change is not detected by the determination means.

The virtual viewpoint image generation system according to any one of claims 1 to 9, further comprising an identification means for identifying the position, in three-dimensional space, of a light source that causes the change in the lighting conditions,
wherein the generation means renders the virtual viewpoint image with the light source identified by the identification means placed in the three-dimensional space.

The virtual viewpoint image generation system according to claim 10, wherein the adjusting means adjusts the color information of the foreground to color information corresponding to the lighting conditions before the change occurred.

The virtual viewpoint image generation system according to claim 10, wherein the generation means renders the virtual viewpoint image using the light source as a point light source.

The virtual viewpoint image generation system according to claim 10, wherein the generation means renders the virtual viewpoint image using the light source as a spotlight light source.

The virtual viewpoint image generation system according to claim 13, wherein the generation means sets the direction of the spotlight light source in a direction from the position of the light source toward a predetermined position in the imaging space.

An image processing device comprising:
an acquisition means for acquiring, from an image captured by an imaging device, three-dimensional shape information of a foreground, which is a subject, and color information of the foreground;
a determination means for determining, based on the captured image, whether a change exceeding a predetermined criterion has occurred in lighting conditions; and
a transmission means for transmitting the three-dimensional shape information of the foreground and the color information of the foreground acquired by the acquisition means, and the determination result of the determination means, to an external device,
wherein, when the determination means determines that a change has occurred, the transmission means further transmits information representing the luminance characteristics of the captured image.

An image generation device comprising:
a generation means for generating a virtual viewpoint image based on three-dimensional shape information of a foreground, which is a subject, obtained from a plurality of captured images acquired by a plurality of imaging devices; color information of the foreground; three-dimensional shape information of a background, different from the foreground, in an imaging space of the plurality of imaging devices; color information of the background; and a designated virtual viewpoint; and
an adjusting means for adjusting, when notified that the lighting conditions of the imaging space have changed beyond a predetermined criterion, at least one of the color information of the foreground and the color information of the background so as to bring the lighting conditions of the foreground color information and the background color information used by the generation means closer to each other.

A control method for a virtual viewpoint image generation system, comprising:
a generation step of generating a virtual viewpoint image based on three-dimensional shape information of a foreground, which is a subject, obtained from a plurality of captured images acquired by a plurality of imaging devices; color information of the foreground; three-dimensional shape information of a background, different from the foreground, in an imaging space of the plurality of imaging devices; color information of the background; and a designated virtual viewpoint;
a determination step of determining, based on at least one of the plurality of captured images, whether a change exceeding a predetermined criterion has occurred in lighting conditions; and
an adjustment step of adjusting, when it is determined in the determination step that the lighting conditions have changed, at least one of the color information of the foreground and the color information of the background so as to bring the lighting conditions of the foreground color information and the background color information used in the generation step closer to each other.

A control method for an image processing apparatus, comprising:
an acquisition step of acquiring, from an image captured by an imaging device, three-dimensional shape information of a foreground, which is a subject, and color information of the foreground;
a determination step of determining, based on the captured image, whether a change exceeding a predetermined criterion has occurred in lighting conditions; and
a transmission step of transmitting the three-dimensional shape information of the foreground and the color information of the foreground acquired in the acquisition step, and the determination result of the determination step, to an external device,
wherein, in the transmission step, when it is determined in the determination step that a change has occurred, information representing the luminance characteristics of the captured image is further transmitted.

A control method for an image generation device, comprising:
a generation step of generating a virtual viewpoint image based on three-dimensional shape information of a foreground, which is a subject, obtained from a plurality of captured images acquired by a plurality of imaging devices; color information of the foreground; three-dimensional shape information of a background, different from the foreground, in an imaging space of the plurality of imaging devices; color information of the background; and a designated virtual viewpoint; and
an adjustment step of adjusting, when notified that the lighting conditions of the imaging space have changed beyond a predetermined criterion, at least one of the color information of the foreground and the color information of the background so as to bring the lighting conditions of the foreground color information and the background color information used in the generation step closer to each other.

A program for causing a computer to execute each step of the control method according to any one of claims 17 to 19.