JP2022016929A

JP2022016929A - Information processing device, information processing method, and program

Info

Publication number: JP2022016929A
Application number: JP2020119932A
Authority: JP
Inventors: 尚志中本; Hisashi Nakamoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-07-13
Filing date: 2020-07-13
Publication date: 2022-01-25

Abstract

PROBLEM TO BE SOLVED: To suppress degradation of the image quality of a virtual viewpoint image caused by noise resulting from errors in three-dimensional shape estimation. SOLUTION: An information processing device identifies the position of a virtual viewpoint corresponding to a virtual viewpoint image. The virtual viewpoint image is generated from the result of estimating the three-dimensional shape of an object, which is obtained by imaging an imaging area from different directions with a plurality of imaging devices. The information processing device also determines whether a predetermined condition has been met. The condition relates to image-quality degradation caused by noise in the virtual viewpoint image at a position where the three-dimensional shape is estimated erroneously because at least one of the imaging devices cannot image that position. When the condition is determined to be met, the information processing device performs control to bring the position of the virtual viewpoint closer to a position on the optical axis of one of the imaging devices. SELECTED DRAWING: Figure 6

Description

The present invention relates to a technique for generating a virtual viewpoint image using a plurality of imaging devices.

In one known technique, a plurality of imaging devices are installed at different positions and capture images synchronously. The resulting captured images are then used to generate a virtual viewpoint image whose viewpoint can be changed arbitrarily. Specifically, the three-dimensional shape of an object in the imaging area is estimated from the captured images, and the virtual viewpoint image is generated by rendering a three-dimensional model that represents the estimated shape. With this technique, footage of a soccer or rugby match, for example, can be viewed from many different viewpoints. This gives the user a greater sense of presence than ordinary images do.

Patent Document 1 describes identifying, based on camera information from a plurality of cameras, a region within the imaging target area in which a virtual viewpoint image can be generated at the resolution the user wants. The user is then notified of the identified region.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-145894

However, virtual viewpoint images of the same area can differ in image quality. The quality depends on the state of the objects in that area and on the position from which the virtual viewpoint views the area. Consider a virtual viewpoint image of an area where many players are crowded together, as in a rugby scrum. Some positions there are occluded by players and cannot be captured by some of the imaging devices, so the three-dimensional shape estimated from the captured images contains errors at those positions. If a virtual viewpoint image is generated from such an inaccurate shape estimate, noise may appear in it and degrade its quality, depending on where the virtual viewpoint is placed.

The present invention has been made in view of the above problem. Its object is to suppress degradation of the image quality of a virtual viewpoint image caused by noise that results from errors in three-dimensional shape estimation.

To solve the above problem, an information processing apparatus according to the present invention has, for example, the following configuration. The apparatus includes an identification means, a determination means, and a control means. The identification means identifies the position of a virtual viewpoint corresponding to a virtual viewpoint image. That image is generated from the result of estimating the three-dimensional shape of an object in an imaging area, obtained by imaging the area from a plurality of directions with a plurality of imaging devices. The determination means determines whether a predetermined condition is satisfied. The condition relates to image-quality degradation caused by noise in the virtual viewpoint image at a position in the imaging area where the three-dimensional shape is estimated erroneously because at least one of the imaging devices cannot image that position. When the determination means determines that the predetermined condition is satisfied, the control means performs control to move the virtual viewpoint from the position identified by the identification means closer to a position on the optical axis of one of the plurality of imaging devices.

According to the present invention, moving the virtual viewpoint makes it possible to generate the virtual viewpoint image with an imaging device whose view is close to the view from the virtual viewpoint. As a result, noise caused by errors in three-dimensional shape estimation is less likely to appear in the virtual viewpoint image, and degradation of its image quality can be suppressed.
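The control step above can be pictured geometrically. The sketch below is hypothetical and rests on three assumptions not stated in this section. First, the optical axis is treated as a ray starting at the camera position. Second, "closer to a position on the optical axis" is taken to mean moving toward the foot of the perpendicular dropped from the current viewpoint onto that ray. Third, the step size is a free fraction `alpha`. The names `closest_point_on_axis` and `move_toward_axis` are illustrative only.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def closest_point_on_axis(viewpoint: Vec3, cam_pos: Vec3, cam_dir: Vec3) -> Vec3:
    """Foot of the perpendicular from the viewpoint onto the camera's optical axis.

    Hypothetical helper: the axis is the ray cam_pos + t * cam_dir with t >= 0,
    so a viewpoint behind the camera is clamped to the camera position.
    """
    d2 = _dot(cam_dir, cam_dir)
    if d2 == 0.0:
        raise ValueError("cam_dir must be non-zero")
    t = max(0.0, _dot(_sub(viewpoint, cam_pos), cam_dir) / d2)
    return (cam_pos[0] + t * cam_dir[0],
            cam_pos[1] + t * cam_dir[1],
            cam_pos[2] + t * cam_dir[2])


def move_toward_axis(viewpoint: Vec3, cam_pos: Vec3, cam_dir: Vec3,
                     alpha: float) -> Vec3:
    """Move the viewpoint a fraction alpha (0..1) of the way toward the axis."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    q = closest_point_on_axis(viewpoint, cam_pos, cam_dir)
    return (viewpoint[0] + alpha * (q[0] - viewpoint[0]),
            viewpoint[1] + alpha * (q[1] - viewpoint[1]),
            viewpoint[2] + alpha * (q[2] - viewpoint[2]))
```

Take a camera at the origin looking along +X. With `alpha = 0.5`, a viewpoint at (10, 4, 0) moves to (10, 2, 0). With `alpha = 1.0`, it lands exactly on the axis at (10, 0, 0).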

FIG. 1 is a diagram showing a configuration example of an image processing system.
FIG. 2 is a diagram showing a hardware configuration example of an information processing apparatus.
FIG. 3 is a block diagram showing a functional configuration example of the information processing apparatus.
FIG. 4 is a diagram showing an example arrangement of a virtual viewpoint and imaging devices.
FIG. 5 is a diagram showing an example arrangement of telephoto cameras and wide-angle cameras.
FIG. 6 is a diagram showing an example of a screen displayed by a display device.
FIG. 7 is a diagram showing an example of imaging information about the imaging devices.
FIG. 8 is a flowchart showing an example of the operation of the information processing apparatus.
FIG. 9 is a diagram showing the relationship between the position of a virtual viewpoint and the optical axis of an imaging device.

[System configuration]
FIG. 1 is a diagram showing a configuration example of an image processing system 100 according to the present embodiment. The image processing system 100 includes a plurality of imaging devices 101, an information processing apparatus 102, a viewpoint input device 103, and a display device 104. The system generates a virtual viewpoint image representing the scene seen from a designated virtual viewpoint, using the images captured by the imaging devices 101 together with that designated viewpoint. The virtual viewpoint image in this embodiment is also called a free-viewpoint video. It is not limited to an image for a viewpoint the user designates freely (arbitrarily). For example, an image for a viewpoint the user selects from a set of candidates also counts as a virtual viewpoint image. This embodiment mainly describes the case where the virtual viewpoint is designated by a user operation, but the virtual viewpoint may instead be designated automatically, for example from the result of image analysis. Likewise, this embodiment mainly describes the case where the virtual viewpoint image is a moving image, but it may also be a still image.

The imaging devices 101 are installed at different positions around an imaging area such as the field in a stadium. They synchronously capture part or all of the imaging area from different directions. The imaging devices 101 need not surround the entire circumference of the imaging area and may be installed along only part of it. Nor is their number limited to the example shown in FIG. 1. For example, when the imaging area is a rugby field, about 100 imaging devices 101 may be installed around the field.

FIG. 4 shows an example in which 18 imaging devices 101a to 101r are installed around the field in a stadium 400, and a virtual viewpoint 401 corresponding to the virtual viewpoint image is set on the field. In this example, the images captured by the imaging devices 101 around the field are used to generate a virtual viewpoint image representing the scene seen from the virtual viewpoint 401 near the goal.
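An arrangement like the one in FIG. 4 can be produced programmatically when setting up a simulation. This is a hypothetical sketch. It assumes the cameras lie on an ellipse centred on the field origin and are spaced at equal steps of the ellipse parameter (not equal arc length). Each camera is panned to face the field centre, with pan measured from +X toward +Y. The semi-axes and mounting height used below are illustrative values, not figures from this document.

```python
import math
from typing import List, Tuple


def ring_cameras(n: int, semi_x: float, semi_y: float,
                 height: float) -> List[Tuple[float, float, float, float]]:
    """Hypothetical layout helper: place n cameras on an ellipse around the
    field origin, each panned to face the origin.

    Returns (x, y, z, pan_deg) for every camera; pan is measured from +X toward +Y.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    cameras = []
    for i in range(n):
        theta = 2.0 * math.pi * i / n          # equal steps in the ellipse parameter
        x = semi_x * math.cos(theta)
        y = semi_y * math.sin(theta)
        pan = math.degrees(math.atan2(-y, -x))  # direction from the camera back to the origin
        cameras.append((x, y, height, pan))
    return cameras


# 18 cameras, as in FIG. 4 (semi-axes and height are made-up example values).
for cam in ring_cameras(18, 60.0, 40.0, 10.0)[:3]:
    print(cam)
```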

The imaging devices 101 may include cameras with different characteristics, such as telephoto and wide-angle cameras. FIG. 5A shows an example camera arrangement in an imaging system with two camera groups: telephoto cameras 502 and wide-angle cameras 503. A field 501 is, for example, the ground surface of a stadium used for rugby or a similar sport. An object 500, a player who is the subject, stands on it. Twelve telephoto cameras 502 forming the telephoto camera group and twelve wide-angle cameras 503 forming the wide-angle camera group are arranged around the field 501.

FIG. 5B shows the imaging range of a telephoto camera 502, indicated by the region 505 enclosed by the dotted line. The telephoto camera 502 has a narrow angle of view and therefore a narrow imaging range, but the object 500 appears at high resolution in its captured images. FIG. 5C shows the imaging range of a wide-angle camera 503, indicated by the region 507 enclosed by the dotted line. The wide-angle camera 503 has a wide angle of view and therefore a wide imaging range, but the object 500 appears at low resolution in its captured images. FIGS. 5B and 5C draw the imaging ranges as ellipses for convenience; the actual imaging range of each camera may be rectangular.
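The trade-off between angle of view and resolution follows from the ideal pinhole model. A narrower field of view spreads the same pixels over a smaller width, so each pixel covers less of the object. The sketch below illustrates this. The sensor width (36 mm), image width (3840 px), distance (50 m), and focal lengths (200 mm telephoto, 24 mm wide-angle) are hypothetical example values, not parameters of the cameras 502 and 503.

```python
import math


def horizontal_fov_deg(focal_mm: float, sensor_width_mm: float) -> float:
    """Horizontal angle of view of an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))


def mm_per_pixel(distance_m: float, focal_mm: float, sensor_width_mm: float,
                 image_width_px: int) -> float:
    """Object-side width covered by one pixel at the given distance."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return distance_m * 1000.0 * pixel_pitch_mm / focal_mm


# Hypothetical telephoto vs wide-angle comparison at 50 m.
for label, focal in (("telephoto", 200.0), ("wide-angle", 24.0)):
    print(f"{label}: FOV {horizontal_fov_deg(focal, 36.0):.1f} deg, "
          f"{mm_per_pixel(50.0, focal, 36.0, 3840):.2f} mm/pixel at 50 m")
```

With these numbers, the telephoto camera covers roughly 10.3 degrees at about 2.3 mm per pixel. The wide-angle camera covers roughly 73.7 degrees at about 19.5 mm per pixel.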

This embodiment describes the case where each imaging device 101 is a camera with its own housing that captures from a single viewpoint, with the devices imaging the area from different positions. The configuration is not limited to this, however, and two or more imaging devices 101 may share a housing. For example, a single camera with multiple lens groups and sensors that can capture from several viewpoints may be installed as a plurality of imaging devices 101. In this embodiment, the imaging area is a field on which a rugby match is played. The imaging area is not limited to this either. It may be a venue for another sport such as soccer, or a stage for a concert or other performance.

Each imaging device 101 is, for example, a digital video camera equipped with a video signal interface such as a serial digital interface (SDI). The imaging device 101 attaches time information, such as a time code, to the video signal it captures and transmits the resulting video data to the information processing apparatus 102.
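The time code attached by each imaging device is what allows frames from different cameras to be matched to the same instant. This is a minimal sketch under stated assumptions. It assumes a non-drop-frame `HH:MM:SS:FF` time code at an integer frame rate (drop-frame rates such as 29.97 fps are not handled). The record layout `(camera_id, time_code, payload)` and the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def timecode_to_frames(tc: str, fps: int) -> int:
    """Convert a non-drop-frame 'HH:MM:SS:FF' time code to an absolute frame index."""
    parts = tc.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected HH:MM:SS:FF, got {tc!r}")
    hh, mm, ss, ff = (int(p) for p in parts)
    if not (0 <= mm < 60 and 0 <= ss < 60 and 0 <= ff < fps):
        raise ValueError(f"invalid time code {tc!r} for {fps} fps")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def group_by_time_code(frames: Iterable[Tuple[str, str, object]],
                       fps: int) -> Dict[int, Dict[str, object]]:
    """Bucket hypothetical (camera_id, time_code, payload) records by frame index,
    so frames captured at the same instant by different cameras end up together."""
    groups: Dict[int, Dict[str, object]] = {}
    for camera_id, tc, payload in frames:
        groups.setdefault(timecode_to_frames(tc, fps), {})[camera_id] = payload
    return groups
```

A multi-viewpoint image for one instant is then simply one bucket of the returned dictionary.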

The viewpoint input device 103 is an input device with a controller such as a joystick. It accepts user operations that designate the virtual viewpoint for the virtual viewpoint image to be generated, and outputs a signal corresponding to each accepted operation to the information processing apparatus 102.

The information processing apparatus 102 acquires image data of a plurality of images (multi-viewpoint images) based on imaging by the imaging devices 101. Each of these images may be a captured image as-is, or an image obtained by processing a captured image, for example by extracting a predetermined region. The information processing apparatus 102 then generates a virtual viewpoint image representing the scene seen from the designated virtual viewpoint. It does so from the multi-viewpoint images acquired from the imaging devices 101 and from viewpoint information derived from the input of the viewpoint input device 103. The generated virtual viewpoint image may also display virtual content that is not present in the captured images.

The viewpoint information used to generate the virtual viewpoint image indicates the position and orientation (line-of-sight direction) of the virtual viewpoint. Specifically, it is a parameter set containing parameters for the three-dimensional position of the virtual viewpoint (its X, Y, and Z coordinates) and parameters for its orientation in the pan, tilt, and roll directions. The content of the viewpoint information is not limited to this. For example, the parameter set may also include a parameter for the size of the field of view (angle of view) of the virtual viewpoint, or a parameter for the resolution of the virtual viewpoint image. The viewpoint information may also contain several parameter sets. For example, it may hold one parameter set per frame of a moving virtual viewpoint image, indicating the position and orientation of the virtual viewpoint at each of a series of consecutive points in time.
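The parameter set described above maps naturally onto a small record type. The sketch below uses hypothetical class and field names. It models one parameter set (position, pan/tilt/roll, and the optional angle of view and resolution) and a sequence holding one set per frame. Angle units (degrees) are an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ViewpointParams:
    """One hypothetical parameter set: position (X, Y, Z) plus orientation."""
    x: float
    y: float
    z: float
    pan: float                                     # degrees (assumed unit)
    tilt: float                                    # degrees
    roll: float                                    # degrees
    fov_deg: Optional[float] = None                # optional angle of view
    resolution: Optional[Tuple[int, int]] = None   # optional (width, height)


@dataclass
class ViewpointInfo:
    """Viewpoint information holding one parameter set per video frame."""
    frames: List[ViewpointParams] = field(default_factory=list)

    def at(self, frame_index: int) -> ViewpointParams:
        return self.frames[frame_index]

    def __len__(self) -> int:
        return len(self.frames)


info = ViewpointInfo([
    ViewpointParams(0.0, -30.0, 1.7, 90.0, 0.0, 0.0),
    ViewpointParams(0.5, -30.0, 1.7, 90.0, 0.0, 0.0, fov_deg=60.0),
])
```

Keeping the optional fields as `None` means a renderer can fall back to defaults only when a frame does not specify them.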

The information processing apparatus 102 generates a virtual viewpoint image by, for example, the following method. First, two kinds of images are obtained from the multi-viewpoint images. Foreground images contain the extracted foreground regions that correspond to predetermined objects such as people or a ball. Background images contain the remaining background regions. Next, foreground models representing the three-dimensional shapes of the predetermined objects are generated from the foreground images, together with texture data for coloring those models. Texture data for coloring a background model, which represents the three-dimensional shape of the background such as the stadium, is generated from the background images. The texture data is then mapped onto the foreground and background models, and the models are rendered according to the virtual viewpoint indicated by the viewpoint information to produce the virtual viewpoint image. The generation method is not limited to this, however. Various other methods can be used, such as generating the virtual viewpoint image by projective transformation of the captured images without a three-dimensional model.
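Rendering "according to the virtual viewpoint" ultimately means projecting model points into an image plane defined by the viewpoint's position, orientation, and angle of view. This is a minimal pinhole-projection sketch, not the apparatus's renderer. It assumes world Z points up, pan is measured from +X toward +Y, tilt is positive upward, and roll rotates the image axes about the viewing direction. These sign conventions are assumptions, and the function names are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def camera_basis(pan_deg: float, tilt_deg: float, roll_deg: float):
    """Return (forward, right, down) unit vectors under the assumed conventions."""
    p, t, r = (math.radians(v) for v in (pan_deg, tilt_deg, roll_deg))
    fwd = (math.cos(t) * math.cos(p), math.cos(t) * math.sin(p), math.sin(t))
    right = _cross(fwd, (0.0, 0.0, 1.0))
    n = math.sqrt(_dot(right, right))
    if n < 1e-9:
        raise ValueError("tilt of +/-90 degrees leaves pan undefined")
    right = tuple(c / n for c in right)
    down = _cross(fwd, right)
    # Roll rotates the image axes about the forward (viewing) axis.
    cr, sr = math.cos(r), math.sin(r)
    right_r = tuple(cr * a + sr * b for a, b in zip(right, down))
    down_r = tuple(cr * b - sr * a for a, b in zip(right, down))
    return fwd, right_r, down_r


def project(point: Vec3, cam_pos: Vec3, pan: float, tilt: float, roll: float,
            fov_deg: float, width: int, height: int) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a world point into the virtual view, in pixels."""
    fwd, right, down = camera_basis(pan, tilt, roll)
    v = tuple(a - b for a, b in zip(point, cam_pos))
    depth = _dot(v, fwd)
    if depth <= 0.0:
        return None  # behind the virtual camera
    f_px = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    u = width / 2.0 + f_px * _dot(v, right) / depth
    w = height / 2.0 + f_px * _dot(v, down) / depth
    return u, w
```

A point straight ahead of the viewpoint lands at the image centre. Points to the right or above land to the right of or above the centre, which is a quick sanity check for the chosen conventions.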

The virtual viewpoint image generated by the information processing apparatus 102 is output to the display device 104, which shows it on its display screen. The display device 104 is, for example, a liquid crystal display or an LED display.

The configuration of the image processing system 100 is not limited to the example shown in FIG. 1. For example, any two or all of the information processing apparatus 102, the viewpoint input device 103, and the display device 104 may be integrated. The foreground and background images described above may be extracted from the captured images by the information processing apparatus 102 or by another device, such as the imaging devices 101. When the imaging devices 101 perform the extraction, each of them may extract both images. Alternatively, some imaging devices 101 may extract foreground images while others extract background images. The imaging devices 101 may also include devices that extract neither.

[Hardware configuration]
FIG. 2 is a diagram showing a hardware configuration example of the information processing apparatus 102. The viewpoint input device 103 has the same configuration as the information processing apparatus 102 described below. The information processing apparatus 102 includes a CPU 211, a ROM 212, a RAM 213, an auxiliary storage device 214, a display unit 215, an operation unit 216, a communication I/F 217, and a bus 218.

The CPU 211 implements each function of the information processing apparatus 102 by controlling the entire apparatus with computer programs and data stored in the ROM 212 or the RAM 213. The information processing apparatus 102 may also include a GPU (Graphics Processing Unit) or one or more dedicated hardware units other than the CPU. At least part of the CPU's processing may then be performed by the GPU or the dedicated hardware. Examples of dedicated hardware include ASICs (application-specific integrated circuits), FPGAs (field-programmable gate arrays), and DSPs (digital signal processors).

The ROM 212 stores programs and other data that do not need to be changed. The RAM 213 temporarily stores programs and data supplied from the auxiliary storage device 214, data supplied from outside via the communication I/F 217, and the like. The auxiliary storage device 214 is, for example, a hard disk drive, and stores various data such as image data and sound volume data.

The display unit 215 consists of, for example, a liquid crystal display or LEDs. It displays a GUI (graphical user interface) through which the user operates the information processing apparatus 102. The operation unit 216 consists of, for example, a keyboard, a mouse, a joystick, or a touch panel, and inputs instructions to the CPU 211 in response to user operations. The CPU 211 acts as a display control unit that controls the display unit 215 and as an operation control unit that controls the operation unit 216. The communication I/F 217 is used to communicate with devices external to the information processing apparatus 102. For a wired connection to an external device, a communication cable is connected to the communication I/F 217. If the apparatus can communicate wirelessly with external devices, the communication I/F 217 includes an antenna. The bus 218 connects the units of the information processing apparatus 102 and carries information between them.

In this embodiment, the display unit 215 and the operation unit 216 are assumed to be inside the information processing apparatus 102. However, at least one of them may instead be a separate device outside the information processing apparatus 102.

[Functional configuration of the information processing apparatus]
FIG. 3 is a block diagram showing a functional configuration example of the information processing apparatus 102. The information processing apparatus 102 includes an image acquisition unit 300, a separation unit 301, a storage unit 302, an image generation unit 303, an input unit 304, a calculation unit 305, a control unit 306, and a viewpoint acquisition unit 307. These components are implemented by the CPU 211 performing arithmetic processing and executing programs, using control programs stored in the ROM 212 or the auxiliary storage device 214. At least some of these components may instead be implemented by one or more dedicated hardware units other than the CPU 211.

The image acquisition unit 300 acquires image data of the images captured by the imaging devices 101. It may acquire the captured images in real time while the imaging devices 101 are capturing, or it may acquire recorded images after the imaging devices 101 have finished capturing an event.

The separation unit 301 extracts images of specific objects, such as players and referees, from each captured image acquired from the imaging devices 101. Any extraction method may be used. One example is background subtraction. It compares the pixel values of a captured image with those of the corresponding background image and extracts from the captured image the specific objects that the background image does not contain. The background image is an image of the imaging area with no specific object present. When the imaging area is a sports field, for example, the background image can be obtained by imaging the field before the game starts. The separation unit 301 stores the background image and the extracted specific-object images in the storage unit 302.
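Background subtraction as described above reduces to a per-pixel comparison. The sketch below is a simplified illustration, not the separation unit's actual method. It assumes grayscale images stored as nested lists and a single fixed threshold. Production systems typically work in colour and add noise handling and morphological cleanup. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]   # grayscale image as rows of pixel values (assumed format)
Mask = List[List[bool]]


def foreground_mask(captured: Image, background: Image, threshold: int) -> Mask:
    """True where the captured pixel differs from the background by more than threshold."""
    if len(captured) != len(background) or any(
            len(rc) != len(rb) for rc, rb in zip(captured, background)):
        raise ValueError("captured and background images must have the same size")
    return [[abs(c - b) > threshold for c, b in zip(rc, rb)]
            for rc, rb in zip(captured, background)]


def extract_objects(captured: Image, mask: Mask, fill: int = 0) -> Image:
    """Keep foreground pixels and replace background pixels with `fill`."""
    return [[c if m else fill for c, m in zip(rc, rm)]
            for rc, rm in zip(captured, mask)]


background = [[10, 10, 10], [10, 10, 10]]
captured = [[10, 200, 12], [10, 10, 90]]
mask = foreground_mask(captured, background, threshold=20)
objects = extract_objects(captured, mask)
```

The threshold absorbs small differences such as sensor noise (the 12 versus 10 pixel above), while large differences are treated as a specific object.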

In addition to the background image and the specific-object images, the storage unit 302 stores imaging information about the imaging devices 101, input from the control unit 306. The imaging information includes the position of each imaging device 101 in three-dimensional space and its orientation (imaging direction). The three-dimensional position of an imaging device 101 is given by coordinates along the directions corresponding to the height, width, and depth of the space. The imaging information is not limited to this. It may also include, for example, the imaging resolution of each imaging device 101 or the focal length corresponding to its angle of view. FIG. 7 shows an example of imaging information. In that example, the imaging information for each imaging device 101 consists of X, Y, and Z coordinates giving its three-dimensional position, Pan, Tilt, and Roll values giving its imaging direction, and a Zoom value (focal length) giving its angle of view.
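One row of the FIG. 7 table can be held in a small record, from which the camera's optical-axis direction follows. That direction is what the viewpoint control described earlier needs. This is a sketch with hypothetical names. It assumes pan is measured from +X toward +Y, tilt is positive upward with Z up, and Zoom is a focal length in millimetres. Roll does not change the direction of the optical axis, only the image orientation around it.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CameraInfo:
    """Imaging information for one imaging device, mirroring the FIG. 7 columns."""
    name: str
    x: float
    y: float
    z: float
    pan: float    # degrees, assumed measured from +X toward +Y
    tilt: float   # degrees, assumed positive upward
    roll: float   # degrees, about the optical axis
    zoom: float   # focal length (unit assumed to be mm)

    @classmethod
    def from_row(cls, name: str, row: Dict[str, str]) -> "CameraInfo":
        """Build from a table row keyed by the FIG. 7 column names."""
        keys = ("X", "Y", "Z", "Pan", "Tilt", "Roll", "Zoom")
        return cls(name, *(float(row[k]) for k in keys))

    def position(self) -> Tuple[float, float, float]:
        return (self.x, self.y, self.z)

    def optical_axis(self) -> Tuple[float, float, float]:
        """Unit direction of the optical axis; roll does not change it."""
        p, t = math.radians(self.pan), math.radians(self.tilt)
        return (math.cos(t) * math.cos(p), math.cos(t) * math.sin(p), math.sin(t))


cam = CameraInfo.from_row("101a", {"X": "0", "Y": "-50", "Z": "12", "Pan": "90",
                                   "Tilt": "-10", "Roll": "0", "Zoom": "85"})
```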

The image generation unit 303 generates three-dimensional shape data representing the estimated shape of each specific object, using the specific-object images and the imaging information from the storage unit 302. The shape data can be generated by, for example, the visual hull (volume intersection) method, but other methods may also be used. The image generation unit 303 then uses the shape data, the specific-object images, and the background image to generate a virtual viewpoint image for the virtual viewpoint indicated by the viewpoint information from the control unit 306.
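The visual hull method keeps only the parts of space that fall inside every camera's silhouette. The toy sketch below shows the idea and also shows how too few views produce phantom shape, which is the kind of estimation error this document links to noise. It is a simplification. The space is a 2D grid of cells, and each "camera" is a hypothetical orthographic projector mapping a cell to a 1D silhouette coordinate. Real systems project voxels with calibrated camera matrices.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]
Projector = Callable[[Cell], int]  # hypothetical orthographic "camera"


def silhouettes_from(objects: Iterable[Cell],
                     cameras: Dict[str, Projector]) -> Dict[str, Set[int]]:
    """Silhouette each camera would observe for the given occupied cells."""
    objs = list(objects)
    return {name: {proj(c) for c in objs} for name, proj in cameras.items()}


def visual_hull(cells: Iterable[Cell], cameras: Dict[str, Projector],
                silhouettes: Dict[str, Set[int]]) -> Set[Cell]:
    """Keep a cell only if it projects inside the silhouette of every camera."""
    return {c for c in cells
            if all(proj(c) in silhouettes[name] for name, proj in cameras.items())}


grid = [(x, y) for x in range(5) for y in range(5)]
players = [(1, 1), (3, 3)]
two_views = {"side": lambda c: c[1], "front": lambda c: c[0]}
three_views = dict(two_views, diagonal=lambda c: c[0] - c[1])

for views in (two_views, three_views):
    hull = visual_hull(grid, views, silhouettes_from(players, views))
    print(len(views), "views ->", sorted(hull))
```

With only the side and front views, the phantom cells (1, 3) and (3, 1) survive alongside the two real players. Adding the diagonal view carves them away.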

The input unit 304 accepts data from which the imaging information of the imaging devices 101 is obtained. The calculation unit 305 derives the imaging information from the data input to the input unit 304. This data is, for example, a set of images captured by the imaging devices 101 after markers have been placed in the imaging area. The calculation unit 305 obtains the imaging information for each imaging device 101 by performing camera calibration on these images. Alternatively, the user may enter the imaging information into the information processing apparatus 102, or the input unit 304 may import a data file containing it from outside.

The viewpoint acquisition unit 307 receives, from the viewpoint input device 103, a signal corresponding to a user operation. The signal indicates, for example, the direction and amount of change in the position or orientation of the virtual viewpoint. From the received input, the viewpoint acquisition unit 307 generates viewpoint information indicating the position and orientation of the virtual viewpoint. These are expressed, for example, in a coordinate system whose origin is the center of the stadium containing the imaging region.

The control unit 306 outputs the imaging information of the imaging devices 101, acquired from the calculation unit 305, to the storage unit 302. It outputs the viewpoint information acquired from the viewpoint acquisition unit 307 to the image generation unit 303. The control unit 306 also acquires the virtual viewpoint image generated by the image generation unit 303 and outputs it to the display device 104 for display. Before output, the control unit 306 may superimpose on the virtual viewpoint image either:

- an image based on the imaging information, or
- an image of virtual content that does not exist in the imaging region, for example a virtual advertisement.

The control unit 306 also accepts input from user operations on the information processing device 102 or the display device 104. In response, it changes the virtual viewpoint so that the image quality of the virtual viewpoint image improves. This control is described with reference to FIG. 6, which shows an example of a screen displayed by the display device 104. The display screen 600 includes a display image 601, which contains the virtual viewpoint image generated by the information processing device 102, and an operable button 602. When the user presses the button 602, the control unit 306 displays an arrow 603 on the display screen 600. The arrow shows the user which way to move the virtual viewpoint to improve image quality.

This embodiment addresses the following cause of image-quality degradation. Consider generating a virtual viewpoint image from images of a rugby field captured by a plurality of imaging devices 101 during a match. In a scene such as a scrum, many players 604 are packed closely together. Some positions are therefore occluded by players and cannot be captured by at least one of the imaging devices 101. When three-dimensional shape data is generated from the captured images, the shape at those positions is estimated incorrectly. If the virtual viewpoint can be set freely, such uncaptured positions may fall within its field of view. Rendering then colors the inaccurate parts of the shape. As a result, jelly-like noise 605 appears at the uncaptured positions in the virtual viewpoint image and its image quality degrades.

To suppress the noise 605 and improve image quality, the field of view from the virtual viewpoint should exclude positions that are occluded by players and cannot be captured by the imaging devices 101. Failing that, it should contain as few of them as possible. When the button 602 is pressed, the control unit 306 therefore displays the arrow 603. The arrow shows which way to move the virtual viewpoint to bring it closer to a position on the optical axis of one of the imaging devices 101. The user viewing the display screen 600 follows the arrow 603 and operates the viewpoint input device 103 to move the virtual viewpoint onto that optical axis. When the virtual viewpoint lies on the optical axis of an imaging device 101, everything visible from the virtual viewpoint is also captured by that device without occlusion by objects. The virtual viewpoint image therefore contains less noise.

In this embodiment, the optical axis of an imaging device 101 is the straight line passing through the center of its lens and the center of its image sensor. Equivalently, it is the straight line passing through the position of the imaging device 101 and the point that appears at the center of the image it captures.
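The optical axis can be computed from a camera's position and orientation. The sketch below assumes the extrinsics follow the common world-to-camera convention x_cam = R X + t, with the camera looking along its own +z axis (as in OpenCV). The patent does not fix a convention, and other conventions change the signs. Under this assumption:

- the camera centre is C = -Rᵀt;
- the axis direction is Rᵀ(0, 0, 1), which is the third row of R.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def optical_axis(R: Sequence[Sequence[float]], t: Sequence[float]) -> Tuple[Vec3, Vec3]:
    """Return (camera centre, unit viewing direction) describing the optical axis.

    Assumes x_cam = R @ X + t with the camera looking along its +z axis.
    """
    # Camera centre C = -R^T t: the world point that maps to the camera origin.
    centre = tuple(-sum(R[i][j] * t[i] for i in range(3)) for j in range(3))
    # Direction R^T (0, 0, 1), i.e. the third row of R (unit length because R is orthonormal).
    direction = tuple(R[2][j] for j in range(3))
    return centre, direction
```

Every point on the axis is then centre + s * direction for some real s.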

If the imaging direction of an imaging device 101 differs greatly from the line-of-sight direction of the virtual viewpoint, the field of view from the virtual viewpoint may include regions outside that device's imaging range. The control unit 306 may therefore bring the line-of-sight direction closer to the imaging direction of the imaging device 101. Specifically, it may display on the display screen 600 an arrow showing which way to turn the line of sight to achieve this.

The way the user is told where to move the virtual viewpoint, or where to turn the line of sight, is not limited to the example of FIG. 6. An indicator other than an arrow may be used, and it may be displayed outside the virtual viewpoint image. The control unit 306 is also not limited to display control when changing the virtual viewpoint to improve image quality. Other options include:

- outputting from a speaker a voice that prompts the user to change the virtual viewpoint;
- automatically moving the virtual viewpoint to a position on the optical axis of an imaging device 101 when the button 602 is pressed, either instantaneously or continuously;
- bringing the virtual viewpoint gradually closer to that position only while the user holds down the button 602;
- automatically changing the line-of-sight direction so that it approaches the imaging direction of the imaging device 101.
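The "gradual approach while the button is held" option can be sketched as a per-frame update. The sketch assumes:

- a fixed maximum step per frame, using the hypothetical parameter `max_step`;
- a target point already chosen on the optical axis.

The patent does not specify a step rule. Passing `math.inf` as the step reproduces the instantaneous move.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def step_towards(current: Vec3, target: Vec3, max_step: float) -> Vec3:
    """Move `current` at most `max_step` along the straight line towards `target`."""
    delta = [b - a for a, b in zip(current, target)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= max_step:
        return tuple(target)  # close enough: snap onto the target position
    scale = max_step / dist
    return tuple(a + d * scale for a, d in zip(current, delta))


def update_viewpoint(current: Vec3, target: Vec3, button_held: bool,
                     max_step: float = 0.5) -> Vec3:
    """Per-frame update (hypothetical): approach the target only while the button is held."""
    if not button_held:
        return current
    return step_towards(current, target, max_step)
```

A constant step gives a steady glide. An exponential approach, such as moving a fixed fraction of the remaining distance each frame, is an equally plausible alternative.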

The configuration of the information processing device 102 is not limited to the example shown in FIG. 3. For example, the virtual viewpoint image may be generated by one or more other devices that have the same functions as the image acquisition unit 300, the separation unit 301, the viewpoint acquisition unit 307, and the image generation unit 303. In that case, the information processing device 102 may acquire the virtual viewpoint image data generated by those devices, apply predetermined processing, and output the result to the display device 104.

[Operation flow]
FIG. 8 is a flowchart illustrating the operation of the information processing device 102. The CPU 211 of the information processing device 102 performs the processing in FIG. 8 by loading program code stored in the ROM 212 into the RAM 213 and executing it. Part or all of this processing may instead be implemented by one or more dedicated hardware components. Processing starts when the imaging devices 101 capture images and the information processing device 102 receives an instruction to display a virtual viewpoint image, although the start timing is not limited to this. The flow in FIG. 8 is repeated, for example, once for each frame of the virtual viewpoint video.

In S800, the image acquisition unit 300 acquires the plurality of images captured by the imaging devices 101. In S801, the viewpoint acquisition unit 307 acquires viewpoint information indicating the position and orientation of the virtual viewpoint. In S802, the input unit 304 and the calculation unit 305 acquire imaging information indicating the position and orientation of each imaging device. In S803, the image generation unit 303 uses the captured images and the imaging information to generate a virtual viewpoint image for the position and orientation given by the viewpoint information.

In S804, the control unit 306 determines whether the user has instructed suppression of image-quality degradation in the virtual viewpoint image. For example, when jelly-like noise is conspicuous in the display image 601 of FIG. 6, the user presses the button 602 to reduce it. If the button 602 has not been pressed, the process proceeds to S807, where the control unit 306 outputs the virtual viewpoint image generated in S803 to the display device 104. If the button 602 has been pressed, the process proceeds to S805.

In S805, the control unit 306 identifies, among the optical axes of the imaging devices 101, the one closest to the current virtual viewpoint position. It uses the viewpoint information and the imaging information for this. For example, as shown in FIG. 9, it identifies the optical axis 1000 of the imaging device 101i as the axis closest to the virtual viewpoint 401. The optical axis of each imaging device 101 is calculated from that device's position and orientation, as given by the imaging information.
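S805 needs the distance from the virtual viewpoint to each optical axis. The sketch below rests on these assumptions:

- each axis is given as a (camera centre, direction) pair in the same world coordinates as the viewpoint;
- each axis is treated as an infinite straight line, as in the definition above.

The foot of the perpendicular is returned as well, because it is one natural candidate for the target position on the axis. That choice is our assumption, not stated in the text.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def closest_point_on_axis(p: Vec3, centre: Vec3, direction: Vec3) -> Vec3:
    """Foot of the perpendicular from p onto the line centre + s * direction."""
    n = math.sqrt(_dot(direction, direction))
    d = tuple(x / n for x in direction)
    s = _dot(tuple(a - b for a, b in zip(p, centre)), d)
    return tuple(c + s * x for c, x in zip(centre, d))


def nearest_axis(p: Vec3,
                 axes: Sequence[Tuple[Vec3, Vec3]]) -> Optional[Tuple[int, Vec3, float]]:
    """Return (index, closest point on that axis, distance) for the axis nearest to p."""
    best = None
    for i, (centre, direction) in enumerate(axes):
        q = closest_point_on_axis(p, centre, direction)
        dist = math.dist(p, q)
        if best is None or dist < best[2]:
            best = (i, q, dist)
    return best
```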

In S806, the control unit 306 superimposes a guide on the virtual viewpoint image generated in S803. The guide shows in which direction the optical axis identified in S805 lies relative to the current virtual viewpoint position given by the viewpoint information. In S807, the control unit 306 outputs the virtual viewpoint image with the guide to the display device 104 for display. In the example of FIG. 6, the superimposed guide is the arrow 603. It indicates that, relative to the orientation of the virtual viewpoint, the optical axis of an imaging device 101 lies to the right.
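The guide direction in S806 is expressed relative to the orientation of the virtual viewpoint. One way to derive it is to project the offset to the target point onto the viewpoint's right and up vectors and report the dominant component. The sketch assumes:

- a right-handed world frame;
- orthonormal `forward` and `up` vectors;
- that only four arrow directions are needed.

All of these are simplifications that are not stated in the text.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def guide_direction(view_pos: Vec3, forward: Vec3, up: Vec3, target: Vec3,
                    tol: float = 1e-6) -> str:
    """Return 'left', 'right', 'up', 'down' or 'none' for an on-screen arrow
    pointing from the virtual viewpoint towards `target`."""
    right = _cross(forward, up)  # right-handed frame: forward x up points screen-right
    offset = tuple(t - p for t, p in zip(target, view_pos))
    x, y = _dot(offset, right), _dot(offset, up)
    if max(abs(x), abs(y)) <= tol:
        return "none"  # already at the target
    if abs(x) >= abs(y):
        return "right" if x > 0 else "left"
    return "up" if y > 0 else "down"
```

Feeding it the foot of the perpendicular on the nearest optical axis would yield an arrow like 603 in FIG. 6.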

When the virtual viewpoint image with the guide is displayed, the user moves the virtual viewpoint as the guide indicates. The virtual viewpoint image for the moved viewpoint is rendered from the three-dimensional shape data and an image captured by the imaging device 101 whose optical axis was identified in S805. In other words, the view from the virtual viewpoint is rendered from a captured image showing nearly the same view. Errors in three-dimensional shape estimation are therefore less visible in the virtual viewpoint image, jelly-like noise is suppressed, and image quality improves.

[Modifications]
The description above identifies the optical axis closest to the virtual viewpoint and brings the virtual viewpoint closer to it. Control of the virtual viewpoint to improve image quality is not limited to this. For example, the control unit 306 may identify the optical axes within a predetermined distance of the current virtual viewpoint position. From these, it may select the axis of an imaging device 101 that can capture the players 604 at high resolution, for example a telephoto camera. The control unit 306 may then display a guide so that the virtual viewpoint approaches a position on the selected optical axis.
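The "nearby axis with high resolution" variant can be sketched as a filter followed by a ranking. The text gives no resolution measure. This sketch assumes a hypothetical per-camera attribute `focal_px` (focal length in pixels) and ranks cameras by pixels per metre at the subject, approximated as `focal_px` divided by the camera-to-subject distance. `max_dist` plays the role of the predetermined distance, and the subject must not coincide with a camera centre.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    name: str
    centre: Vec3
    direction: Vec3   # unit optical-axis direction
    focal_px: float   # focal length in pixels (hypothetical attribute)


def distance_to_axis(p: Vec3, cam: Camera) -> float:
    """Perpendicular distance from p to the camera's optical axis (unit direction assumed)."""
    v = [a - b for a, b in zip(p, cam.centre)]
    s = sum(x * y for x, y in zip(v, cam.direction))
    return math.sqrt(max(sum(x * x for x in v) - s * s, 0.0))


def pick_high_resolution_axis(view_pos: Vec3, subject: Vec3,
                              cams: Sequence[Camera], max_dist: float) -> Optional[Camera]:
    """Among axes within max_dist of the viewpoint, pick the camera that gives
    the most pixels per metre at the subject; None if no axis is close enough."""
    candidates = [c for c in cams if distance_to_axis(view_pos, c) <= max_dist]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.focal_px / math.dist(c.centre, subject))
```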

As another example, the control unit 306 may identify the imaging device 101 whose imaging direction is closest to the line-of-sight direction of the virtual viewpoint image. It may then display a guide so that the virtual viewpoint approaches a position on that device's optical axis. Specifically, for each imaging device 101, it may compute the inner product of the line-of-sight vector of the virtual viewpoint and the imaging-direction vector of that device. The imaging device 101 with the largest inner product is the one identified.
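The inner-product selection can be sketched as follows. The vectors are normalised first so that the inner product equals the cosine of the angle between the two directions. Without normalisation, cameras whose direction vectors happen to be longer would be favoured. The text does not say whether its vectors are unit length, so the normalisation is our assumption.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Vec3) -> Vec3:
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def most_aligned_camera(view_dir: Vec3, cam_dirs: Sequence[Vec3]) -> int:
    """Index of the camera whose imaging direction has the largest inner product
    (cosine similarity, after normalisation) with the virtual line of sight."""
    v = _unit(view_dir)
    return max(range(len(cam_dirs)),
               key=lambda i: sum(a * b for a, b in zip(v, _unit(cam_dirs[i]))))
```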

As a further example, the control unit 306 may identify the imaging device 101 closest to the virtual viewpoint and display a guide so that the virtual viewpoint approaches a position on that device's optical axis. Alternatively, the guide may direct the virtual viewpoint toward the position of that closest imaging device 101 itself.
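Both variants in the paragraph above can be sketched in one function. It finds the camera whose centre is nearest to the virtual viewpoint and returns one of two targets:

- the foot of the perpendicular on that camera's optical axis (our assumed choice of "position on the optical axis");
- with the hypothetical flag `snap_to_camera`, the camera position itself.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def target_near_closest_camera(view_pos: Vec3,
                               cams: Sequence[Tuple[Vec3, Vec3]],
                               snap_to_camera: bool = False) -> Vec3:
    """cams holds (centre, unit optical-axis direction) pairs."""
    centre, d = min(cams, key=lambda c: math.dist(view_pos, c[0]))
    if snap_to_camera:
        return centre  # alternative: move onto the camera position itself
    # Foot of the perpendicular from the viewpoint onto this camera's axis.
    s = sum((p - c) * x for p, c, x in zip(view_pos, centre, d))
    return tuple(c + s * x for c, x in zip(centre, d))
```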

In any of the above examples, the control unit 306 may automatically move the virtual viewpoint when the button 602 is pressed, either instead of or in addition to displaying the guide. Possible destinations are:

- the position on the optical axis closest to the virtual viewpoint;
- a position on the optical axis of the imaging device 101 whose orientation is closest to that of the virtual viewpoint;
- a position on the optical axis of a high-resolution imaging device 101;
- the position of the imaging device 101 closest to the virtual viewpoint.

The guide display has the advantage that the user can choose whether to prioritize image quality or freedom in placing the virtual viewpoint. Moving the virtual viewpoint automatically has two advantages: operation is simpler for the user, and a low-quality virtual viewpoint image is shown for a shorter time.

In the description above, the information processing device 102 treats a press of the button 602 as meaning that jelly-like noise has occurred in the virtual viewpoint image. It then brings the virtual viewpoint closer to a position on the optical axis of an imaging device 101. However, the information processing device 102 may instead always display a guide such as the arrow 603 in FIG. 6, without determining whether noise has occurred (that is, without checking whether the button 602 has been pressed). In this case, the display screen 600 need not include the button 602.

The condition that triggers control to bring the virtual viewpoint closer to a position on the optical axis of an imaging device 101 is also not limited to the above. For example, instead of checking for a button press in S804 of FIG. 8, the information processing device 102 may analyze the virtual viewpoint image generated in S803 to determine whether it contains noise. It may then perform the control when the virtual viewpoint image contains noise.

As another example, instead of the button-press check in S804 of FIG. 8, the information processing device 102 may determine whether the state of the objects in the imaging region makes noise likely in the virtual viewpoint image. If so, it may perform control to bring the virtual viewpoint closer to a position on the optical axis of an imaging device 101.

Noise is judged likely when, for example, the three-dimensional shape data generated from the captured images shows an object larger than a threshold in the region within the field of view from the virtual viewpoint. The threshold is, for example, a value determined from the size of a person. An object larger than a typical person suggests that several closely packed players, as in a scrum, have merged into a single object. It is then likely that some positions cannot be captured by at least one of the imaging devices 101, as described above, so noise is likely in the virtual viewpoint image.

Another situation judged likely to produce noise is when the three-dimensional shape data shows, within the field of view from the virtual viewpoint, several objects whose mutual distances are smaller than a threshold. As with the scrum, some positions are then likely to be occluded by objects and impossible for the imaging devices 101 to capture, so noise is likely in the virtual viewpoint image.
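The two noise-prone conditions above can be sketched as simple geometric tests. The sketch makes these simplifications, none of which is prescribed by the text:

- each object in view is summarised by an axis-aligned bounding box (min corner, max corner);
- object size is the largest box extent;
- inter-object distance is measured between box centres.

The default thresholds are hypothetical values in metres.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner) of one object in view


def has_oversized_object(boxes: Sequence[Box], size_threshold: float) -> bool:
    """True if any object's largest bounding-box extent exceeds the threshold,
    e.g. several players merged into one object in a scrum."""
    return any(max(hi - lo for lo, hi in zip(mn, mx)) > size_threshold
               for mn, mx in boxes)


def has_crowded_objects(boxes: Sequence[Box], dist_threshold: float) -> bool:
    """True if the centres of any two objects are closer than the threshold."""
    centres = [tuple((lo + hi) / 2 for lo, hi in zip(mn, mx)) for mn, mx in boxes]
    return any(math.dist(a, b) < dist_threshold for a, b in combinations(centres, 2))


def noise_prone(boxes: Sequence[Box], size_threshold: float = 2.5,
                dist_threshold: float = 1.0) -> bool:
    # Hypothetical defaults: somewhat larger than one person, and about one metre apart.
    return (has_oversized_object(boxes, size_threshold)
            or has_crowded_objects(boxes, dist_threshold))
```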

As a further example, instead of the button-press check in S804 of FIG. 8, the information processing device 102 may determine whether noise in the virtual viewpoint image is easy to notice. If so, it may perform control to bring the virtual viewpoint closer to a position on the optical axis of an imaging device 101. Noise is judged easy to notice when, for example, the position or orientation of the virtual viewpoint changes more slowly than a threshold.
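The "viewpoint changing slowly" condition can be sketched from two consecutive frames. The text says "position or orientation" without defining how the speed is measured. This sketch therefore assumes:

- linear speed is the distance moved per second;
- angular speed is the angle between successive view directions per second;
- both speeds must fall below their thresholds, i.e. the viewpoint is nearly still.

The thresholds are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def angle_between(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))  # clamp for rounding


def noise_easily_visible(prev_pos: Vec3, prev_dir: Vec3, cur_pos: Vec3, cur_dir: Vec3,
                         dt: float, max_speed: float = 0.5,
                         max_ang_speed: float = math.radians(10)) -> bool:
    """True if the virtual viewpoint moves and turns more slowly than the thresholds."""
    linear = math.dist(prev_pos, cur_pos) / dt           # metres per second
    angular = angle_between(prev_dir, cur_dir) / dt      # radians per second
    return linear < max_speed and angular < max_ang_speed
```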

As in the examples above, the information processing device 102 may bring the virtual viewpoint closer to a position on the optical axis of an imaging device 101 only when noise in the virtual viewpoint image has occurred, is likely to occur, or is likely to be noticed. This suppresses image-quality degradation while avoiding two drawbacks: restricting the user's control of the virtual viewpoint more than necessary, and letting the arrow 603 in the display image 601 obstruct viewing of the virtual viewpoint image.

The present invention can also be realized by supplying a program that implements one or more functions of the above embodiment to a system or device via a network or a storage medium. One or more processors in a computer of that system or device then read and execute the program. The invention can also be realized by a circuit that implements one or more of the functions, for example an ASIC. The program may also be provided on a computer-readable recording medium.

100 Image processing system
101 Imaging device
102 Information processing device
103 Viewpoint input device
104 Display device

Claims

An information processing device comprising: identification means for identifying the position of a virtual viewpoint corresponding to a virtual viewpoint image, the image being generated from a result of estimating the three-dimensional shape of an object in an imaging region captured from a plurality of directions by a plurality of imaging devices;
determination means for determining whether a predetermined condition is satisfied, the condition relating to degradation of image quality caused by noise that appears in the virtual viewpoint image at a position within the imaging region where the three-dimensional shape is estimated incorrectly because that position cannot be captured by at least one of the plurality of imaging devices; and
control means for performing, when the determination means determines that the predetermined condition is satisfied, control that brings the position of the virtual viewpoint corresponding to the virtual viewpoint image from the position identified by the identification means closer to a position on the optical axis of an imaging device included in the plurality of imaging devices.

The information processing device according to claim 1, wherein the identification means identifies a line-of-sight direction from the virtual viewpoint corresponding to the virtual viewpoint image, and
when the determination means determines that the predetermined condition is satisfied, the control means performs control that brings the line-of-sight direction from the virtual viewpoint corresponding to the virtual viewpoint image from the direction identified by the identification means closer to the imaging direction of the imaging device included in the plurality of imaging devices.

The information processing device according to claim 1 or 2, further comprising generation means for generating a virtual viewpoint image for the virtual viewpoint brought closer to the position on the optical axis of the imaging device under the control of the control means. The generation uses a rendering process based on three-dimensional shape data representing the result of estimating the three-dimensional shape of the object in the imaging region and on an image captured by that imaging device.

The information processing device according to any one of claims 1 to 3, wherein, when the determination means determines that the predetermined condition is satisfied, the control means performs control that brings the position of the virtual viewpoint corresponding to the virtual viewpoint image closer to a position on the optical axis that, among the optical axes of the plurality of imaging devices, is closest to the position identified by the identification means.

The information processing device according to any one of claims 1 to 3, wherein the identification means identifies a line-of-sight direction from the virtual viewpoint corresponding to the virtual viewpoint image, and
when the determination means determines that the predetermined condition is satisfied, the control means performs control that brings the position of the virtual viewpoint corresponding to the virtual viewpoint image closer to a position on the optical axis of the imaging device, among the plurality of imaging devices, whose imaging direction is closest to the line-of-sight direction identified by the identification means.

The information processing device according to any one of claims 1 to 5, wherein the control by the control means includes control that moves the position of the virtual viewpoint corresponding to the virtual viewpoint image toward the position on the optical axis of the imaging device.

The information processing device according to any one of claims 1 to 6, wherein the control by the control means includes control that prompts the user to perform an operation that brings the position of the virtual viewpoint corresponding to the virtual viewpoint image closer to the position on the optical axis of the imaging device.

The information processing device according to any one of claims 1 to 7, wherein the control by the control means includes control that displays an image indicating the direction of the position on the optical axis of the imaging device relative to the position identified by the identification means.

The information processing apparatus according to any one of claims 1 to 8, wherein the predetermined condition includes that a plurality of objects whose mutual distances are shorter than a threshold exist in the photographing area.
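A sketch of this condition, assuming each object is summarized by a single representative position such as its centroid (an assumption; the claim does not define how distance between objects is measured). Any pair closer than the threshold satisfies the condition.

```python
import math
from itertools import combinations

Vec3 = tuple[float, float, float]

def has_close_objects(centers: list[Vec3], threshold: float) -> bool:
    """True if any two object centers are closer than threshold."""
    return any(math.dist(a, b) < threshold for a, b in combinations(centers, 2))
```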

The information processing apparatus according to any one of claims 1 to 9, wherein the predetermined condition includes that an object larger than a threshold exists in the photographing area.
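How "larger than a threshold" is measured is not specified. Here is a sketch, assuming size is the largest axis-aligned bounding-box extent of each object's estimated 3D points (a hypothetical choice; volume or voxel count would also fit the wording).

```python
Vec3 = tuple[float, float, float]

def has_large_object(objects: list[list[Vec3]], threshold: float) -> bool:
    """True if any object's largest bounding-box extent exceeds threshold."""
    for pts in objects:
        extent = max(max(p[i] for p in pts) - min(p[i] for p in pts) for i in range(3))
        if extent > threshold:
            return True
    return False
```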

The information processing apparatus according to any one of claims 1 to 10, wherein the predetermined condition includes that the speed of change of the virtual viewpoint corresponding to the virtual viewpoint image is lower than a threshold.
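The speed of change can be estimated from two consecutive viewpoint samples. This sketch assumes that "change" covers both translation and rotation of the view direction, each with its own hypothetical threshold (`max_speed` in units per second, `max_ang_speed` in radians per second); the claim itself names only a single threshold.

```python
import math

Vec3 = tuple[float, float, float]

def viewpoint_is_slow(prev_pos: Vec3, cur_pos: Vec3, prev_dir: Vec3, cur_dir: Vec3,
                      dt: float, max_speed: float, max_ang_speed: float) -> bool:
    """True if both translational and angular speed of the viewpoint are below thresholds."""
    speed = math.dist(prev_pos, cur_pos) / dt
    cos_a = sum(a * b for a, b in zip(prev_dir, cur_dir)) / (
        math.hypot(*prev_dir) * math.hypot(*cur_dir))
    ang = math.acos(max(-1.0, min(1.0, cos_a)))  # clamp guards against rounding
    return speed < max_speed and ang / dt < max_ang_speed
```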

The information processing apparatus according to any one of claims 1 to 11, wherein the predetermined condition includes a user operation instructing suppression of the deterioration of image quality.

An information processing method comprising: a specifying step of specifying the position of a virtual viewpoint corresponding to a virtual viewpoint image generated based on the result of estimating the three-dimensional shape of an object in a photographing area by photographing the photographing area from different directions with a plurality of photographing devices;
a determination step of determining whether a predetermined condition is satisfied, the predetermined condition relating to deterioration of image quality due to noise occurring in the virtual viewpoint image at a position within the photographing area where an error in estimating the three-dimensional shape arises because the position cannot be photographed by at least one of the plurality of photographing devices; and
a control step of, when it is determined in the determination step that the predetermined condition is satisfied, performing control to bring the position of the virtual viewpoint corresponding to the virtual viewpoint image closer, from the position specified in the specifying step, to a position on the optical axis of a photographing device included in the plurality of photographing devices.

The information processing method according to claim 13, wherein, in the specifying step, a line-of-sight direction from the virtual viewpoint corresponding to the virtual viewpoint image is specified, and
in the control step, when it is determined in the determination step that the predetermined condition is satisfied, control is performed to bring the line-of-sight direction from the virtual viewpoint corresponding to the virtual viewpoint image closer, from the line-of-sight direction specified in the specifying step, to the photographing direction of a photographing device included in the plurality of photographing devices.
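Turning the line-of-sight direction toward a camera's photographing direction can be sketched with spherical linear interpolation (slerp) between unit vectors. Slerp is a swapped-in technique chosen here for illustration, not one named in the claim; `t` is a hypothetical blend factor where 0 keeps the current gaze and 1 matches the camera direction.

```python
import math

Vec3 = tuple[float, float, float]

def _unit(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def slerp_direction(gaze: Vec3, cam_dir: Vec3, t: float) -> Vec3:
    """Rotate the gaze direction a fraction t of the way toward cam_dir."""
    a, b = _unit(gaze), _unit(cam_dir)
    theta = math.acos(max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b)))))
    if theta < 1e-6:
        return b  # already aligned
    if math.pi - theta < 1e-6:
        raise ValueError("directions are opposite; rotation axis is undefined")
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return tuple(wa * x + wb * y for x, y in zip(a, b))
```

Applied per frame with a small `t`, the view direction swings gradually onto the camera's direction and stays unit length throughout.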

The information processing method according to claim 13, further comprising a generation step of generating, by rendering processing based on three-dimensional shape data representing the result of estimating the three-dimensional shape of the object in the photographing area and on images captured by the photographing devices, a virtual viewpoint image corresponding to the virtual viewpoint brought closer to the position on the optical axis of the photographing device under the control in the control step.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 12.