JP2019057070A

JP2019057070A - Image processing device, image processing method, and program

Info

Publication number: JP2019057070A
Application number: JP2017180439A
Authority: JP
Inventors: Takashi Hanamoto (花本 貴志)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2019-04-11

Abstract

To enable the position of a subject in real space to be identified when designating the position of a viewpoint for a virtual viewpoint image. SOLUTION: An image processing device of the present invention comprises: acquisition means for acquiring, on the basis of an image in which the imaging target area of a plurality of cameras is captured, position information of an object appearing in the image; display control means for causing information indicating the object to be displayed, in a computer graphic representing the imaging target area, at a position corresponding to the position information acquired by the acquisition means; and acceptance means for accepting designation of the position of a viewpoint for a virtual viewpoint image while the information indicating the object is being displayed by the display control means. SELECTED DRAWING: Figure 5

Description

The present invention relates to an image processing technique for setting a virtual camera path when generating a virtual viewpoint image.

There is a technique that uses images from a plurality of real cameras to reproduce an image as captured by a camera placed virtually in three-dimensional space, thereby generating, as a virtual viewpoint image, an image from an arbitrary viewpoint and not only images captured from the installation positions of the real cameras. Patent Document 1 discloses a technique for generating a virtual viewpoint image from images captured by a plurality of cameras.

JP 2008-217243 A

When designating the position of a viewpoint for a virtual viewpoint image, a user who does not know the position of the subject has difficulty judging which viewpoint position will yield the desired virtual viewpoint image. Therefore, to make it easy to designate the viewpoint for a desired virtual viewpoint image, a technique that lets the user identify the position of the subject in real space is desired.

An object of the present invention is to provide a technique that makes the position of a subject in real space identifiable when designating the position of a viewpoint for a virtual viewpoint image.

An image processing apparatus according to one aspect of the present invention accepts designation of the position of a viewpoint for a virtual viewpoint image generated using a plurality of images captured by a plurality of cameras. The apparatus comprises: acquisition means for acquiring, based on an image capturing the imaging target area of the plurality of cameras, position information of an object appearing in that image; display control means for displaying information indicating the object at a position, in a computer graphic representing the imaging target area, that corresponds to the position information acquired by the acquisition means; and reception means for receiving designation of the position of a viewpoint for the virtual viewpoint image while the display control means is displaying the information indicating the object.

An image processing apparatus according to another aspect of the present invention comprises: conversion means for converting an image capturing a field into a bird's-eye view image of the field as seen from above; first acquisition means for acquiring two-dimensional position information of a subject from the bird's-eye view image; generation means for generating, using the two-dimensional position information, a CG scene in which the region of the field including the subject is reproduced with CG (computer graphic) models; second acquisition means for acquiring three-dimensional position information of a specific subject using the CG scene; and correction means for correcting the position information of the specific subject using the three-dimensional position information of the specific subject.

According to the present invention, the position of a subject in real space can be made identifiable when designating the position of a viewpoint for a virtual viewpoint image.

A block diagram showing the configuration of the image processing apparatus and the camera device.
A schematic diagram showing the arrangement of the camera device.
A diagram showing the GUI for setting a virtual camera path.
A diagram showing an example of the functional blocks of the image processing apparatus.
A flowchart showing the virtual viewpoint image generation process.
A flowchart showing the two-dimensional position calculation process for subjects.
A diagram showing an example of a bird's-eye view and silhouette extraction.
A flowchart showing the CG scene generation process.
A diagram showing an example of CG scene generation.
A flowchart showing the three-dimensional position calculation process for a specific subject.
A diagram showing an example of acquiring the three-dimensional position of a specific subject.
A flowchart showing the three-dimensional position acquisition process for an elliptical subject.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

The embodiments below describe a mode in which the position of a subject is identified based on images captured by a real camera, and the identified position is shown on a CG (computer graphic) that imitates real space, thereby making it easy to designate the position of the viewpoint for a virtual viewpoint image.

A common method of acquiring the position information of a subject in real space is triangulation using a plurality of cameras. However, such a method has the problem of long processing time, and shortening the processing time requires dedicated hardware. In addition, the height of a subject being captured, such as a person or a ball, changes from moment to moment, and conventional techniques suffer from low accuracy of the three-dimensional position. The embodiments below make it possible to acquire the subject position with high accuracy and in a short time.

In the embodiments below, a virtual camera that captures the virtual viewpoint image is assumed (hereinafter referred to as the virtual camera), and designating the position of the viewpoint and the direction of the line of sight for the virtual viewpoint image is also referred to as designating the position and orientation of the virtual camera. Setting the various parameters of the virtual camera, such as its position, gaze point, and angle of view, along the time axis is also referred to as setting a virtual camera path.
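As an illustration of a virtual camera path as parameters set along a time axis, the following is a minimal sketch. The `CameraKey` type, its field names, and the use of linear interpolation between keyframes are assumptions made for this example; the source does not specify how a path is stored or interpolated.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class CameraKey:
    """One keyframe of a virtual camera path (hypothetical structure)."""
    time: float       # position on the time axis, in seconds
    position: Vec3    # virtual camera position
    gaze_point: Vec3  # point the virtual camera looks at
    fov_deg: float    # angle of view, in degrees


def _lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def sample_path(keys: list[CameraKey], time: float) -> CameraKey:
    """Linearly interpolate the camera parameters at `time`.

    Times outside the keyframe range are clamped to the first or last key.
    """
    keys = sorted(keys, key=lambda k: k.time)
    if time <= keys[0].time:
        return keys[0]
    if time >= keys[-1].time:
        return keys[-1]
    for k0, k1 in zip(keys, keys[1:]):
        if k0.time <= time <= k1.time:
            t = (time - k0.time) / (k1.time - k0.time)
            return CameraKey(
                time,
                tuple(_lerp(a, b, t) for a, b in zip(k0.position, k1.position)),
                tuple(_lerp(a, b, t) for a, b in zip(k0.gaze_point, k1.gaze_point)),
                _lerp(k0.fov_deg, k1.fov_deg, t),
            )
    raise ValueError("no keyframe interval contains the requested time")
```

Sampling such a path once per output frame would give the position, gaze point, and angle of view needed to render each frame of the virtual viewpoint image.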

In the embodiments below, unless otherwise noted, the term "image" covers both still images and moving images. That is, the virtual viewpoint image may be a still image or a moving image.

<<Embodiment 1>>
This embodiment describes the acquisition of position information of a subject, which is a specific object (also called subject position information), using a soccer match as the example. That is, the imaging target for generating the virtual viewpoint image is assumed to be a field on which soccer is played. In soccer, human subjects such as players are in contact with the field in most cases. This embodiment therefore acquires two-dimensional position information on the field plane as the subject position information, which speeds up processing. Also, in soccer, few subjects move three-dimensionally. Specifically, the subject that moves three-dimensionally is the ball, which spends a long time in the air. This embodiment therefore restricts the targets for which three-dimensional position information is acquired to the ball. This makes processing more time-efficient and improves the accuracy of subject position information in real space.

<System configuration>
FIG. 1 is a block diagram showing a system configuration including the image processing apparatus 100 and the camera device 110 in this embodiment. The image processing apparatus 100 includes a CPU 101, a main memory 102, a storage unit 103, an input unit 104, a display unit 105, an external I/F unit 106, and a bus 107. The CPU (Central Processing Unit) 101 performs arithmetic processing and executes various programs. The main memory 102 provides the CPU 101 with the programs, data, and work areas needed for processing. The storage unit 103 is a device that stores the image processing program and the various data needed for GUI (Graphical User Interface) display; for example, nonvolatile storage such as a hard disk or a silicon disk is used. The input unit 104 is a device such as a keyboard, mouse, electronic pen, or touch panel, and receives operation input from the user. The display unit 105 displays screens such as the GUI. The external I/F (interface) unit 106 is connected to the camera device 110 and sends and receives image data and control signal data via the LAN 120. The bus 107 connects the above units and transfers data between them.

The camera device 110 includes a camera group 111 made up of a plurality of cameras. The camera group 111 is connected to the image processing apparatus 100 via a LAN (Local Area Network) 120. Based on control signals from the image processing apparatus 100, the camera group 111 starts and stops shooting, changes camera settings (shutter speed, focal length, aperture value, and so on), and transfers the captured data.

The system configuration may include various components other than those above, but they are not the focus of this embodiment and their description is omitted.

FIG. 2 shows the arrangement of the camera group 111. Here, a case in which the camera group 111 is installed in a stadium used for sports is described. A subject 202 is present on the field 201 where the match takes place. FIG. 2 shows only one subject 202 as an example. The camera group 111 is arranged so as to surround the field 201. For each camera 203 in the camera group 111, the focal length and angle of view (θ°) and the shooting direction are set so that a partial region of the field fits in the frame. A tracking camera 204 is also provided separately, with its focal length and angle of view (φ°) set so that the entire field can be captured. Here, θ < φ. In this embodiment, the position information of the subject 202 is acquired mainly from images captured by the tracking camera 204, which covers the entire field. That is, the position information of the subject 202 is acquired from images captured by a single camera. A plurality of tracking cameras 204 may also be used.
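The relationship θ < φ follows from the usual dependence of angle of view on focal length. The sketch below uses the standard rectilinear-lens formula, angle = 2·atan(d / 2f); the sensor width and focal lengths are illustrative assumptions and do not come from the source.

```python
import math


def angle_of_view_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Horizontal angle of view of an ideal rectilinear lens, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


# Assumed values: a 36 mm wide sensor, a longer lens for a camera 203 that
# covers part of the field, and a shorter lens for the tracking camera 204.
theta = angle_of_view_deg(36.0, 50.0)  # partial-field camera
phi = angle_of_view_deg(36.0, 18.0)    # full-field tracking camera
```

With these assumed values the shorter focal length gives the wider angle, which is consistent with the tracking camera covering the entire field.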

<GUI screen>
FIG. 3 shows the GUI screen used to set the virtual camera path when generating a virtual viewpoint image. As described above, generating a virtual viewpoint image requires setting a virtual camera path. The virtual camera path is the route of the virtual camera, in which the position, gaze point, and angle of view of the virtual camera are set along the time axis. The virtual viewpoint image is an image that reproduces what would have been captured by this virtual camera. To set the desired virtual camera path, the user needs to be able to confirm the positions of subjects on the field. In this embodiment, the image processing apparatus 100 performs the process of acquiring the position information of subjects. The image processing apparatus 100 acquires the position information of subjects based on images captured by a single tracking camera 204. Using the acquired position information, it then generates a CG scene that reproduces real space with CG models. The user views the CG image reproduced from the CG scene and designates a virtual camera path. The image processing apparatus 100 then sets the designated virtual camera path and, based on it, generates a virtual viewpoint image from the CG models. With this process, the image processing apparatus 100 can acquire the subject position information needed to set the virtual camera path from the images of a single tracking camera 204, so the position information can be acquired at low cost. Fewer cameras also means less data, so the processing time is shortened as well. As detailed later, the image processing apparatus 100 acquires three-dimensional position information for a specific subject (for example, the ball) and substantially two-dimensional position information for the other subjects. With this process, highly accurate position information can be acquired in a short time.

The GUI screen 300 in FIG. 3(a) includes a CG window 310, operation buttons 320, and a camera path setting dialog 330. The CG window 310 visualizes various three-dimensional shape CG models (hereinafter, CG models), including a field CG model 311, player CG models 312, and a ball CG model 313. Each CG model is made up of polygon faces and is visualized with an existing CG rendering method such as ray tracing or scanline rendering. In this embodiment, the CG window 310 displays a CG scene that reproduces, for a desired time period, the field that the camera group 111 is capturing. The user places and moves the virtual camera while checking the positions of the CG models that represent subjects such as players in the CG window 310, and thereby designates the virtual camera path of the virtual viewpoint image. This embodiment also describes the process of acquiring, quickly and accurately, the position information of the subjects (CG models) visualized in the CG window 310. In particular, this embodiment acquires the subject position information for the player CG models 312 and the ball CG model 313 quickly and accurately. Details are given later.

The operation buttons 320 are used to load the required input data and to start player tracking. Player tracking means following the position of each player, that is, each subject, over time. Specifically, tracking data that records the subject position information of each player over time is collected. The operation buttons 320 include a CG model load button 321, a tracking data apply button 322, a virtual viewpoint image generation button 323, and a player tracking button 324.

The camera path setting dialog 330 is used to set the virtual camera path in detail. It includes a virtual camera path setting button 331 and a ball height setting button 332.

The GUI screen 350 in FIG. 3(b) is the screen shown after the player tracking button 324 among the operation buttons 320 in FIG. 3(a) is pressed, at which point the CG window 310 is replaced by the tracking window 360. The tracking window 360 displays either the image captured by the tracking camera 204 or a bird's-eye view image of the field seen from above. Details are given later. Based on the bird's-eye view image displayed in the tracking window 360, the user makes the settings needed to track the subject position information of the players and the ball. On the GUI screen 350 in FIG. 3(b), along with the transition to the tracking window 360, the camera path setting dialog 330 in FIG. 3(a) is replaced by the tracking dialog 370. The details of FIG. 3 are explained where relevant in connection with the processing described later.

<Functional block diagram>
FIG. 4 shows an example of a functional block diagram of the image processing apparatus 100 of this embodiment. The image processing apparatus includes an image data storage unit 401, an image time acquisition unit 402, an image data extraction unit 403, an image conversion unit 404, a two-dimensional position information acquisition unit 405, a CG model storage unit 406, and a CG scene generation unit 407. It also includes a three-dimensional position information acquisition unit 408, a correction unit 409, a display control unit 410, a virtual camera path setting unit 411, and a virtual viewpoint image generation unit 412.

The storage unit 103 shown in FIG. 1 functions as the image data storage unit 401 and the CG model storage unit 406. An external device connected over a network (not shown) may instead be used virtually as the image data storage unit 401 and the CG model storage unit 406. For the other units shown in FIG. 4, the CPU 101 functions as each unit by performing processing according to a software program stored in the main memory 102 or the storage unit 103. Thus, in this embodiment, the configuration of the image processing apparatus 100 is described in terms of processing by software. However, the processing may instead be implemented by dedicated hardware that performs the same processing, or by a combination of such hardware and software. Also, although the units in FIG. 4 are shown as included in the image processing apparatus 100, the processing of these units may be distributed across a plurality of devices.

The image data storage unit 401 stores the image data captured by the tracking camera 204. The image time acquisition unit 402 acquires the image time (including a start time and an end time) designated by the user. The image data extraction unit 403 extracts, from the image data storage unit 401, the image data corresponding to the image time acquired by the image time acquisition unit 402. The image conversion unit 404 converts the image represented by the image data extracted by the image data extraction unit 403 (an image captured by the tracking camera 204) into a bird's-eye view image.
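This passage does not say how the image conversion unit 404 performs the conversion. One common way to map a camera view of a planar field to a top-down view is a planar homography fitted from four known field points, and the sketch below assumes that approach. The function names, the pixel coordinates of the field corners, and the 105 m × 68 m field size are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(src, dst):
    """Fit the 3x3 homography mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs) + [1.0]  # fix the bottom-right entry to 1
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h, x, y):
    """Map an image point (x, y) to bird's-eye coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Assumed pixel positions of the field corners in the tracking camera image,
# and the matching corners of a 105 m x 68 m field seen from above.
IMAGE_CORNERS = [(200, 150), (1080, 150), (1280, 650), (0, 650)]
FIELD_CORNERS = [(0, 0), (105, 0), (105, 68), (0, 68)]
H = fit_homography(IMAGE_CORNERS, FIELD_CORNERS)
```

Applying `warp_point` to every pixel, or the inverse mapping to every output pixel, would produce the bird's-eye view image. Note that such a mapping is only exact for points on the field plane.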

The two-dimensional position information acquisition unit (first acquisition unit) 405 acquires the subject position information of each subject using the bird's-eye view image produced by the image conversion unit 404. Here, the height (the coordinate in the direction vertical to the field) is fixed at a predetermined value (for example, "0"), so that substantially two-dimensional position information is acquired. In other words, each subject is assumed to be located on the field plane, and two-dimensional subject position information is acquired for each subject. Although it is described here as two-dimensional subject position information for ease of understanding, what is actually acquired is three-dimensional subject position information whose height coordinate is fixed at "0" as described above. The two-dimensional position information acquisition unit 405 acquires tracking data that records the subject position information over time.
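The idea of storing a three-dimensional position whose height is fixed to the field plane can be sketched as follows. The `TrackSample` record, its field names, and the `detections` input format are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackSample:
    """One subject position at one frame (hypothetical record layout)."""
    frame: int
    subject_id: str
    x: float
    y: float
    z: float = 0.0  # height; fixed to the field plane at this stage


def to_track_samples(frame, detections, ground_height=0.0):
    """Turn per-frame 2D detections {subject_id: (x, y)} into 3D samples.

    Every subject is assumed to stand on the field plane, so the height is
    set to `ground_height` regardless of the subject's true height.
    """
    return [
        TrackSample(frame, subject_id, x, y, ground_height)
        for subject_id, (x, y) in detections.items()
    ]


samples = to_track_samples(12, {"player_7": (30.5, 20.0), "ball": (52.5, 34.0)})
```

Collecting such samples for every frame in the designated time range gives tracking data in the sense used above. The ball's height remains a placeholder until a later step replaces it.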

The CG model storage unit 406 stores various CG models, for example the field CG model 311, the player CG model 312, and the ball CG model 313. The CG scene generation unit 407 generates a CG scene from the CG models stored in the CG model storage unit 406, based on the tracking data (subject position information over time) acquired by the two-dimensional position information acquisition unit 405.

A CG scene is the material from which a CG image is produced. A CG image is reproduced by placing three-dimensional shape models of the subjects and background in a virtual space, then setting up a virtual camera and rendering the image captured from that virtual camera. Generating a CG scene corresponds to placing the three-dimensional shape models of the subjects and background in the virtual space. A CG scene includes the concept of time (animation).
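A CG scene with a time dimension can be pictured as a static background model plus per-frame placements of subject models. The sketch below assumes a simple dictionary layout; the `build_cg_scene` function, the scene keys, and the model names are hypothetical and not taken from the source.

```python
from collections import defaultdict


def build_cg_scene(tracking, model_for):
    """Place CG models in a virtual space, frame by frame.

    tracking:  iterable of (frame, subject_id, (x, y, z)) tuples
    model_for: function mapping a subject_id to the name of its CG model
    """
    frames = defaultdict(list)
    for frame, subject_id, position in tracking:
        frames[frame].append({
            "model": model_for(subject_id),
            "subject": subject_id,
            "position": position,
        })
    return {
        "static": ["field"],                    # background model, placed once
        "frames": dict(sorted(frames.items())),  # animated placements by frame
    }


tracking = [
    (1, "ball", (52.5, 34.0, 0.0)),
    (0, "player_7", (30.0, 20.0, 0.0)),
    (0, "ball", (50.0, 34.0, 0.0)),
]
scene = build_cg_scene(tracking, lambda s: "ball" if s == "ball" else "player")
```

A renderer would then draw the static models together with the placements of whichever frame is being displayed, as seen from the virtual camera.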

The three-dimensional position information acquisition unit (second acquisition unit) 408 acquires the three-dimensional position information of a specific subject using the CG scene generated by the CG scene generation unit 407. An example of a specific subject is the ball, which may travel through the air above the field. When the ball is in the air, the position information acquired by the two-dimensional position information acquisition unit 405 is likely to be incorrect. The three-dimensional position information acquisition unit 408 therefore acquires the coordinate values of the position information (three-dimensional position) of the specific subject. Details are given later.

The correction unit 409 corrects the CG scene generated by the CG scene generation unit 407. For example, it changes the position information of the specific subject in the CG scene so that the three-dimensional position information acquisition unit 408 can acquire the three-dimensional position information. Details are given later. The correction unit 409 also corrects the tracking data of the specific subject (the ball) using the coordinate values of the subject position information (three-dimensional position information) acquired by the three-dimensional position information acquisition unit 408. It then uses the corrected tracking data to correct the CG scene generated by the CG scene generation unit 407. The display control unit 410 controls the display unit 105 so that it displays the CG scene generated by the CG scene generation unit 407, the CG scene corrected by the correction unit 409, or a CG image based on such a CG scene.
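The tracking data correction for the specific subject can be sketched as a per-frame replacement of its position. The nested dictionary layout and the `correct_tracking` function are assumptions for illustration; how the three-dimensional positions themselves are obtained is outside this sketch.

```python
def correct_tracking(tracking, subject_id, positions_3d):
    """Return a copy of the tracking data with one subject's positions replaced.

    tracking:     {frame: {subject_id: (x, y, z)}}
    positions_3d: {frame: (x, y, z)} corrected positions for `subject_id`
    Frames or subjects missing from `tracking` are left untouched.
    """
    corrected = {frame: dict(subjects) for frame, subjects in tracking.items()}
    for frame, position in positions_3d.items():
        if frame in corrected and subject_id in corrected[frame]:
            corrected[frame][subject_id] = position
    return corrected


tracking = {
    0: {"ball": (50.0, 34.0, 0.0), "player_7": (30.0, 20.0, 0.0)},
    1: {"ball": (58.0, 34.0, 0.0)},  # ball is airborne; plane position is off
}
fixed = correct_tracking(tracking, "ball", {1: (55.0, 34.0, 3.2)})
```

Returning a copy rather than editing in place keeps the original plane-based tracking data available, which can help when comparing the scene before and after correction.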

The virtual camera path setting unit 411 functions as a reception unit that receives designations from the user, and sets the virtual camera path designated by the user. The user checks the position of each subject in the displayed CG scene and designates the desired virtual camera path. The virtual viewpoint image generation unit 412 generates a CG virtual viewpoint image from the CG scene, following the virtual camera path set by the virtual camera path setting unit 411. The generated virtual viewpoint image is displayed on the display unit 105, so that the user can confirm whether the designated virtual camera path is the desired one.

If the virtual camera path designated by the user is the desired one, a virtual viewpoint image based on actual captured images is generated according to the designated virtual camera path, using the images captured by the cameras 203 of the camera group 111.

<Flowchart>
FIG. 5 is a flowchart showing the processing by which the units described above generate a virtual viewpoint image using a CG scene.

In step S501, the CPU 101 changes the camera settings of the camera device 110 so that exposure during shooting is appropriate, and transmits a shooting start signal. On receiving the shooting signal, the camera device 110 starts shooting and transfers the captured image data to the image processing apparatus 100 via the LAN 120. The camera device 110 also transfers the image data of the tracking camera 204. The transferred image data is stored in the image data storage unit 401.

In step S502, the image time acquisition unit 402 acquires the image time set by the user. The image time is the time at which the images targeted for virtual viewpoint image generation were captured, and the user sets it through a UI screen (not shown) or the like. In step S503, the image data extraction unit 403 extracts the image data corresponding to the image time from the image data storage unit 401. In step S504, the CPU 101 launches the GUI screen 300 of FIG. 3(a) and displays it on the display unit 105. Immediately after launch, nothing is displayed in the CG window 310.

In step S505, the two-dimensional position information acquisition unit 405 acquires the two-dimensional position information of the subjects. For example, the user presses the player tracking button 324 on the GUI screen 300 displayed in step S504. In response, the CPU 101 switches the screen shown on the display unit 105 to the GUI screen 350 of FIG. 3(b). That is, the CPU 101 displays the GUI screen 350, which includes the tracking window 360 and the tracking dialog 370, on the display unit 105. The two-dimensional position information acquisition unit 405 acquires the image data extracted by the image data extraction unit 403, namely all frames (image data) included in the image time acquired in step S502. For those frames, it acquires the subject position information (two-dimensional position) of each subject 202 along the field plane, based on the images of the tracking camera 204. Because this step acquires two-dimensional rather than three-dimensional position information, it can be executed quickly. The details of this processing are described later. The acquired subject position information (two-dimensional position) is stored in the storage unit 103.

In step S506, the CG scene generation unit 407 generates a three-dimensional CG scene using the acquired subject position information (two-dimensional position) and the existing CG models stored in the CG model storage unit 406. The details of this processing are also described later.

In step S507, the three-dimensional position information acquisition unit 408 extracts a specific subject (the ball) from among the subjects and calculates its three-dimensional position. During play, the ball may be in the air above the field rather than on the ground. The three-dimensional position information acquisition unit 408 therefore acquires three-dimensional subject position information for this specific subject, which improves the accuracy of the acquired ball position. Because the three-dimensional processing is limited to the ball, it can still be executed quickly. The details of this processing are also described later.

In step S508, the virtual camera path setting unit 411 sets the virtual camera path. For example, the user designates a virtual camera path in the CG scene displayed in the CG window 310, which is the display area for CG in FIG. 3(a). The unit may, for instance, accept the user's drag or touch operations on the CG window 310 and treat the trajectory of those operations as the designated virtual camera path. When the virtual camera path setting button 331 is pressed, the virtual camera path setting unit 411 sets the designated virtual camera path. The virtual camera path is set within the three-dimensional CG scene. One possible method is to set virtual camera positions at several sample points along the time axis and interpolate between them with a spline function or the like.
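
The interpolation mentioned above can be sketched as follows. This is a minimal illustration, assuming a uniform Catmull-Rom spline as the "spline function"; the source does not name a specific spline, and the function names `catmull_rom` and `interpolate_camera_path` are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def catmull_rom(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3, t: float) -> Vec3:
    """Point on the uniform Catmull-Rom segment between p1 and p2 (0 <= t <= 1)."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (3 * b - a - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def interpolate_camera_path(keys: List[Vec3], samples_per_segment: int) -> List[Vec3]:
    """Densify sampled virtual camera positions into a smooth path through every key."""
    if len(keys) < 2:
        return list(keys)
    # Duplicate the end points so the first and last segments are defined.
    padded = [keys[0]] + list(keys) + [keys[-1]]
    path: List[Vec3] = []
    for i in range(1, len(padded) - 2):
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            path.append(catmull_rom(padded[i - 1], padded[i],
                                    padded[i + 1], padded[i + 2], t))
    path.append(keys[-1])
    return path
```

A Catmull-Rom spline passes through every sampled position, which suits a path the user has drawn point by point; a camera orientation would need a separate interpolation that is not shown here.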

In step S509, the virtual viewpoint image generation unit 412 generates a virtual viewpoint image according to the set virtual camera path. Specifically, when the user presses the virtual viewpoint image generation button 323 of FIG. 3(a), the virtual viewpoint image generation unit 412 generates a virtual viewpoint image based on the set virtual camera path. The virtual viewpoint image can be generated by rendering the three-dimensional shapes of the subjects as seen from the virtual camera, using known computer graphics techniques.

With the processing described above, when designating a virtual camera path for generating a virtual viewpoint image of the actual footage, the user can look at the virtual viewpoint image of the CG scene and easily judge whether the desired virtual viewpoint image can be reproduced.

<Subject position information (two-dimensional position) acquisition processing>
Next, the details of the subject position information (two-dimensional position) acquisition processing in step S505 are described with reference to the flowchart of FIG. 6. FIG. 6 is a flowchart showing the processing of step S505.

In step S601, the image conversion unit 404 acquires the images of the tracking camera 204 extracted by the image data extraction unit 403. The acquisition is triggered, for example, when the user presses the player tracking button 324 of FIG. 3(a). The acquired image is displayed in the tracking window 360 of FIG. 3(b). For example, the tracking window 360 shows an image captured by the tracking camera 204 at the designated image time, such as the tracking camera image 701 of FIG. 7(a).

In step S602, the image conversion unit 404 acquires the position information of the four corners of the field. These four corners correspond to the four corners of the bird's-eye view image when the tracking camera image is converted into a bird's-eye view image of the field seen from above. The position information is acquired, for example, when the user looks at the tracking camera image 701 displayed in the tracking window 360 and designates the four corners of the field. A corner is a position such as the corner 705 in FIG. 7(a), and the user can designate it with a mouse or a touch operation. Alternatively, the corner positions may be detected with a line extraction method such as the Hough transform.

In step S603, the image conversion unit 404 applies a projective transformation to the tracking camera image and generates a bird's-eye view image of the field seen from above. Because the size of a soccer field is prescribed, the coordinate values of the four corners can be converted directly into physical position coordinates (world coordinates). The focal length of the tracking camera 204 is also known. With a known focal length, the physical position (world coordinates) of the tracking camera 204 can be calculated from the distortion angles of the quadrilateral that the field forms in the image.
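
As an illustration of the projective transformation in step S603, the sketch below computes the 3x3 homography that maps the four designated corners to the four corners of a bird's-eye view. It is a minimal sketch, not the method of the embodiment: the helper names (`solve_linear`, `homography`, `warp_point`) are hypothetical, and fixing the last matrix element to 1 is a common simplification that assumes the image origin does not map to infinity.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H with dst ~ H * src for four point pairs (last element fixed to 1)."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 8 unknowns.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h: List[List[float]], p: Point) -> Point:
    """Apply H to a pixel and divide by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, with corner pixels `[(100, 200), (540, 200), (620, 400), (20, 400)]` and an assumed field of 105 by 68 units, `homography` returns a matrix that sends each corner pixel to the matching field corner. The pixel values and the field size here are illustrative only.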

In the following, the image of the tracking camera 204 is denoted Image(X, Y). A subject in the tracking camera image is located at two-dimensional coordinates (screen coordinates) on the image, but physically it is located at three-dimensional coordinates (world coordinates). The tracking image Image(X, Y) (two-dimensional coordinates) can therefore be regarded as an image that follows Equation 1, where X and Y denote a pixel position (two-dimensional coordinates) in the tracking image.
Image(X,Y) = Screen( Proj * View * Pos(x,y,z) ); Equation 1
Here,
Pos(x, y, z): three-dimensional position of the subject (three-dimensional coordinates)
View: viewpoint transformation matrix (determines the camera position and direction)
Proj: two-dimensional projection matrix (determines the angle of view of the camera)
Screen(): screen coordinate conversion function

The bird's-eye view image is denoted ImageV(X, Y). A subject in the bird's-eye view image is likewise located at two-dimensional coordinates (screen coordinates) on the image, but physically it is located at three-dimensional coordinates (world coordinates). The bird's-eye view image ImageV(X, Y) (two-dimensional coordinates) can therefore be regarded as an image that follows Equation 2, where X and Y denote a pixel position (two-dimensional coordinates) in the bird's-eye view image.
ImageV(X,Y) = Screen( Proj * ViewV * Pos(x,y,z) ); Equation 2
Here,
Pos(x, y, z): three-dimensional position of the subject
ViewV: viewpoint transformation matrix (determines the camera position and direction)
Proj: two-dimensional projection matrix (determines the angle of view of the camera)
Screen(): screen coordinate conversion function

Making Pos(x, y, z) correspond between Equation 1 and Equation 2 yields a projective transformation from Image(X, Y) to ImageV(X, Y). That is, a pixel Image(X, Y) at some position in the tracking camera image (screen coordinates) can be converted to a first point Pos(x, y, z) in world coordinates by inverting Equation 1. Converting that first point Pos(x, y, z) with Equation 2 then maps the tracking image Image(X, Y) to the bird's-eye view image ImageV(X, Y) (screen coordinates). In this embodiment, the z coordinate (height) of the world position Pos(x, y, z) is fixed to the field plane (fixed at 0). In other words, the z coordinate (height) of a subject (for example, a player) is treated as a fixed value at a predetermined height (the field plane), which is what makes the inversion of Equation 1 unique for a single camera. Processing in this way makes it possible to generate a bird's-eye view image from the images of a single tracking camera.
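
The effect of fixing z = 0 can be written out as a short derivation. This is a sketch under stated assumptions: Screen() is taken to be a perspective division followed by an affine viewport mapping, so that Screen, Proj, and View combine into a single 3x4 matrix P with columns p_1 to p_4 (and P_V for the bird's-eye view). The symbols P, P_V, H, H_V, and the scale factors λ are introduced here for illustration and do not appear in the source.

```latex
\begin{align}
\lambda \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}
  &= P \begin{pmatrix} x \\ y \\ 0 \\ 1 \end{pmatrix}
   = \underbrace{\begin{pmatrix} p_1 & p_2 & p_4 \end{pmatrix}}_{H}
     \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \\
\lambda_V \begin{pmatrix} X_V \\ Y_V \\ 1 \end{pmatrix}
  &= H_V \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \\
\begin{pmatrix} X_V \\ Y_V \\ 1 \end{pmatrix}
  &\sim H_V H^{-1} \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}.
\end{align}
```

With z = 0 the third column of P drops out, so each view is related to the field plane by a 3x3 matrix, and the two views are related by the single 3x3 projective transformation H_V H^{-1}. H is invertible as long as the camera center does not lie in the field plane.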

As noted earlier, the z coordinate of a subject can be obtained by methods such as triangulation with multiple cameras, but the larger amount of data increases the processing time. In this embodiment, subjects are treated as lying on the field plane, and the image from a single camera is converted into a bird's-eye view image. Generating the bird's-eye view image makes it possible to acquire the subject position information (two-dimensional position) of each subject on the field plane.

The image conversion unit 404 performs the image conversion processing described above and converts the tracking camera image 701 of FIG. 7(a) into the bird's-eye view image 702 of FIG. 7(b). The converted bird's-eye view image 702 is displayed in the tracking window 360 of FIG. 3(b).

In step S604, the two-dimensional position information acquisition unit 405 extracts, pixel by pixel, the intermediate value over all frames of the bird's-eye view image 702 generated in step S603, and generates a background image in which each pixel takes the extracted value. This background image is used to generate the subject silhouette images described below. The background image corresponds to an image in which no subject appears.
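
The background generation of step S604 can be sketched as follows, assuming that the "intermediate value" is the per-pixel median over time and that frames are grayscale images stored as nested lists. Both are assumptions made for this illustration, and `background_from_frames` is a hypothetical name.

```python
from statistics import median
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel values


def background_from_frames(frames: List[Frame]) -> Frame:
    """Per-pixel median over all frames; moving subjects are suppressed."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]
```

A median works here because a moving subject covers any given pixel in only a minority of the frames, so the middle value of the sorted samples is a field pixel.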

In step S605, the two-dimensional position information acquisition unit 405 generates subject silhouette images. Specifically, for every frame of the bird's-eye view image 702 generated in step S603, it takes the pixel-by-pixel difference between the bird's-eye view image 702 and the background image generated in step S604, and generates a subject silhouette image in which each pixel is represented by the difference value. FIGS. 7(c) and 7(d) show silhouette images 703 and 704 for different frames (times). The white areas of the silhouette images 703 and 704 correspond to the silhouettes of the subjects (players and the ball), which shows that only the subjects have been extracted.
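
A minimal sketch of the difference step follows. The source represents each silhouette pixel by the difference value itself; the binarization with a `threshold` parameter is an assumption added here to produce the white-on-black silhouettes shown in the figures, and the function name `silhouette` is hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel values


def silhouette(frame: Frame, background: Frame, threshold: int = 30) -> Frame:
    """Mark pixels that differ from the background by more than threshold as white (255)."""
    return [
        [255 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```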

In step S606, the two-dimensional position information acquisition unit 405 labels the subject regions of the silhouette image of each frame. For example, it assigns a unique label number to each isolated subject silhouette region. FIGS. 7(c) and 7(d) show labels M1 to M4 assigned to the subject silhouettes. In step S606, the two-dimensional position information acquisition unit 405 assigns the labels frame by frame, in order from the upper left to the lower right of the screen.
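
The per-frame labeling can be sketched as a connected-component pass over the binary silhouette image. This is one common way to do it, assuming 4-connectivity; the source does not specify the connectivity or the algorithm, and `label_regions` is a hypothetical name.

```python
from collections import deque
from typing import List, Tuple


def label_regions(mask: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Label isolated nonzero regions 1, 2, ... in raster order (top-left first)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                # Flood-fill the whole region with the new label.
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Because the scan runs from the upper left to the lower right, label numbers depend only on where each region first appears in the frame, which is why the cross-frame adjustment of the next step is needed.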

In step S607, the two-dimensional position information acquisition unit 405 adjusts the subjects and label numbers set per frame so that they remain consistent across all frames. In the preceding step S606, label numbers are assigned simply according to position within the frame, without considering other frames. Step S607 takes the other frames into account and adjusts the labels so that the same subject carries the same label number in every frame. For example, if the label number assigned to a subject X differs between frames t0 and t1, the label numbers in each frame are changed so that subject X has the same label number throughout. Specifically, when the position of the subject region labeled MX1 in frame t0 is close to the position of the subject region labeled MX2 in frame t1, the label MX2 in frame t1 is rewritten to MX1. The labeling may also be adjusted using the bird's-eye view image in addition to the silhouette image, for example by recognizing a subject's face or uniform number. With this processing, the position of a specific subject X can be identified in every frame of the bird's-eye view image 702.
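
The label adjustment between two consecutive frames can be sketched as a nearest-position match. This is a simplified greedy version, assuming each region is summarized by one representative (x, y) position; `propagate_labels` is a hypothetical name, and regions in the new frame that find no partner are simply dropped in this sketch.

```python
import math
from typing import Dict, Hashable, Tuple

Point = Tuple[float, float]


def propagate_labels(prev: Dict[Hashable, Point],
                     curr: Dict[Hashable, Point]) -> Dict[Hashable, Point]:
    """Re-key the regions of frame t1 (curr) with the labels used in frame t0 (prev).

    Each label from t0 takes the closest still-unclaimed region of t1.
    """
    remaining = dict(curr)
    result: Dict[Hashable, Point] = {}
    for label, pos in prev.items():
        if not remaining:
            break
        nearest = min(remaining, key=lambda k: math.dist(pos, remaining[k]))
        result[label] = remaining.pop(nearest)
    return result
```

A greedy match can swap two subjects that cross paths closely, which is one reason the source also suggests image recognition of faces or uniform numbers as an additional cue.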

In step S608, the two-dimensional position information acquisition unit 405 adds attributes to the subjects. An attribute is additional information such as a player's team, keeper, referee, or ball. An example of adding attributes based on user designation is described with reference to FIG. 3(b). The tracking window 360 of the GUI screen 350 in FIG. 3(b) displays the subjects M1, M2, and M3 with their label numbers. The image displayed in the tracking window 360 is the bird's-eye view image with the subject silhouettes and label numbers superimposed. When the user clicks the subject M1 with the mouse or touches it, for example, a combo box 371 for specifying the attribute of the subject M1 appears in the tracking dialog 370, and the user selects an attribute from the combo box 371. This embodiment assumes a soccer scene, so the attribute is one of the following: team A field player, team B field player, team A goalkeeper, team B goalkeeper, referee, or ball. When an attribute is selected, the CG scene generation unit 407 acquires the CG model matching the attribute from the CG model storage unit 406 and previews it. The CG models are described later. The two-dimensional position information acquisition unit 405 assigns the selected attribute to the subject M1. This series of processing is performed for all subjects. Although attribute assignment based on user designation has been described here as an example, attributes can also be assigned automatically by image recognition using machine learning. The ball attribute can also be assigned automatically by exploiting the fact that the ball appears circular or elliptical.

In step S609, the two-dimensional position information acquisition unit 405 calculates the two-dimensional position of the subject for each label number. Specifically, the XY coordinates of the point where each subject silhouette takes its minimum value in a predetermined direction (the Y coordinate) are used as the subject position. This point corresponds to the feet of the subject. Using the XY coordinates of the feet of the silhouette as the subject position yields subject position information that is not easily affected by the subject's posture. The XY coordinates are defined along the axes shown in the silhouette images 703 and 704 of FIGS. 7(c) and 7(d). In this way, subject position information (two-dimensional position) across all target frames is acquired for each subject. Subject position information across all target frames is also called tracking data, because it indicates position over time. The user then presses the tracking end button 372 of the tracking dialog 370 on the GUI screen 350 of FIG. 3(b). In response, the two-dimensional position information acquisition unit 405 saves the tracking data, which records the two-dimensional positions of the subjects over time, in the storage unit 103. The CPU 101 also switches the display back to the GUI screen 300 of FIG. 3(a) and ends the series of processing. At this point, the CG window 310 is still blank. Although the processing of FIG. 6 is applied uniformly to all subjects, the attributes may be used to limit the subjects for which tracking data is generated. For example, tracking only the subjects located around the subject with the ball attribute, which matter most in the virtual viewpoint image, makes the processing more efficient.
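
The foot-position rule of step S609 can be sketched as follows. The data layout is an assumption made for illustration: each frame is a dictionary from label to the list of (x, y) pixels in that silhouette, with Y measured along the axis shown in FIGS. 7(c) and 7(d). The names `foot_position` and `build_tracking_data` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Pixel = Tuple[int, int]


def foot_position(pixels: List[Pixel]) -> Pixel:
    """Pixel with the minimum Y in the silhouette (ties broken by smaller X)."""
    return min(pixels, key=lambda p: (p[1], p[0]))


def build_tracking_data(
    frames: List[Dict[Hashable, List[Pixel]]]
) -> Dict[Hashable, List[Pixel]]:
    """Collect the foot position of every labeled subject, frame by frame."""
    tracks: Dict[Hashable, List[Pixel]] = {}
    for regions in frames:
        for label, pixels in regions.items():
            tracks.setdefault(label, []).append(foot_position(pixels))
    return tracks
```

The resulting dictionary plays the role of the tracking data: one time series of two-dimensional positions per label.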

The above are the details of the processing in step S505 of FIG. 5 for acquiring the two-dimensional positions of the subjects. Next, the details of the CG scene generation processing in step S506 of FIG. 5 are described with reference to FIG. 8.

<CG scene generation processing>
FIG. 8 is a flowchart showing the details of the CG scene generation processing in step S506 of FIG. 5. In step S801, the CG scene generation unit 407 detects that the user has pressed the CG model reading button 321 of FIG. 3(a) and reads the CG models of the field and of the subjects.

FIG. 9 shows examples of the CG models. The field model 907 is composed of polygons, and the lines and other markings of the field, which is the shooting target area of the camera group 111, are represented as textures. As shown in FIG. 9, the subject models are a team A model 901, a team B model 902, a keeper model (team A) 903, a keeper model (team B) 904, a referee model 905, and a ball model 906. All models are composed of polygons, and the uniforms and the ball pattern are represented as textures. So that the user is not confused when setting the virtual camera path, as many models are prepared as there are types of uniform. More variety is preferable, for example one model per uniform number. In soccer, for example, the team A model 901 and the team B model 902 may each be prepared for the ten field players of the respective team.

In step S802, the CG scene generation unit 407 detects that the user has pressed the tracking data application button 322 of FIG. 3(a) and reads the tracking data in which the two-dimensional positions of the subjects are recorded.

In step S803, the CG scene generation unit 407 refers to the tracking data it has read and duplicates the models according to the attributes. For example, the CG scene generation unit 407 counts the number nA of subjects with the team A attribute and makes nA copies of the team A model 901. Similarly, it counts the number nB of subjects with the team B attribute and makes nB copies of the team B model 902, and it counts the number nS of subjects with the referee attribute and makes nS copies of the referee model 905. Each team and the referees are shown here as sharing a single CG model, but when a CG model is prepared for each subject, each subject's model need only be duplicated as many times as required. The keeper models are duplicated in the same way.
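
The counting and duplication in step S803 can be sketched as follows. The attribute strings and the dictionary-based model templates are placeholders invented for this illustration, and `duplicate_models` is a hypothetical name.

```python
import copy
from collections import Counter
from typing import Any, Dict, Hashable, List


def duplicate_models(attributes: Dict[Hashable, str],
                     templates: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Make one independent copy of the attribute's template model per tracked subject.

    attributes maps each subject label to its attribute (e.g. "team_a").
    templates maps each attribute to its CG model template.
    """
    counts = Counter(attributes.values())
    return {
        attr: [copy.deepcopy(templates[attr]) for _ in range(n)]
        for attr, n in counts.items()
    }
```

Deep copies are used so that each instance can later be given its own position without affecting the others.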

In step S804, the CG scene generation unit 407 sets two-dimensional position information for every CG model except the field model 907, based on the tracking data read in step S802. FIG. 9 shows that each model has been duplicated as needed and placed on the field model 907. Because the tracking data is two-dimensional position information recorded frame by frame, a CG scene can be built in which the subjects move as the frames advance.

In step S805, the display control unit 410 displays the generated CG scene in the CG window 310 of FIG. 3(a). This completes the CG scene generation processing. In other words, on the field model 907, which is an image showing the shooting target area of the camera group 111, the display control unit 410 superimposes and displays the CG models, which are information indicating objects such as the players who are the subjects, at the positions corresponding to the acquired position information.

At this point, the height information (Z coordinate value) of every subject is set to 0. The subject position information may therefore be incorrect for the ball, which is not always in contact with the field, and the ball may be placed incorrectly in the generated CG scene. For this reason, the subject position information (three-dimensional information) of the specific subject (the ball) is subsequently acquired in step S507 of FIG. 5.

<Acquisition processing of subject position information (three-dimensional information) of a specific subject>
FIG. 10 is a flowchart showing the details of the three-dimensional ball position acquisition processing in step S507 of FIG. 5. This processing corrects the position information of the specific subject (the ball), which may have been set incorrectly in the processing of FIG. 8. A virtual camera path is sometimes set for a scene in which the ball is never in the air, for example when reproducing a dribbling scene. In such cases, acquiring the three-dimensional position of the ball is clearly unnecessary. In this embodiment, therefore, the processing for acquiring the three-dimensional position information of the specific subject starts when the user, as needed, presses the ball height setting button 332 of the camera path setting dialog 330 of FIG. 3(a). The processing of FIG. 10 may also be started without such a user instruction.

In step S1001, the three-dimensional position information acquisition unit 408 acquires the silhouette region of the ball in the frame being processed in the bird's-eye view image. In step S1002, the three-dimensional position information acquisition unit 408 calculates the area (number of pixels) S1 of the ball's silhouette region. In step S1003, the three-dimensional position information acquisition unit 408 calculates the straight line L (see FIG. 11(a)) that connects the position of the tracking camera 204 in three-dimensional space and the center coordinates of the ball's silhouette. FIG. 11 outlines the three-dimensional position information acquisition processing. As shown in FIG. 11(a), the correct position of the ball lies somewhere on the straight line L. The position of the tracking camera 204 in three-dimensional space and its shooting parameters are known, and the straight line L is calculated from this known information and the center coordinates of the silhouette.
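
The straight line L can be sketched as a parametric line. This is a minimal sketch, assuming the silhouette center has already been converted to a world-coordinate point on the field plane (z = 0); the function name `point_on_line` and the parameter t are introduced here for illustration.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def point_on_line(camera: Vec3, ground_point: Vec3, t: float) -> Vec3:
    """Point on the line L from the camera (t = 0) to the silhouette center on the field (t = 1)."""
    return tuple(c + t * (g - c) for c, g in zip(camera, ground_point))
```

Every candidate ball position corresponds to one value of t between 0 and 1, and the height of the candidate is the z component of the returned point.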

Here, the distance between the tracking camera 204 and the ball and the area of the ball's silhouette region are inversely related: the farther the ball is from the camera, the smaller its silhouette. FIG. 11(a) and FIG. 11(b) show examples in which the ball is placed at different positions on the straight line L. Comparing the distance between the tracking camera 204 and the ball in FIG. 11(a) and FIG. 11(b), the distance is longer in FIG. 11(b). Accordingly, the silhouette region of the ball in the bird's-eye view image is smaller in FIG. 11(b) than in FIG. 11(a). In the present embodiment, the size of the ball's silhouette region is used in this way to acquire the three-dimensional position information of the ball.

In step S1004, the three-dimensional position information acquisition unit 408 places a CG tracking camera 1101 in the CG scene generated in step S506, at the same three-dimensional position as the tracking camera 204. At this time, the tracking camera 204 and the CG tracking camera 1101 are given the same shooting parameters (focal length and so on).

In step S1005, the three-dimensional position information acquisition unit 408 sets the straight line L in the CG scene and places the ball model 906 on the straight line L. In other words, an image with the same composition as the image captured by the tracking camera 204 is reproduced using the CG scene. The position at which the area of the ball's silhouette in the reproduced CG scene becomes equal to that of the ball's silhouette in the bird's-eye view image converted from the real image is then estimated to be the three-dimensional position of the ball. In the present embodiment, the appropriate ball position is found by shifting the position of the ball model 906 step by step. In step S1005, the ball model 906 is placed at a position close to the CG tracking camera 1101.

In step S1006, the correction unit 409 renders the CG image as viewed from the CG tracking camera 1101. In step S1007, the correction unit 409 applies a projective transformation to the rendered CG image to generate a CG bird's-eye view image. The CG bird's-eye view image can be generated by the same method as the bird's-eye view image described above.

In step S1008, the three-dimensional position information acquisition unit 408 calculates the silhouette area (number of pixels) S2 of the ball model in the CG bird's-eye view image. The silhouette image for the CG bird's-eye view image can also be generated by the same method as that used to generate a silhouette image from the bird's-eye view image described above. In step S1009, the three-dimensional position information acquisition unit 408 calculates the difference between the area S1 and the area S2 and determines whether the difference is less than a predetermined threshold. If the difference is less than the threshold, the silhouettes are regarded as matching, the series of processes for the frame being processed ends, and the process proceeds to step S1011. If the difference is greater than or equal to the threshold, the process proceeds to step S1010. In step S1010, the correction unit 409 updates the position of the ball model 906, and the process returns to step S1006.

An outline of steps S1005 to S1010 will be described with reference to FIG. 10(c). Placing the ball model on the straight line L in the CG scene and generating the CG bird's-eye view image makes it possible to compare the silhouette of the ball model 906 in the CG scene with the silhouette of the ball in real space. When the two sizes match, the three-dimensional position of the ball model 906 at that time is taken as the three-dimensional position of the ball in real space. In step S1010, the position of the ball model 906 is adjusted along the straight line L so that its distance from the CG tracking camera increases.
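
The search in steps S1005 to S1010 can be sketched as a loop that starts near the camera and moves the ball model away along L until the two areas agree. In this sketch the rendering, projective transformation, and pixel counting of steps S1006 to S1008 are replaced by a simple pinhole approximation of a sphere's projected area; that stand-in, the ball radius, the focal length, the step size, and the function names are assumptions for illustration only.

```python
import math


def simulated_silhouette_area(distance: float, radius: float = 0.11,
                              focal_px: float = 2000.0) -> float:
    """Stand-in for S1006-S1008: approximate pixel area of a sphere under a pinhole model."""
    projected_radius = focal_px * radius / distance
    return math.pi * projected_radius ** 2


def estimate_distance(s1: float, threshold: float, start: float = 1.0,
                      step: float = 0.01, max_distance: float = 200.0):
    """Move the ball model away from the camera along L until S1 and S2 match."""
    distance = start                                  # S1005: begin close to the CG camera
    while distance <= max_distance:
        s2 = simulated_silhouette_area(distance)      # S1006-S1008: area S2 of the model
        if abs(s1 - s2) < threshold:                  # S1009: compare with the threshold
            return distance
        distance += step                              # S1010: move farther from the camera
    return None                                       # no match found within the search range


# Hypothetical observation: the real ball is 25 m from the camera.
observed_s1 = simulated_silhouette_area(25.0)
estimated = estimate_distance(observed_s1, threshold=1.0)
```

With a fixed step, the step must be small enough that the area change per step stays below the threshold; otherwise the loop can step past the matching position without detecting it.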

In step S1011, it is determined whether all frames have been processed. If not, the process proceeds to step S1012, where an unprocessed frame is set as the frame to be processed, and then returns to step S1001 to repeat the processing. The above processing makes it possible to set an accurate three-dimensional position for the ball.

After that, as described above, a CG scene is generated in which the subject position information of the specific subject (the ball) has been corrected. The virtual camera path and other settings are then made based on the generated CG scene.

As described above, in the present embodiment, narrowing the targets of three-dimensional subject position acquisition down to a specific subject makes it possible to acquire subject position information accurately and quickly from the image of a single camera. In addition, because the positions of the subjects (the players) are shown in CG in association with the field, the user can identify the positions of the subjects in real space. The user can therefore easily set a virtual camera path for obtaining the desired virtual viewpoint image.

<< Embodiment 2 >>
This embodiment describes acquisition of the subject position using rugby, in which the ball is elliptical, as an example. Most of the processing is the same as in Embodiment 1, so description of the overlapping parts is omitted.

FIG. 12 is a flowchart showing details of the three-dimensional position information acquisition process in step S506 of FIG. 5. Steps other than S1202, S1208, and S1209 are the same as in FIG. 10, so their description is omitted. In step S1202, the three-dimensional position information acquisition unit 408 calculates the area (number of pixels) S1 and the eccentricity r1 of the ball's silhouette region. The eccentricity expresses the ratio between the major axis a and the minor axis b of the ellipse, and is constant as long as the orientation of the ball is the same. When the ball is elliptical, the area of the silhouette region differs depending on the orientation of the ball, even if the ball is at the same three-dimensional position. Therefore, this embodiment performs processing that takes the eccentricity into account. The eccentricity can be calculated by the following Formula 3.

In step S1208, the three-dimensional position information acquisition unit 408 calculates the silhouette area (number of pixels) S2 and the eccentricity r2 of the ball model in the CG bird's-eye view image. In step S1209, the correction unit 409 rotates the ball model without changing its position, and repeats the rotation until r1 and r2 are equal. This brings the orientation of the ball model into agreement with that of the ball in real space. With the orientations matched, the silhouette areas are then compared in step S1210. In this way, the processing of this embodiment makes it possible to set an accurate three-dimensional position even for an elliptical ball such as a rugby ball. Although this example determines the orientation from the eccentricity, any parameter that represents the shape of the ellipse may be used instead.
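
The rotation matching of step S1209 can be illustrated with a simplified model. The sketch below makes several assumptions that are not taken from the embodiment: the ball is treated as a prolate spheroid viewed under orthographic projection, the rotation is reduced to a single tilt angle between the long axis and the viewing direction, the ball dimensions are invented, and the eccentricity uses the standard definition sqrt(1 - (b/a)^2), which may differ from Formula 3.

```python
import math


def eccentricity(major: float, minor: float) -> float:
    """Standard ellipse eccentricity (an assumption; Formula 3 may be defined differently)."""
    return math.sqrt(max(0.0, 1.0 - (minor / major) ** 2))


def model_silhouette_axes(a: float, b: float, tilt: float):
    """Semi-axes of the orthographic silhouette of a prolate spheroid (a >= b).

    tilt is the angle in radians between the long axis and the viewing direction.
    """
    long_axis = math.sqrt((a * math.sin(tilt)) ** 2 + (b * math.cos(tilt)) ** 2)
    return long_axis, b


def match_rotation(r1: float, a: float = 0.15, b: float = 0.095,
                   steps: int = 900, tolerance: float = 1e-3):
    """Rotate the model in place (S1209) until its eccentricity r2 matches r1."""
    for i in range(steps + 1):
        tilt = (math.pi / 2) * i / steps              # candidate orientation of the model
        major, minor = model_silhouette_axes(a, b, tilt)
        r2 = eccentricity(major, minor)               # S1208: eccentricity of the model silhouette
        if abs(r1 - r2) < tolerance:
            return tilt
    return None                                       # no orientation reproduced r1


# Hypothetical observation: the real ball is tilted 40 degrees from the viewing direction.
true_tilt = math.radians(40.0)
observed_major, observed_minor = model_silhouette_axes(0.15, 0.095, true_tilt)
r1 = eccentricity(observed_major, observed_minor)
found_tilt = match_rotation(r1)
```

Once the orientation is fixed in this way, the area comparison proceeds as in Embodiment 1, with the model moved along the straight line L.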

As described above, by taking the orientation of the ball into account, subject position information specialized for virtual viewpoint image generation can be acquired accurately and quickly from the image of a single camera, even in a sport that uses an elliptical ball.

<< Other Embodiments >>
The embodiments above were described using soccer and rugby as examples, but the present invention is not limited to these. It can be applied to any ball game played on a field. It can also be applied not only to ball games but to various other shooting targets such as skating, sumo, and concerts. Furthermore, although the subject whose three-dimensional position information is acquired was described using circular (spherical) and elliptical shapes as examples, the subject is not limited to these. For example, a subject with a specific shape, such as a badminton shuttlecock, can be used as the subject whose three-dimensional position information is acquired.

In the CG window 310, a uniform number or a face image for identifying an individual may be mapped onto the player CG model 312 as a texture. The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Tracking camera 204
Image conversion unit 404
Two-dimensional position information acquisition unit 405
Three-dimensional position information acquisition unit 408

Claims

An image processing apparatus that accepts designation of a viewpoint position for a virtual viewpoint image generated using a plurality of images captured by a plurality of cameras, the apparatus comprising:
acquisition means for acquiring, based on an image in which the imaging target area of the plurality of cameras is captured, position information of an object shown in the image;
display control means for displaying information indicating the object at a position, in a computer graphic representing the imaging target area, that corresponds to the position information acquired by the acquisition means; and
accepting means for accepting designation of the viewpoint position for the virtual viewpoint image while the information indicating the object is displayed by the display control means.

The image processing apparatus according to claim 1, wherein the information displayed by the display control means is a computer graphic.

The image processing apparatus according to claim 1, wherein the shooting target areas of the plurality of cameras are fields where soccer or rugby is played.

4. The image processing apparatus according to any one of claims 1 to 3, wherein the accepting means accepts designation of the viewpoint position for the virtual viewpoint image based on an operation on a display area that displays an image representing the imaging target area and the information indicating the object.

5. The image processing apparatus according to claim 1, wherein the acquisition means acquires the position information of the object based on an image in which the imaging target area is captured from above.

6. The image processing apparatus according to claim 1, further comprising conversion means for converting an image obtained by capturing the imaging target area into an image of the imaging target area viewed from above.

An image processing apparatus comprising:
conversion means for converting an image of a field into a bird's-eye view image of the field viewed from above;
first acquisition means for acquiring two-dimensional position information of a subject from the bird's-eye view image;
generating means for generating, using the two-dimensional position information, a CG scene in which a region of the field including the subject is reproduced by a CG (computer graphic) model;
second acquisition means for acquiring three-dimensional position information of a specific subject using the CG scene; and
correction means for correcting position information of the specific subject using the three-dimensional position information of the specific subject.

The image processing apparatus according to claim 7, wherein the correction means corrects the CG scene using the corrected position information of the specific subject.

9. The image processing apparatus according to claim 7, further comprising:
display control means for displaying the CG scene on a screen; and
setting means for setting a designated virtual camera path.

The image processing apparatus according to claim 9, further comprising first image generation means for generating a virtual viewpoint image using the CG scene in accordance with the set virtual camera path.

The image processing apparatus according to claim 9, further comprising:
third acquisition means for acquiring a plurality of images obtained by photographing the field from a plurality of viewpoints; and
second image generation means for generating a virtual viewpoint image using the plurality of images in accordance with the set virtual camera path.

The image processing apparatus according to claim 7, wherein the bird's eye view image is an image converted from an image obtained by photographing the field with a single camera.

The image processing apparatus according to claim 7, wherein the first acquisition means extracts a silhouette of each subject from the bird's-eye view image and acquires, as the two-dimensional position information of that subject, the coordinates at which the silhouette takes its minimum value in a predetermined direction.

The image processing apparatus according to claim 13, wherein the second acquisition means acquires the three-dimensional position information using the area of the silhouette of the specific subject.

The image processing apparatus according to claim 14, wherein the second acquisition means acquires the three-dimensional position information by further using a parameter indicating the shape of the specific subject.

The image processing apparatus according to claim 13, wherein the generating means generates, using the CG scene, a CG image having the same composition as an image obtained by photographing the field, and generates a CG bird's-eye view image from the generated CG image, and
the second acquisition means acquires, as the three-dimensional position information of the specific subject, the position of the CG model at which the difference between the area of the silhouette of the specific subject in the bird's-eye view image and the area of the silhouette of the CG model of the specific subject in the CG bird's-eye view image is less than a predetermined threshold.

The image processing apparatus according to claim 7, wherein the specific subject includes a circular or elliptical subject.

The image processing apparatus according to any one of claims 7 to 17, wherein at least as many types of CG model are used as there are types of uniforms worn by the subjects.

The image processing apparatus according to claim 7, wherein the correction means performs the correction in accordance with an instruction from a user.

An image processing method comprising:
converting an image of a field into a bird's-eye view image of the field viewed from above;
acquiring two-dimensional position information of a subject from the bird's-eye view image;
generating, using the two-dimensional position information, a CG scene in which a region of the field including the subject is reproduced by a CG (computer graphic) model;
acquiring three-dimensional position information of a specific subject using the CG scene; and
correcting position information of the specific subject using the three-dimensional position information of the specific subject.

An image processing method performed by an image processing apparatus that accepts designation of a viewpoint position for a virtual viewpoint image generated using a plurality of images captured by a plurality of cameras, the method comprising:
an acquisition step of acquiring, based on an image in which the imaging target area of the plurality of cameras is captured, position information of an object shown in the image;
a display control step of displaying information indicating the object at a position, in a computer graphic representing the imaging target area, that corresponds to the position information acquired in the acquisition step; and
an accepting step of accepting designation of the viewpoint position for the virtual viewpoint image while the information indicating the object is displayed in the display control step.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 19.