JP7506022B2

JP7506022B2 - Drawing device and program

Info

Publication number: JP7506022B2
Application number: JP2021067557A
Authority: JP
Inventors: 晴久加藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2021-04-13
Filing date: 2021-04-13
Publication date: 2024-06-25
Anticipated expiration: 2041-04-13
Also published as: JP2022162653A

Description

The present invention relates to a rendering device and a program for rendering free viewpoint video.

Free viewpoint video systems, which generate video from an arbitrary viewpoint using video captured by multiple cameras, are known. Patent Documents 1 and 2 disclose conventional techniques for realizing free viewpoint video. In Patent Document 1, video captured by physical cameras is delivered to a terminal and the free viewpoint video is generated on the terminal, which reduces the load on the server. In Patent Document 2, the load of generating free viewpoint video is reduced by generating the moving region of interest and the remaining regions at different frequencies.

Patent Document 1: JP 2019-121867 A
Patent Document 2: JP 2018-067106 A

However, the conventional techniques described above have difficulty providing free viewpoint video efficiently.

Patent Document 1 reduces the load on the server but increases the load on the terminal. In particular, terminals with relatively low processing performance, such as smartphones and smart glasses, cannot handle this load. Patent Document 2 does not address reducing the processing load of generating the texture of the subject's surface. In free viewpoint video, texture is generated from the video of physical cameras near the virtual viewpoint, and the higher the resolution of the video, the heavier the processing needed to read it and expand it into memory. In particular, when there are many viewers, the load increases in proportion to the number of viewpoints, so free viewpoint video cannot be generated in real time.

In view of the above problems with the conventional techniques, an object of the present invention is to provide an efficient rendering device and program.

To achieve the above object, the present invention is a rendering device comprising: a construction unit that constructs a three-dimensional model from multi-viewpoint images; an integration unit that accepts a virtual viewpoint designated by each of a plurality of users and integrates the plurality of virtual viewpoints to obtain a representative viewpoint; a rendering unit that obtains a rendering result at the representative viewpoint by rendering the texture of the multi-viewpoint images onto the three-dimensional model under a virtual camera position set at the representative viewpoint; and a conversion unit that converts the rendering result at the representative viewpoint into a rendering result at the virtual viewpoint designated by each of the plurality of users. The present invention is also a program that causes a computer to function as the rendering device.

According to the present invention, rendering is performed only for a representative viewpoint that integrates multiple virtual viewpoints, and the rendering result for each virtual viewpoint is obtained by converting the representative rendering result, which enables efficient rendering.

FIG. 1 is a configuration diagram of a rendering system according to an embodiment.
FIG. 2 is a diagram showing an example of camera arrangement in the imaging equipment.
FIG. 3 is a diagram showing a schematic example of the free viewpoint video display at a user-desired virtual viewpoint provided by the rendering system.
FIG. 4 is a functional block diagram of the rendering system according to an embodiment.
FIG. 5 is a flowchart of the operation of the rendering system according to an embodiment.
FIG. 6 is a diagram showing a schematic example for explaining the processing of the integration unit according to an embodiment.
FIG. 7 is a diagram showing a schematic example of the conversion process in the conversion unit.
FIG. 8 is a diagram showing an example of distinguishing the gaze points of targets.
FIG. 9 is a diagram showing the hardware configuration of a typical computer.

FIG. 1 is a configuration diagram of a rendering system according to an embodiment. The rendering system 100 comprises M (M≧2) terminals 10-1, 10-2, ..., 10-M, each used by a user who views the free viewpoint video; a server 20 serving as the rendering device; and imaging equipment 30 consisting of N (N≧2) cameras C1, C2, ..., CN that capture the common content (e.g., sports) from which the free viewpoint video is generated. These components of the rendering system 100 can communicate with one another via a network NW.

FIG. 2 shows an example camera arrangement in the imaging equipment 30 with N=8 cameras. The cameras C1 to C8 are arranged in a circle around a target OB (e.g., a player in a sports game), for which a three-dimensional model is generated and rendered in the free viewpoint video. Each camera captures the target OB from its own position, obtaining a captured image at each real-time time t (=1, 2, ...) and thereby acquiring video at that camera position. The cameras C1 to C8 and the target OB are located at approximately the same height on the ground PL of the real-world field where imaging takes place.

Note that the arrangement of the imaging equipment 30 in FIG. 2 is only a schematic example. The imaging equipment 30 may consist of any number N of cameras arranged in the real world so as to capture the target OB, whose three-dimensional model is generated for the free viewpoint video, from mutually different camera positions and camera orientations. For example, instead of surrounding the target OB in a circle on a plane as in FIG. 2, the cameras may surround the target OB on a hemisphere constructed over that plane. While surrounding and capturing the target OB, all or some of the cameras in the imaging equipment 30 may move within the real-world field over time t, manually or automatically, or all or some of the cameras may be fixed on the field so that their position and orientation do not change. The target OB may also consist of one or more real-world targets (e.g., multiple players in a sports game).

In the rendering system 100 configured as in FIG. 1, each of the M terminals 10 (any one of the terminals 10-1, 10-2, ..., 10-M is referred to as a terminal 10; the same applies below) uses its communication function to exchange information with the server 20. The terminal 10 transmits, and the server 20 receives, the coordinate information of the virtual viewpoint that each user wishes to view. Conversely, the server 20 transmits, and the terminal 10 receives, the rendering result at that virtual viewpoint.

As its overall operation, the rendering system 100 captures images of the target OB at each time t, with the N cameras C1, C2, ..., CN of the imaging equipment 30 time-synchronized at a predetermined processing rate such as 30 fps (frames per second). The server 20 performs rendering according to each specified virtual viewpoint, and the terminal 10 displays the rendering result. The user of the terminal 10 is thereby provided with a free viewpoint video display according to the specified virtual viewpoint. The details of this operation are described later with reference to FIGS. 4 and 5.

FIG. 3 is a diagram showing a schematic example of the free viewpoint video display at user-desired virtual viewpoints provided by the rendering system 100. It shows three example rendering results G1 to G3, in which a three-dimensional model MD of the target OB of FIG. 2 (schematically shown as a person such as an athlete) is generated at a common time t and rendered at the virtual viewpoint desired by each user. The first rendering result G1 is rendered for a virtual viewpoint, specified by a first user, that views the three-dimensional model MD from the front right at a far distance. The second rendering result G2 is rendered for a virtual viewpoint, specified by a second user, that views the model from the front at a near distance. The third rendering result G3 is rendered for a virtual viewpoint, specified by a third user, that views the model from the front left at a medium distance.

As the example of FIG. 3 shows, the rendering system 100 of this embodiment generates a three-dimensional model MD of a common real-world target OB at a common time t and can then provide the rendering result of that model at the virtual viewpoint desired by each user.

FIG. 4 is a functional block diagram of the rendering system 100 according to an embodiment. As illustrated, the terminal 10 includes a designation unit 11 and a display unit 12, and the server 20 (rendering device 20) includes a construction unit 21, an integration unit 22, a generation unit 23, and a conversion unit 24. As noted above, the terminal 10 in FIG. 4 stands for any one of the M terminals shown in FIG. 1 and shows the functional block configuration common to the M terminals 10-1, 10-2, ..., 10-M. FIG. 5 is a flowchart of the operation of the rendering system 100 according to an embodiment. The processing of each unit in FIG. 4 is described in detail below, following the steps of FIG. 5.

The flow of FIG. 5 consists of steps S1 to S6, outlined as follows. Steps S1 to S5, which form the processing step group SG(t), are the processes that the rendering system 100 repeats at each real-time time t=1, 2, ... as described above, so that the free viewpoint video at the virtual viewpoint can be viewed on each user's terminal 10. Step S6 represents the time update (and the management of processing timing in the rendering system 100) that indicates processing is performed at each real-time time t=1, 2, ... in this way.

In step S1, the following two processes are performed at the current time t, and the flow then proceeds to step S2.

(1) The imaging equipment 30 captures images to obtain a multi-viewpoint image MV(t) of the target OB.

Each of the N cameras C1, C2, ..., CN of the imaging equipment 30 captures the target OB, yielding the N-viewpoint multi-viewpoint image MV(t)={Pic1(t),Pic2(t),...,PicN(t)}, where PicK(t) (K=1,2,...,N) is the image captured by camera CK at time t. The multi-viewpoint image MV(t) is transmitted to the server 20 and thereby output to the construction unit 21 and the generation unit 23.

(2) Each of the M terminals 10 specifies the virtual viewpoint coordinates pi(t) and the gaze direction di(t).

The designation unit 11 of each of the M terminals 10 accepts, through user input, the designation of the virtual viewpoint coordinates pi(t) and the gaze direction di(t), where i=1,2,...,M identifies the terminal 10 and the user of that terminal. The virtual viewpoint coordinates pi(t) and gaze direction di(t) (i=1,2,...,M) are transmitted to the server 20 and thereby output to the integration unit 22.

The virtual viewpoint coordinates pi(t) (i=1,2,...,M) of each of the M users are specified as a three-dimensional position pi(t)=(xi(t),yi(t),zi(t)) in the three-dimensional virtual space VSP in which the three-dimensional model of the target OB is rendered. The designation unit 11 can accept these coordinates through any interface for specifying a spatial position. For example, the user may operate an input interface such as a keyboard, touch panel, or mouse to move the virtual viewpoint continuously within the virtual space VSP. By specifying the movement amount Δpi(t)=(Δxi,Δyi,Δzi) from the position pi(t-1) at the immediately preceding time t-1, the coordinates pi(t) at the current time t may be accepted as follows:
pi(t)=pi(t-1)+Δpi(t)

Moving the virtual viewpoint coordinates pi(t) within the virtual space VSP by user input in this way may leave the virtual viewpoint at a position the user does not want. For such cases, an instruction to reset the virtual viewpoint coordinates pi(t) to a predetermined position Ref(t) (for example, a position from which the three-dimensional model of the target OB is viewed from the front at time t) may be accepted as follows. At the initial time t=0, this reset may be applied to set the virtual viewpoint coordinates pi(0) to the initial position Ref(0).
pi(t)=Ref(t)
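The incremental update and the reset described above can be sketched as follows. This is a minimal illustration only: the function name `next_viewpoint`, the `reset` flag, and the tuple representation of coordinates are assumptions made for the example and do not appear in the source.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def next_viewpoint(prev: Vec3, delta: Vec3, ref: Vec3, reset: bool = False) -> Vec3:
    """Return the virtual viewpoint coordinates pi(t).

    prev  : pi(t-1), the viewpoint at the previous time step
    delta : Δpi(t), the movement amount requested by the user
    ref   : Ref(t), the predetermined reset position
    reset : hypothetical flag meaning "the user asked for a reset"
    """
    if reset:
        # pi(t) = Ref(t)
        return ref
    # pi(t) = pi(t-1) + Δpi(t)
    return (prev[0] + delta[0], prev[1] + delta[1], prev[2] + delta[2])


# At t=0 the reset is applied, so pi(0) = Ref(0).
ref0 = (0.0, 0.0, 5.0)
p0 = next_viewpoint(prev=(0.0, 0.0, 0.0), delta=(0.0, 0.0, 0.0), ref=ref0, reset=True)
# At t=1 the user moves the viewpoint by Δp.
p1 = next_viewpoint(prev=p0, delta=(0.5, 0.0, -1.0), ref=ref0)
```

The same pattern would apply to the gaze direction di(t), which the text says can be moved or reset in the same way.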

The gaze direction of the virtual viewpoint, di(t)=(dxi(t),dyi(t),dzi(t)) (i=1,2,...,M), can likewise be accepted as a continuous movement instruction or a reset instruction, in the same way as the virtual viewpoint coordinates pi(t) (i=1,2,...,M) above.

Alternatively, the gaze direction di(t) of the virtual viewpoint may be set automatically from the manually specified virtual viewpoint coordinates pi(t) rather than being specified manually by the user. For example, the user specifies in advance the target OB to be viewed (one of several), a predetermined position (e.g., the centroid) within the three-dimensional model MD(t) of that target is taken as pos(t), and the gaze direction di(t) is set parallel to the direction vector that starts at the virtual viewpoint coordinates pi(t) and ends at pos(t). With this setting, the virtual viewpoint always looks toward the three-dimensional model MD(t) from the user-specified coordinates pi(t).
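The automatic gaze direction can be sketched as below. The source only requires di(t) to be parallel to the vector from pi(t) to pos(t); returning a unit-length vector is an assumption made here for convenience, as is the function name `auto_gaze_direction`.

```python
import math


def auto_gaze_direction(p, pos):
    """Gaze direction di(t) pointing from the viewpoint p = pi(t) to pos(t).

    The result is normalized to unit length (an assumption; the text only
    asks for a vector parallel to pos(t) - pi(t)).
    """
    v = [b - a for a, b in zip(p, pos)]          # pos(t) - pi(t)
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("viewpoint coincides with the target position")
    return tuple(c / norm for c in v)


# A viewpoint at the origin looking at a model centroid at (3, 0, 4).
d = auto_gaze_direction((0.0, 0.0, 0.0), (3.0, 0.0, 4.0))
```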

In step S2, the following two processes are performed at the current time t, and the flow then proceeds to step S3.

(1) The construction unit 21 generates the three-dimensional model MD(t) of the target OB at the current time t from the multi-viewpoint image MV(t).

The construction unit 21 processes the N-viewpoint multi-viewpoint image MV(t)={Pic1(t),Pic2(t),...,PicN(t)} obtained from the imaging equipment 30 to construct a three-dimensional model MD(t) representing the three-dimensional shape of the captured target OB at the current time t, and outputs the model to the generation unit 23 and the conversion unit 24. Any existing method may be used to construct MD(t) from MV(t); for example, the visual hull method (volume intersection) may be used. As is well known, this method first extracts the foreground silhouette region (a region in the image plane) occupied by the target OB in each image of the multi-viewpoint image, for example by background subtraction. Back-projecting each silhouette into three-dimensional space from the corresponding camera position yields a cone (visual volume) per viewpoint, and the volume occupied by the target OB in three-dimensional space is obtained as the region where the cones of all viewpoints overlap. The back-projection is performed, for example, in voxel space; once the volume occupied by the target OB has been obtained as a voxel region, the three-dimensional model MD(t) of the target OB can be constructed as a polygon model by any existing method such as marching cubes.
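The voxel-based intersection described above can be illustrated with a toy sketch. It is not the construction unit's actual implementation: the function `carve_visual_hull`, the representation of a silhouette as a set of foreground pixels, and the orthographic projections are all assumptions chosen to keep the example small. A real system would use calibrated perspective cameras, silhouettes from background subtraction, and a meshing step such as marching cubes afterwards.

```python
def carve_visual_hull(voxels, views):
    """Keep the voxels whose projection lies inside every silhouette.

    voxels : iterable of (x, y, z) voxel centres
    views  : list of (project, silhouette) pairs, where `project` maps a
             3D point to a pixel (u, v) and `silhouette` is the set of
             foreground pixels for that camera (both hypothetical here)
    """
    return [
        v for v in voxels
        if all(project(v) in silhouette for project, silhouette in views)
    ]


# Toy scene: a 3x3x3 voxel grid seen by two orthographic "cameras".
voxels = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
views = [
    # Camera looking along the z axis: pixel = (x, y).
    (lambda v: (v[0], v[1]), {(1, 1)}),
    # Camera looking along the x axis: pixel = (y, z).
    (lambda v: (v[1], v[2]), {(1, 0), (1, 1)}),
]
hull = carve_visual_hull(voxels, views)
```

Each silhouette alone admits a whole column or slab of voxels; only the voxels consistent with both views survive, which is the overlap of the visual volumes.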

(2) The integration unit 22 integrates the M virtual viewpoints (coordinates pi(t) and gaze directions di(t)) to obtain representative viewpoints.

The integration unit 22 applies clustering to the M virtual viewpoint coordinates pi(t) obtained from the terminals 10, obtains a representative viewpoint as the representative position of each of fewer than M clusters, and outputs the representative viewpoints to the generation unit 23. The clustering takes into account the role of a virtual viewpoint as the reference position for rendering (the position of a virtual camera), so that each representative viewpoint is an integrated viewpoint that is optimal when the downstream generation unit 23 generates the texture of the three-dimensional model MD(t). Specifically, this can be done as follows.

FIG. 6 is a diagram showing a schematic example for explaining the processing of the integration unit 22 according to an embodiment. In FIG. 6, the imaging equipment 30 is assumed to be the same as in FIG. 2, with eight cameras C1 to C8 surrounding the target OB in a circle. In FIG. 2 the target OB and the eight cameras C1 to C8 are arranged in the real world; the three-dimensional world coordinate system of that real world can be used as is to define the virtual space VSP shown in FIG. 6 (the virtual space in which the target OB is rendered as its three-dimensional model MD(t)).

Because the designation unit 11 described above gives the virtual viewpoint (coordinates pi(t) and gaze direction di(t)) in a virtual space VSP defined in the same coordinate system as the real-world three-dimensional coordinate system, the placement of a virtual viewpoint, that is, of a virtual camera, can be given in correspondence with the placement of the physical cameras C1 to C8. (For example, a virtual camera can be specified to have the same placement as the physical camera C1.)

The processing of the integration unit 22 is described below with reference to the example of FIG. 6. Because this description concerns only the current time t, the dependence on t is omitted from the notation: the virtual viewpoint coordinates pi(t) are written pi and the gaze direction di(t) is written di.

First, the arbitrary viewpoints are normalized so that, with the gaze point of the target OB taken as the origin, the clustering described below is performed by direction for virtual viewpoints that share the same gaze point. That is, for the M viewpoint coordinates pi (i=1,2,...,M), each normalized coordinate Pi is given by the following equation.
Pi=pi/|pi|

The gaze point of the target OB may be set, for example, to its centroid. When the normalized coordinate Pi is calculated from a viewpoint coordinate pi expressed with that centroid as the origin, Pi is the unit direction vector from the target OB toward the viewpoint coordinate pi.
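A minimal sketch of this normalization follows. The function name `normalize_viewpoint` and the explicit `gaze_point` argument (used to shift the origin to the gaze point before normalizing) are assumptions for the example.

```python
import math


def normalize_viewpoint(p, gaze_point=(0.0, 0.0, 0.0)):
    """Return Pi = pi / |pi|, with pi measured from the gaze point.

    p          : viewpoint coordinates in world coordinates
    gaze_point : gaze point of the target OB (e.g. its centroid); it is
                 subtracted first so that it acts as the origin
    """
    v = [a - g for a, g in zip(p, gaze_point)]
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("viewpoint coincides with the gaze point")
    return tuple(c / norm for c in v)


# Two viewpoints in the same direction but at different distances
# map to the same point on the unit sphere SP.
P_near = normalize_viewpoint((0.0, 3.0, 4.0))
P_far = normalize_viewpoint((0.0, 30.0, 40.0))
```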

In the example of FIG. 6, M=6. The six virtual viewpoint coordinates p1 to p6 are shown, and of their normalized coordinates P1 to P6, the positions of P1 to P3 are shown as white circles (○) on the surface of the unit sphere SP centered on the gaze point of the target OB. (The normalized coordinates P4 to P6 lie on the far side of P1 to P3 as drawn and are omitted to avoid cluttering the figure.)

The "gaze point of the target OB" used by the integration unit 22 is determined according to the gaze direction di seen from each user's viewpoint coordinates pi, but it is a broader concept than the line-of-sight direction (the optical axis of the virtual camera) determined by pi and di. The "gaze point of the target OB" distinguishes which of the one or more targets constituting the 3D content rendered by the rendering system 100 are to be viewed and which are not (that is, it identifies which of the multiple 3D models must be rendered for each user). It is determined by the 3D content to be rendered once each user has specified the viewpoint coordinates pi and gaze direction di in the designation unit 11. (For example, even if one user's viewpoint coordinates pi and another user's viewpoint coordinates pj are identical (pi=pj) and their gaze directions di and dj differ slightly, the "gaze point of the target OB" is common to both users as long as the same target OB is in view.) In the example of FIG. 6, the rendered 3D content consists of a single target OB (for example, one athlete), and all M=6 users have specified viewpoint coordinates pi and gaze directions di such that this single target OB is included in the view, so the "gaze point of the target OB" is a single point common to all users. The following description covers this case, as in FIG. 6, where the "gaze point of the target OB" is a single point common to all users; the cases of multiple points or points that are not common are described later.

Next, the normalized coordinates Pi (i=1,2,...,M) are clustered into Mc clusters CLj (j=1,2,...,Mc), where Mc is smaller than the original number M of virtual viewpoints (Mc<M), and the centroid Gj of each cluster, normalized onto the unit sphere SP, is obtained by the following formulas. (Num_j is the number of elements in cluster CLj, and the sum Σ_i runs over the Num_j elements Pi∈CLj.) Existing methods can be used for the clustering, such as k-means, or x-means and g-means, which determine the number of clusters automatically.
gj=Σ_i Pi/Num_j
Gj=gj/|gj|
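The clustering and the normalized centroid can be sketched in plain Python as follows. This is one possible illustration, not the patent's implementation: a basic k-means loop with a fixed number of clusters `k` is used, and the deterministic farthest-point initialization, the iteration count, and the function names are assumptions. The centroid is undefined when gj is the zero vector (for example, two exactly opposite viewpoints in one cluster); this sketch does not handle that case.

```python
import math


def normalized_centroid(cluster):
    """gj = Σ_i Pi / Num_j, then Gj = gj / |gj| (back onto the unit sphere)."""
    num = len(cluster)
    g = [sum(P[k] for P in cluster) / num for k in range(3)]
    norm = math.sqrt(sum(c * c for c in g))  # zero norm is not handled here
    return tuple(c / norm for c in g)


def kmeans_on_sphere(points, k, iterations=20):
    """Cluster unit vectors Pi into k clusters; return (clusters, centroids)."""
    # Assumed initialization: start from the first point, then repeatedly
    # add the point farthest from all centres chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda P: min(math.dist(P, c) for c in centers))
        )
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for P in points:
            nearest = min(range(k), key=lambda idx: math.dist(P, centers[idx]))
            clusters[nearest].append(P)
        # Empty clusters keep their previous centre.
        centers = [
            normalized_centroid(c) if c else centers[idx]
            for idx, c in enumerate(clusters)
        ]
    return clusters, centers


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


# Six normalized viewpoints: three on the +x side, three on the -x side.
Ps = [unit(v) for v in [(1, 0.1, 0), (1, -0.1, 0), (1, 0, 0.1),
                        (-1, 0.1, 0), (-1, -0.1, 0), (-1, 0, 0.1)]]
clusters, Gs = kmeans_on_sphere(Ps, k=2)
```

With this input the six viewpoints separate into two clusters of three, analogous to CL1 and CL2 in the FIG. 6 example, and each Gj lies on the unit sphere.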

In another embodiment, because degradation of image quality becomes less noticeable in proportion to the distance from the gaze point to the virtual viewpoint pi (the original coordinate pi, not the normalized coordinate Pi), the centroid Gj normalized onto the unit sphere SP (the centroid of the normalized coordinates Pi) is calculated with the reciprocal of that distance as the weight wi, as follows.
gj=(Σ_i wi*Pi)/(Σ_i wi)
Gj=gj/|gj|

That is, rendering from a distant virtual viewpoint produces a small rendered size, so degradation of texture quality is hard to notice, and the opposite holds for a near viewpoint. Weighting by the reciprocal of the distance therefore pulls the centroid toward near viewpoints, so that the texture seen from near viewpoints is used preferentially. (The texture is used in the downstream generation unit 23.) Conversely, if motion blur or similar artifacts are conspicuous at close range, the distance itself rather than its reciprocal may be used as the weight wi to pull the centroid toward distant viewpoints. Because texture becomes insufficient when the viewpoint is too far away, the weight may be held at a constant value, without increasing further, beyond a certain distance.
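The weighting options described above can be sketched together as follows. The function name, the `mode` strings, and the `max_distance` cap parameter are assumptions introduced for the example; the gaze point is taken to be the origin, so the distance of a viewpoint pi is |pi|.

```python
import math


def weighted_direction(cluster_viewpoints, mode="inverse", max_distance=None):
    """Weighted centroid Gj of the normalized coordinates Pi of one cluster.

    cluster_viewpoints : original coordinates pi (gaze point at the origin)
    mode               : "inverse"  -> wi = 1 / |pi| (favour near viewpoints)
                         "distance" -> wi = |pi|     (favour far viewpoints)
    max_distance       : optional cap; in "distance" mode the weight stops
                         growing beyond this distance (hypothetical parameter)
    """
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for p in cluster_viewpoints:
        dist = math.sqrt(sum(c * c for c in p))
        P = [c / dist for c in p]                      # Pi = pi / |pi|
        if mode == "inverse":
            w = 1.0 / dist
        else:
            w = dist if max_distance is None else min(dist, max_distance)
        for k in range(3):
            num[k] += w * P[k]
        den += w
    g = [c / den for c in num]                         # gj
    norm = math.sqrt(sum(c * c for c in g))
    return tuple(c / norm for c in g)                  # Gj = gj / |gj|


# One near viewpoint on the x axis and one far viewpoint on the y axis.
cluster = [(1.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
G_near = weighted_direction(cluster, mode="inverse")
G_far = weighted_direction(cluster, mode="distance")
G_capped = weighted_direction(cluster, mode="distance", max_distance=2.0)
```

In this example `G_near` leans toward the near viewpoint on the x axis, `G_far` leans toward the far viewpoint on the y axis, and the cap reduces how strongly the far viewpoint dominates.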

In the example of FIG. 6, clustering the six normalized coordinates P1 to P6 is assumed to produce two clusters, CL1={P1,P2,P3} and CL2={P4,P5,P6}, and the centroid G1 is calculated from the first cluster CL1={P1,P2,P3}. (The centroid G2 is calculated in the same way from the second cluster CL2={P4,P5,P6} but, as noted above, is not shown.)

Since each normalized coordinate Pi corresponds to the original virtual viewpoint coordinate pi, the clustering result for the normalized coordinates Pi is, as it stands, the clustering result for the original coordinates pi. (In other words, the normalized coordinates Pi serve as the evaluation index for clustering the virtual viewpoint coordinates pi.) In the example of FIG. 6, this gives the following.
CL1={P1,P2,P3}={p1,p2,p3}
CL2={P4,P5,P6}={p4,p5,p6}

Next, for each cluster CLj, the distance Lj (j = 1, 2, ..., Mc) of the virtual viewpoint closest to the gaze point, among the virtual viewpoint coordinates pi clustered via their normalized coordinates Pi, is calculated by the following formula. (As described above, the gaze point is set as the origin, so the distance between a coordinate pi and the gaze point is the absolute value |pi|.) Using the shortest distance keeps the resolution of the surface pattern of the 3D model (drawn by the generation unit 23 at the subsequent stage) high.
Lj = min_{pi ∈ CLj} |pi|

In the example of FIG. 6, the distance of p3, the shortest in cluster CL1 = {p1, p2, p3}, is calculated as L1.

Alternatively, the average or median of the distances to the gaze point within each cluster may be adopted as the distance Lj instead of the shortest distance. This suppresses differences caused by differing viewpoint depths. In short, the representative distance Lj may be determined as the shortest distance, the average, the median, or the like.
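The three choices of representative distance can be compared with a short Python sketch. The function name `representative_distance` and the `method` argument are hypothetical; the gaze point is assumed to be the origin, so each distance is simply |pi|.

```python
import math
import statistics

def representative_distance(cluster, method="min"):
    """Return Lj for one cluster of original viewpoint coordinates pi."""
    distances = [math.sqrt(sum(c * c for c in p)) for p in cluster]  # |pi|
    if method == "min":
        return min(distances)                  # keeps texture resolution high
    if method == "mean":
        return statistics.mean(distances)      # smooths out depth differences
    if method == "median":
        return statistics.median(distances)    # robust to one outlying viewpoint
    raise ValueError("method must be 'min', 'mean' or 'median'")

# Three viewpoints at distances 3, 4 and 8 from the gaze point.
cluster = [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 8.0)]
print(representative_distance(cluster, "min"))     # 3.0
print(representative_distance(cluster, "mean"))    # 5.0
print(representative_distance(cluster, "median"))  # 4.0
```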

Finally, the representative distance Lj of each cluster is multiplied by the center-of-gravity coordinate Gj of that cluster, as in the following formula, and the result is output as the coordinate Vj of the representative viewpoint at which the generation unit performs generation. Since the gaze point is defined as the origin, multiplying the vector Gj by the scalar Lj gives Vj as the position reached by moving the distance Lj from the gaze point in the direction of the center of gravity Gj normalized onto the unit sphere SP (|Gj| = 1).
Vj = Lj*Gj

In the example of FIG. 6, the virtual viewpoint V1 is calculated as the representative viewpoint of cluster CL1 = {P1, P2, P3} = {p1, p2, p3}. It lies on the extension of the line connecting the center point (gaze point) and the center of gravity G1, at the distance L1 from the gaze point (as mentioned above, the distance of virtual viewpoint p3, the shortest).
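As an end-to-end illustration of step S2 for a single cluster, the following Python sketch computes Vj = Lj*Gj. It assumes an unweighted center of gravity and the shortest distance as Lj; the names `norm` and `representative_viewpoint` are hypothetical, and the gaze point is taken as the origin.

```python
import math

def norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))

def representative_viewpoint(cluster):
    """Return Vj = Lj * Gj for one cluster of viewpoint coordinates pi."""
    units = [[c / norm(p) for c in p] for p in cluster]     # Pi = pi / |pi|
    n = len(units)
    g = [sum(u[k] for u in units) / n for k in range(3)]    # unweighted gj
    g_len = norm(g)                                         # assumed non-zero
    G = [c / g_len for c in g]                              # Gj on the unit sphere
    L = min(norm(p) for p in cluster)                       # Lj: shortest distance
    return [L * c for c in G]                               # Vj

# Viewpoints at distance 2 on the x axis and distance 3 on the y axis.
V = representative_viewpoint([(2.0, 0.0, 0.0), (0.0, 3.0, 0.0)])
# V lies on the bisecting direction between the two, at distance 2 from the gaze point.
```

Because |Gj| = 1, the resulting Vj is always exactly Lj away from the gaze point, whichever rule is chosen for Lj.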

Step S2 of FIG. 5 has been described above. In step S3, the generation unit 23 sets the representative viewpoint Vj of each cluster CLj obtained by the integration unit 22 as the position of a virtual camera, and sets the direction of the virtual camera by using the same gaze point as the integration unit 22. It then draws the three-dimensional model MD(t) of the target OB constructed by the construction unit 21 by pasting, onto surface elements such as polygons of MD(t), the texture of the image of the physical camera judged to be near the representative viewpoint Vj, chosen from the textures of the multi-view image MV(t) obtained by the imaging equipment 30. The drawing result Gj obtained for each representative viewpoint Vj is output to the conversion unit 24, and the process proceeds to step S4. The drawing result Gj is, in other words, the three-dimensional model MD(t) drawn on the image plane with the representative viewpoint Vj as the virtual camera position. (From here on, the symbol Gj denotes this drawing result, not the center of gravity Gj of step S2.)

Any existing method may be used for this texture drawing in the generation unit 23. For example, when multiple physical cameras are judged to be nearby, a blended texture (a weighted sum based on, for example, the distance between each physical camera position and the representative viewpoint Vj) may be pasted, or, for each surface element such as a polygon, only the texture of the physical camera closest to the representative viewpoint Vj may be pasted. An existing method such as that of Patent Document 2 can be used. When the number N of physical cameras is large, the physical cameras near the representative viewpoint Vj may be selected hierarchically: the physical camera coordinates are clustered in advance and a representative point is calculated for each cluster, the distances between these representative points and the virtual viewpoint are compared first, and the distances to individual physical cameras are then calculated only within the cluster whose representative point is closest. This reduces the processing load.
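One possible form of the distance-based blending is sketched below in Python. It is only an illustration under stated assumptions: the k nearest physical cameras are selected and given normalized inverse-distance weights. The function names, the parameter `k`, the guard value `eps`, and the camera coordinates are all hypothetical, and the description itself leaves the weighting rule open.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def blend_weights(cameras, viewpoint, k=2, eps=1e-9):
    """Pick the k physical cameras nearest to the representative viewpoint
    and return (camera_id, weight) pairs whose weights sum to 1."""
    ranked = sorted(cameras.items(), key=lambda item: distance(item[1], viewpoint))
    chosen = ranked[:k]
    # Inverse-distance weights: a nearer camera contributes more texture.
    raw = [1.0 / max(distance(pos, viewpoint), eps) for _, pos in chosen]
    total = sum(raw)
    return [(cam_id, w / total) for (cam_id, _), w in zip(chosen, raw)]

# Hypothetical camera layout around a representative viewpoint at (1, 0, 0).
cameras = {"C1": (4.0, 0.0, 0.0), "C2": (2.0, 0.0, 0.0), "C3": (1.0, 3.0, 0.0)}
weights = blend_weights(cameras, (1.0, 0.0, 0.0), k=2)
# C2 (distance 1) receives weight 0.75 and C1 (distance 3) receives 0.25.
```

Setting `k=1` reduces this to the other option mentioned above, in which only the texture of the nearest physical camera is pasted.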

In addition, if at the current time t there are multiple physical cameras whose distances from the representative viewpoint Vj coincide (are judged equal by threshold determination), the physical camera that was near the representative viewpoint Vj at the past time t-1 can be used. (If the representative viewpoint Vj is at different positions at the current time t and the past time t-1, the representative viewpoint at time t-1 that is closest to the representative viewpoint Vj at time t may be used.)

In the example of FIG. 6, suppose that the existing method uses the single nearest physical camera to generate a texture image and, as a comparative example, that a scheme not applying the embodiment of the present invention is used. In this comparative example, physical camera C1 is assigned to virtual viewpoint p1, physical camera C2 to virtual viewpoint p2, and physical camera C3 to virtual viewpoint p3, so all three videos must be read and expanded into memory, which increases processing.

In the embodiment of the present invention, by contrast, for the three virtual viewpoints p1, p2, and p3 the generation unit 23 uses only physical camera C2, the one nearest to the virtual viewpoint V1 that serves as their representative viewpoint, so the video to be read is reduced to one-third of that in the comparative example. In other words, when multiple virtual viewpoints have similar normalized coordinates, clustering in the integration unit 22 consolidates them into an optimal virtual viewpoint for which texture generation is performed, reducing the processing load. The generation unit 23 therefore need not refer to the images captured by all (or many) cameras of the imaging equipment 30; limiting itself to the images captured by cameras near the consolidated representative viewpoint Vj shortens the time spent reading images during texture generation.

In addition, quality is improved by reflecting, in the representative distance Lj described above, a distance close to the designated virtual viewpoint coordinates. Furthermore, changing the number of clusters successively makes it possible to balance reduction of the processing load against improvement of quality. Alternatively, quality is improved by making the number of physical cameras judged nearby and used for texture acquisition proportional to the spread of the virtual viewpoints within a cluster.

Specifically, let var_j be the variance quantifying the spread of the virtual viewpoint coordinates pi belonging to cluster CLj (the variance of the normalized coordinates Pi may be used instead). When the three-dimensional model MD(t) is drawn for cluster CLj with the representative viewpoint Vj as the virtual camera viewpoint, the number of nearby physical cameras referenced for texture acquisition may be set in proportion to var_j (or simply larger as var_j is larger), so that a number suited to each cluster CLj is used.
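A minimal Python sketch of such a variance-dependent camera count follows. The proportionality constant `cameras_per_unit_variance`, the upper bound `max_cameras`, and the function names are hypothetical tuning choices for illustration; the description only requires that the count grow with var_j.

```python
import math

def cluster_variance(points):
    """Mean squared distance of the points from their mean (var_j)."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    return sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in points) / n

def camera_count(points, cameras_per_unit_variance=1.0, max_cameras=4):
    """Number of nearby physical cameras to reference for one cluster:
    proportional to var_j, but at least 1 and at most max_cameras."""
    var_j = cluster_variance(points)
    count = math.ceil(cameras_per_unit_variance * var_j)
    return max(1, min(max_cameras, count))

tight = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # var_j = 0, one camera suffices
wide = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]    # var_j = 4, four cameras are referenced
print(camera_count(tight), camera_count(wide))  # 1 4
```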

In step S4, the conversion unit 24 refers to the surface-element information of the three-dimensional model MD(t) and converts (deforms), surface element by surface element, the drawing result Gj obtained by the generation unit 23 for each cluster CLj. This yields the drawing result at the virtual viewpoint pi (i = 1, 2, ..., M) designated by the user of each of the M terminals 10. Each drawing result is transmitted to the display unit 12 of the corresponding terminal 10, and the process proceeds to step S5.

The conversion unit 24 takes the drawing result Gj of the cluster CLj to which the virtual viewpoint pi belongs and treats the virtual camera position used to draw it, the representative coordinate Vj, as having been moved to the position of the virtual viewpoint pi. It applies a conversion process TR such as planar projective transformation (the conversion corresponding to changing the virtual camera position from Vj to pi) to each surface element constituting Gj, thereby obtaining the drawing result at the virtual viewpoint pi.

FIG. 7 is a schematic diagram of an example of the above conversion process in the conversion unit 24. By reference to the three-dimensional model MD(t), the drawing result Gj at the representative viewpoint Vj is composed of multiple surface elements poly_k (k = 1, 2, ...) such as polygons. The drawing result TGi = TR(Gj) at the virtual viewpoint pi is obtained as the set of surface elements TR(poly_k) to which the conversion TR has been applied. Since there are generally many surface elements poly_k (and converted surface elements TR(poly_k)), FIG. 7 shows only one arbitrary element, and the lower part of the figure shows these surface elements enlarged separately. The two-dimensional surface element poly_k drawn at the representative viewpoint Vj is the perspective projection, at Vj, of a surface element 3Dpoly_k (not shown) of the three-dimensional model MD(t) in three-dimensional space, so the position and shape of TR(poly_k) are obtained by perspectively projecting 3Dpoly_k to the virtual viewpoint pi. The correspondence "poly_k ⇒ 3Dpoly_k ⇒ TR(poly_k)" is the conversion TR. This conversion is recorded as a per-polygon correspondence, and when the texture is converted and assigned to TR(poly_k), the texture of poly_k is deformed by planar projective transformation or the like and assigned.
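The correspondence poly_k ⇒ 3Dpoly_k ⇒ TR(poly_k) can be sketched in Python by projecting the vertices of one three-dimensional surface element into both virtual cameras. The sketch assumes a simple pinhole camera that looks at the gaze point (the origin) with a fixed up vector and a focal length of 1; `make_camera`, `project`, and `vertex_correspondence` are hypothetical names, and the texture warp itself (for example a planar projective transformation fitted to the vertex pairs) is not implemented here.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def make_camera(position, target=(0.0, 0.0, 0.0), up=(0.0, 0.0, 1.0)):
    """Virtual camera at `position` looking at `target` (the gaze point).
    Assumes the viewing direction is not parallel to `up`."""
    forward = normalize(sub(target, position))
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)
    return {"pos": list(position), "f": forward, "r": right, "u": true_up}

def project(point, cam, focal=1.0):
    """Perspective projection of a 3D point onto the camera image plane."""
    d = sub(point, cam["pos"])
    depth = dot(d, cam["f"])                  # assumed positive (point in front)
    return (focal * dot(d, cam["r"]) / depth,
            focal * dot(d, cam["u"]) / depth)

def vertex_correspondence(poly3d, cam_src, cam_dst):
    """Per-vertex pairs (poly_k vertex at Vj, TR(poly_k) vertex at pi)."""
    return [(project(v, cam_src), project(v, cam_dst)) for v in poly3d]

cam_V = make_camera((0.0, -5.0, 0.0))   # representative viewpoint Vj
cam_p = make_camera((5.0, 0.0, 0.0))    # user-designated viewpoint pi
triangle = [(1.0, 0.0, 1.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 1.0)]  # one 3Dpoly_k
pairs = vertex_correspondence(triangle, cam_V, cam_p)
```

Each pair gives where a vertex of poly_k appears in the drawing at Vj and where the same vertex lands at pi; four or more such pairs on a planar element would be enough to fit the planar projective transformation used to warp its texture.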

When the three-dimensional model MD(t) is drawn at the representative viewpoint Vj, not only the front side, where no occlusion occurs, but also the part of the occluded back side that is judged to be near the front side (for example, by threshold determination on the distance between polygons) may be drawn and held as the per-polygon drawing result at Vj. Whether a polygon is on the front side or the back side, that is, whether it is occluded, may be determined per polygon by any existing method of three-dimensional computer graphics (for example, by projecting to the virtual camera position and determining whether the polygon is frontmost). Then, for a polygon 3Dpoly_k of the three-dimensional model MD(t) that is occluded when drawn from Vj but appears unoccluded on the front side from the virtual viewpoint pi, the drawing result of the polygon TR(poly_k) is obtained by converting the result that was drawn for Vj including the occluded region.

As a modification of the processing of the generation unit 23 and the conversion unit 24 in steps S3 and S4, a billboard method (in which the three-dimensional model MD(t) is simplified and generated as a single rectangle, a billboard) may be used. In this case, the billboard that directly faces the representative viewpoint Vj may or may not be made to directly face the virtual viewpoint pi.

In step S5, the display unit 12 of the terminal 10 (a hardware display) shows the user the drawing result obtained in step S4, so that the user views the free viewpoint video from the desired virtual viewpoint at time t, and the process proceeds to step S6. As described above, step S6 represents the time update: the current time t is updated to the next current time t+1 and the process returns to step S1, so that steps S1 to S5 (step group SG(t)) are repeated in the same way for time t+1 as step group SG(t+1).

As described above, according to one embodiment of the present invention, multiple virtual viewpoints are integrated for each common direction into an optimal virtual viewpoint, which reduces the number of images that must be read and expanded to generate the texture of the 3D model, and drawing information is obtained by drawing from the original virtual viewpoints using the generated 3D model and texture.

That is, according to one embodiment of the present invention, the load on the system can be reduced by reading and expanding only the minimum necessary images. The reduced load also makes it possible to increase the number of terminals that can be served simultaneously.

Various supplementary, additional, and alternative examples are described below.

(1) As an application of the embodiment of the present invention, a target OB in a remote location (for example, a player in a sports match) can be viewed with a sense of presence as free viewpoint video. This makes it possible, without necessarily traveling to the remote location, to view content such as sports matches, or to use the system as a display interface for remote communication and give advice about the remote target OB (for example, advice on improving at a sport). Saving the energy resources needed for user travel can reduce carbon dioxide emissions, which contributes to Goal 13 of the United Nations-led Sustainable Development Goals (SDGs), "Take urgent action to combat climate change and its impacts."

(2) When the number of terminals 10 is large and a configuration with Ns drawing systems 100 arranged in parallel is available (Ns being the number of clusters in the clustering result obtained by the integration unit 22), the system configuration may be simplified by sharing the integration unit 22 as a common preceding stage.

The parallel arrangement of Ns units (Ns ≥ 2) is realized according to the number of clusters in the clustering result of the integration unit 22. For example, Ns = 2 in the example of FIG. 6: cluster CL1 may be drawn by a first server 20-1 (having a construction unit 21-1, generation unit 23-1, and conversion unit 24-1 like those of FIG. 4), and cluster CL2 by a second server 20-2 (having a construction unit 21-2, generation unit 23-2, and conversion unit 24-2 like those of FIG. 4). The processing of the integration unit 22 (and the imaging equipment 30) is thus performed in common within the drawing system 100, and the subsequent processing is distributed to one server per cluster and executed in parallel, which limits the number of images each server reads from the imaging equipment 30.

(3) The terminal 10 may be configured as a device that performs augmented reality display, such as an optical see-through or video see-through head-mounted display, and may display on the display unit 12 the drawing result at the virtual viewpoint obtained from the conversion unit 24 superimposed on the real-world scene. The virtual viewpoint designated by the designation unit 11 may be linked at each time t to the user's real-world viewpoint acquired by a sensor or the like of the head-mounted display.

(4) The conversion unit 24 generates the equivalent of the drawing result of the three-dimensional model MD(t) at the virtual viewpoint pi by converting the drawing result Gj, obtained from the generation unit 23, at the representative viewpoint Vj that corresponds to pi through the relation "virtual viewpoint pi ∈ cluster CLj". At this time, the background at the virtual viewpoint pi may also be drawn, giving a drawing result in which the three-dimensional model is superimposed on that background. A three-dimensional model of the background (preferably one with a low drawing load, such as a simple plane) may be provided in advance and drawn at the virtual viewpoint pi.

(5) The integration unit 22 performs, for each gaze point of the target OB, clustering based on the virtual camera direction determined by the normalized coordinates of the virtual camera position with that gaze point as the origin. A gaze point may be provided for each gaze scene, where scenes are distinguished by which of the one or more targets OB set in advance as drawing targets are inside the field of view of the virtual camera and which are outside. FIG. 8 illustrates this distinction for two targets, OB = {OB1, OB2}, in which case there can be 2² = 4 cases of the gaze point. Likewise, when the target OB consists of n targets, there are 2ⁿ cases to distinguish.

In FIG. 8, for example, virtual camera C11 has only target OB1 within its angle-of-view range R11, so a predetermined point of OB1 (such as its center of gravity) is set as the gaze point. Virtual camera C12 has only target OB2 within its range R12, so a predetermined point of OB2 is set as the gaze point. Virtual camera C13 has both targets within its range R13, so a predetermined point of the two targets (such as their joint center of gravity) is set as the gaze point. Virtual camera C14 has neither target of OB = {OB1, OB2} within its range R14 and is treated as requiring no drawing processing for the target OB.

In the example of FIG. 8, the virtual viewpoints are first divided into four groups: <1> those, like virtual camera C11, with only target OB1 in the angle-of-view range; <2> those, like C12, with only target OB2 in range; <3> those, like C13, with both targets in range; and <4> those, like C14, with neither target in range. For each group, the clustering by the integration unit 22 described with reference to FIG. 6 and the processing of the generation unit 23 onward are then performed, except that for virtual viewpoints in group <4> clustering and the drawing of target OB = {OB1, OB2} are unnecessary; if the background is drawn, only the background needs to be drawn.

As is known in fields such as three-dimensional computer graphics, the angle-of-view range of a virtual camera (R11 to R14 in the example of FIG. 8) is determined by its position pi and line-of-sight direction di in the three-dimensional virtual space (corresponding to the camera's extrinsic parameters) together with the preset intrinsic parameters of the camera. When the construction unit 21 constructs the three-dimensional model MD(t), information distinguishing how many different targets the modeled target OB contains may be given in advance. Whether each of the one or more targets is within the angle-of-view range of a virtual camera may be determined by whether at least part of the target is within the range, or by setting at least one representative point on the target and checking whether at least one representative point is within the range.
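The grouping of virtual viewpoints by gaze scene can be sketched in Python as follows. For brevity the sketch replaces the angle-of-view range derived from intrinsic parameters with a simple viewing cone of half-angle `half_angle_deg`, and tests one representative point per target; these simplifications, the function names, and the coordinates are all hypothetical.

```python
import math

def visible_targets(position, direction, targets, half_angle_deg=30.0):
    """Return the tuple of target names whose representative point lies
    inside the viewing cone of a virtual camera (position pi, direction di)."""
    cos_limit = math.cos(math.radians(half_angle_deg))
    dir_len = math.sqrt(sum(c * c for c in direction))
    visible = []
    for name, point in sorted(targets.items()):
        to_target = [point[k] - position[k] for k in range(3)]
        dist = math.sqrt(sum(c * c for c in to_target))
        if dist == 0.0:
            visible.append(name)      # camera sits on the representative point
            continue
        cos_angle = sum(a * b for a, b in zip(to_target, direction)) / (dist * dir_len)
        if cos_angle >= cos_limit:
            visible.append(name)
    return tuple(visible)

def group_by_scene(cameras, targets, half_angle_deg=30.0):
    """Group virtual cameras by which targets they see (up to 2**n groups)."""
    groups = {}
    for cam_id, (position, direction) in cameras.items():
        key = visible_targets(position, direction, targets, half_angle_deg)
        groups.setdefault(key, []).append(cam_id)
    return groups

targets = {"OB1": (-2.0, 0.0, 0.0), "OB2": (2.0, 0.0, 0.0)}
cameras = {
    "C11": ((-2.0, -5.0, 0.0), (0.0, 1.0, 0.0)),   # sees OB1 only
    "C12": ((2.0, -5.0, 0.0), (0.0, 1.0, 0.0)),    # sees OB2 only
    "C13": ((0.0, -10.0, 0.0), (0.0, 1.0, 0.0)),   # sees both targets
    "C14": ((0.0, -5.0, 0.0), (0.0, -1.0, 0.0)),   # looks away, sees neither
}
groups = group_by_scene(cameras, targets)
```

Clustering would then be run separately on each group with a non-empty key, using the gaze point defined for that set of targets, while the group with the empty key needs at most background drawing.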

(6) FIG. 9 shows an example of the hardware configuration of a general computer device 70. Each of the terminal 10, the server 20, and the imaging equipment 30 constituting the drawing system 100 can be realized as one or more computer devices 70 having this configuration. When any of them is realized by two or more computer devices 70, the information needed for processing may be exchanged over a network. The computer device 70 includes a CPU (central processing unit) 71 that executes predetermined instructions; a GPU (graphics processing unit) 72, a dedicated processor that executes some or all of the CPU 71's instructions in its place or in cooperation with it; a RAM 73 as a main storage device providing a work area for the CPU 71 (and the GPU 72); a ROM 74 as an auxiliary storage device; a communication interface 75; a display 76; an input interface 77 that accepts user input via a mouse, keyboard, touch panel, or the like; a camera 78; and a bus BS for exchanging data among them.

Each functional unit of the drawing system 100 can be realized by the CPU 71 and/or the GPU 72 reading from the ROM 74 and executing a predetermined program corresponding to that unit's function. The CPU 71 and the GPU 72 are both kinds of arithmetic unit (processor). When display-related processing is performed, the display 76 additionally operates in conjunction, and when communication-related processing for sending and receiving data is performed, the communication interface 75 additionally operates in conjunction. The display unit 12 may be realized by the display 76, and each camera of the imaging equipment 30 may be realized as a camera 78.

100: drawing system, 30: imaging equipment, 20: server / drawing device, 10: terminal
11: designation unit, 12: display unit, 21: construction unit, 22: integration unit, 23: generation unit, 24: conversion unit

Claims

A rendering device comprising:
a construction unit that constructs a three-dimensional model from multi-viewpoint images;
an integration unit that receives designation of a virtual viewpoint from each of a plurality of users and integrates the plurality of virtual viewpoints to obtain a representative viewpoint;
a generation unit that obtains a rendering result at the representative viewpoint by rendering a texture of the multi-viewpoint images on the three-dimensional model from a virtual camera position that is set at the representative viewpoint; and
a conversion unit that converts the rendering result from the representative viewpoint into a rendering result from the virtual viewpoint designated by each of the plurality of users.

The rendering device according to claim 1, characterized in that the integration unit performs clustering to obtain the representative viewpoint that is composed of a number less than the number of the multiple virtual viewpoints.

The rendering device according to claim 2, characterized in that, in the integration unit, when clustering the multiple virtual viewpoints, the direction from the gaze point to each virtual viewpoint is used as a criterion for the clustering.

The drawing device according to claim 3, characterized in that the integration unit determines the direction by normalizing a vector from the gaze point toward each virtual viewpoint, and obtains a clustering result of the multiple virtual viewpoints by clustering the normalized vectors.

The drawing device according to claim 4, characterized in that the integration unit obtains a representative vector of the normalized vector for each cluster, and sets an end point that is a predetermined distance away from the gaze point in the direction of the representative vector as the representative viewpoint for that cluster.

The rendering device according to claim 5, characterized in that the integration unit determines the representative vector by calculating a weighted sum of the normalized vectors for each cluster, with weights determined by the magnitude of each vector before normalization.

The rendering device according to claim 5 or 6, characterized in that the integration unit uses the minimum, median or average of the distances between the gaze point and each virtual viewpoint in the cluster as the predetermined distance.
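The computation of a representative viewpoint for one cluster, as described in the claims above, might look like the following minimal sketch. Two points are assumptions not fixed by the claims: the weight is taken to be the pre-normalization magnitude itself (any function of the magnitude could be substituted through the hypothetical `weight` parameter), and the weighted sum is re-normalized before being scaled. The function names are likewise hypothetical.

```python
import math
from statistics import mean, median


def normalize(v):
    """Return (unit vector, original magnitude) for a 3D vector."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / norm for c in v), norm


def representative_viewpoint(gaze, viewpoints, distance_rule=median,
                             weight=lambda magnitude: magnitude):
    """Representative viewpoint of one cluster of virtual viewpoints.

    distance_rule: min, median or mean, applied to the gaze-to-viewpoint
    distances to pick the distance of the representative viewpoint.
    weight: maps a pre-normalization magnitude to a weight (assumption).
    """
    dirs, mags = [], []
    for p in viewpoints:
        d, m = normalize(tuple(a - b for a, b in zip(p, gaze)))
        dirs.append(d)
        mags.append(m)
    weights = [weight(m) for m in mags]
    # Weighted sum of the normalized vectors, one component at a time.
    summed = tuple(sum(w * d[i] for w, d in zip(weights, dirs))
                   for i in range(3))
    rep_dir, _ = normalize(summed)  # assumption: re-normalize the sum
    dist = distance_rule(mags)
    # End point at the chosen distance from the gaze point along rep_dir.
    return tuple(g + dist * c for g, c in zip(gaze, rep_dir))
```

For example, with the gaze point at the origin and two viewpoints at (2, 0, 0) and (4, 0, 0), the default median rule places the representative viewpoint at (3.0, 0.0, 0.0), while passing `distance_rule=min` places it at (2.0, 0.0, 0.0).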

The rendering device according to any one of claims 3 to 7, characterized in that the integration unit sets the gaze point for each scene that is distinguished by which of one or more objects constituting the three-dimensional model is present and which is not present within a field of view range determined by the position of the virtual viewpoint and the line of sight direction received from each of the multiple users.

The rendering device according to claim 8, characterized in that the integration unit groups the virtual viewpoints for each of the distinguished scenes, and performs the clustering for the virtual viewpoints belonging to each group.

The rendering device according to any one of claims 3 to 9, characterized in that the integration unit sets the gaze point as a predetermined point of the three-dimensional model within a range of an angle of view determined by the positions of virtual viewpoints and line-of-sight directions received from each of a plurality of users.

The rendering device according to any one of claims 1 to 10, characterized in that the generation unit performs rendering using textures of only a portion of the multiple viewpoint images constituting the multi-viewpoint image, and determines, as the portion, a viewpoint image captured by a physical camera whose position is determined to be close to the representative viewpoint, among multiple physical cameras capturing the multi-viewpoint image.
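As an illustration of selecting only the viewpoint images from physical cameras near the representative viewpoint, the following sketch ranks cameras by Euclidean distance and keeps the closest ones. The claim does not say how closeness is judged, so the distance measure, the function name `nearest_cameras`, and the parameter `k` are assumptions made for this example.

```python
import math


def nearest_cameras(rep_viewpoint, camera_positions, k=2):
    """Indices of the k physical cameras closest to the representative viewpoint.

    Assumption: closeness is plain Euclidean distance between positions.
    """
    ranked = sorted(
        range(len(camera_positions)),
        key=lambda i: math.dist(rep_viewpoint, camera_positions[i]),
    )
    return ranked[:k]
```

Only the images from the returned cameras would then need to be read and expanded into memory for texturing, which is the cost the selection is meant to reduce.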

The rendering device according to any one of claims 1 to 11, characterized in that the generation unit renders, based on the virtual camera position set at the representative viewpoint, both the front side of the three-dimensional model, where no occlusion occurs, and the part of the rear side adjacent to the front side, where occlusion occurs.

The rendering device according to any one of claims 1 to 12, characterized in that the integration unit integrates the plurality of virtual viewpoints by performing clustering to obtain the representative viewpoint, and the generation unit performs rendering at the representative viewpoint using textures of only a portion of the multiple viewpoint images that constitute the multi-viewpoint image, and determines the number of images in that portion in accordance with the variation of the virtual viewpoints belonging to the cluster corresponding to the representative viewpoint.
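One way to tie the number of texture images to the variation within a cluster is sketched below. The claim states only that the number is determined in accordance with the variation, so everything specific here is an assumption: variation is measured as the largest angle between a member viewpoint and the representative viewpoint as seen from the gaze point, and a wider spread maps to more images through a hypothetical linear rule with the illustrative parameters `min_images`, `max_images`, and `deg_per_extra_image`.

```python
import math


def angular_spread_deg(gaze, rep_viewpoint, members):
    """Largest angle (degrees) between a member and the representative
    viewpoint, both seen from the gaze point (assumed variation measure)."""
    def unit(p):
        v = [a - b for a, b in zip(p, gaze)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            raise ValueError("point coincides with the gaze point")
        return [c / norm for c in v]

    rep = unit(rep_viewpoint)
    worst = 0.0
    for m in members:
        cos_a = sum(a * b for a, b in zip(rep, unit(m)))
        cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding
        worst = max(worst, math.degrees(math.acos(cos_a)))
    return worst


def image_count(spread_deg, min_images=1, max_images=4,
                deg_per_extra_image=15.0):
    """Hypothetical rule: one extra image per started 15 degrees of spread,
    clamped to the range [min_images, max_images]."""
    extra = math.ceil(spread_deg / deg_per_extra_image)
    return max(min_images, min(max_images, min_images + extra))
```

Under this rule a tightly packed cluster is textured from a single image, while a cluster whose members surround the representative viewpoint more widely draws on additional cameras so that the later per-user conversion has texture available for the wider range of views.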

A program that causes a computer to function as the rendering device according to any one of claims 1 to 13.