JP2024051665A - Moving image synthesis system, moving image synthesis method, and program

Info

Publication number: JP2024051665A
Application number: JP2022157949A
Authority: JP
Inventors: 章五島; Akira Goshima
Original assignee: Konami Digital Entertainment Co Ltd
Current assignee: Konami Digital Entertainment Co Ltd
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2024-04-11

Abstract

To generate a natural synthetic moving image including a plurality of subjects existing in different spaces. SOLUTION: A moving image synthesis system 30 includes a moving image acquisition part 41 for acquiring a first moving image captured by a first imaging apparatus in a first real space and a second moving image captured by a second imaging apparatus in a second real space, and an image processing part 42 for executing synthesis processing for generating a synthetic moving image V including the image of a first subject in the first moving image and the image of a second subject in the second moving image. The synthesis processing includes adjustment processing for adjusting the image of the first subject and the image of the second subject according to the first imaging distance of the first imaging apparatus and the second imaging distance of the second imaging apparatus. SELECTED DRAWING: Figure 4

Description

本開示は、複数の動画を合成する技術に関する。 This disclosure relates to technology for combining multiple videos.

個別に収録された複数の動画を合成する技術が従来から提案されている。例えば特許文献１には、相異なるカメラにより複数の動画を撮像し、複数の動画の各々から切取られたユーザの動画を所定の背景動画に合成する技術が開示されている。 Technology for combining multiple videos recorded separately has been proposed in the past. For example, Patent Document 1 discloses a technology for capturing multiple videos using different cameras, and combining a video of a user cut out from each of the multiple videos with a specified background video.

特許第６６２７８６１号公報 Japanese Patent No. 6627861

特許文献１の技術において、被写体であるユーザとカメラとの間の撮像距離は、カメラ毎に相違し得る。したがって、特許文献１においては、例えば、被写体毎の撮像距離の相違が適切に反映されていない不自然な動画が生成されるという課題がある。以上の事情を考慮して、本開示のひとつの態様は、相異なる空間に所在する複数の被写体を含む自然な合成動画を生成することを目的とする。 In the technology of Patent Document 1, the imaging distance between the user, who is the subject, and the camera may differ for each camera. Therefore, the technology of Patent Document 1 has a problem in that, for example, it generates an unnatural video that does not properly reflect the differences in imaging distance among the subjects. In view of the above circumstances, one aspect of the present disclosure aims to generate a natural composite video that includes multiple subjects located in different spaces.

以上の課題を解決するために、本開示のひとつの態様に係る動画合成システムは、第１現実空間内の第１撮像装置が撮像した第１動画と、第２現実空間内の第２撮像装置が撮像した第２動画とを取得する動画取得部と、前記第１動画における第１被写体の画像と前記第２動画における第２被写体の画像とを含む合成動画を生成する合成処理を実行する画像処理部とを具備し、前記合成処理は、前記第１撮像装置の第１撮像距離と前記第２撮像装置の第２撮像距離とに応じて前記第１被写体の画像と前記第２被写体の画像とを調整する調整処理を含む。 In order to solve the above problems, a video compositing system according to one aspect of the present disclosure includes a video acquisition unit that acquires a first video captured by a first imaging device in a first real space and a second video captured by a second imaging device in a second real space, and an image processing unit that executes a compositing process to generate a composite video including an image of a first subject in the first video and an image of a second subject in the second video, the compositing process including an adjustment process that adjusts the image of the first subject and the image of the second subject according to a first imaging distance of the first imaging device and a second imaging distance of the second imaging device.

本開示のひとつの態様に係る動画合成方法は、第１現実空間内の第１撮像装置が撮像した第１動画と、第２現実空間内の第２撮像装置が撮像した第２動画とを取得し、前記第１動画における第１被写体の画像と前記第２動画における第２被写体の画像とを含む合成動画を生成する合成処理を実行し、前記合成処理は、前記第１撮像装置の第１撮像距離と前記第２撮像装置の第２撮像距離とに応じて前記第１被写体の画像と前記第２被写体の画像とを調整する調整処理を含む。 A video compositing method according to one aspect of the present disclosure acquires a first video captured by a first imaging device in a first real space and a second video captured by a second imaging device in a second real space, and performs a compositing process to generate a composite video including an image of a first subject in the first video and an image of a second subject in the second video, the compositing process including an adjustment process to adjust the image of the first subject and the image of the second subject according to a first imaging distance of the first imaging device and a second imaging distance of the second imaging device.

本開示のひとつの態様に係るプログラムは、第１現実空間内の第１撮像装置が撮像した第１動画と、第２現実空間内の第２撮像装置が撮像した第２動画とを取得する動画取得部、および、前記第１動画における第１被写体の画像と前記第２動画における第２被写体の画像とを含む合成動画を生成する合成処理を実行する画像処理部、としてコンピュータシステムを機能させるプログラムであって、前記合成処理は、前記第１撮像装置の第１撮像距離と前記第２撮像装置の第２撮像距離とに応じて前記第１被写体の画像と前記第２被写体の画像とを調整する調整処理を含む。 A program according to one aspect of the present disclosure causes a computer system to function as a video acquisition unit that acquires a first video captured by a first imaging device in a first real space and a second video captured by a second imaging device in a second real space, and an image processing unit that executes a synthesis process to generate a synthetic video including an image of a first subject in the first video and an image of a second subject in the second video, the synthesis process including an adjustment process that adjusts the image of the first subject and the image of the second subject according to a first imaging distance of the first imaging device and a second imaging distance of the second imaging device.

第１実施形態に係る動画収録システムの構成を例示するブロック図である。 FIG. 1 is a block diagram illustrating the configuration of a video recording system according to a first embodiment.
各撮像装置が撮像する動画の模式図である。 FIG. 2 is a schematic diagram of the videos captured by the imaging devices.
動画合成システムの構成を例示するブロック図である。 FIG. 3 is a block diagram illustrating the configuration of the video compositing system.
動画合成システムの機能的な構成を例示するブロック図である。 FIG. 4 is a block diagram illustrating the functional configuration of the video compositing system.
画像処理部の具体的な構成を例示するブロック図である。 FIG. 5 is a block diagram illustrating a specific configuration of the image processing unit.
ぼかし処理の説明図である。 FIG. 6 is an explanatory diagram of the blurring process.
撮像距離とぼかし量との関係を表す説明図である。 FIG. 7 is an explanatory diagram showing the relationship between the imaging distance and the blur amount.
合成動画の模式図である。 FIG. 8 is a schematic diagram of a composite video.
合成動画の模式図である。 FIG. 9 is a schematic diagram of a composite video.
合成動画の模式図である。 FIG. 10 is a schematic diagram of a composite video.
動画合成システムの動作のフローチャートである。 FIG. 11 is a flowchart of the operation of the video compositing system.
合成処理の具体的な手順を例示するフローチャートである。 FIG. 12 is a flowchart illustrating a specific procedure of the compositing process.
第３実施形態における画像処理部のブロック図である。 FIG. 13 is a block diagram of the image processing unit in a third embodiment.
第３実施形態における仮想動画の説明図である。 FIG. 14 is an explanatory diagram of a virtual video in the third embodiment.
第３実施形態における合成処理のフローチャートである。 FIG. 15 is a flowchart of the compositing process in the third embodiment.
第３実施形態における合成動画の模式図である。 FIG. 16 is a schematic diagram of a composite video in the third embodiment.
第４実施形態における動画合成システムの機能的な構成を例示するブロック図である。 FIG. 17 is a block diagram illustrating the functional configuration of the video compositing system in a fourth embodiment.
変形例における撮像距離とぼかし量との関係を表す説明図である。 FIG. 18 is an explanatory diagram showing the relationship between the imaging distance and the blur amount in a modified example.
変形例における撮像距離とぼかし量との関係を表す説明図である。 FIG. 19 is an explanatory diagram showing the relationship between the imaging distance and the blur amount in a modified example.
変形例における撮像距離とぼかし量との関係を表す説明図である。 FIG. 20 is an explanatory diagram showing the relationship between the imaging distance and the blur amount in a modified example.
変形例における撮像距離とぼかし量との関係を表す説明図である。 FIG. 21 is an explanatory diagram showing the relationship between the imaging distance and the blur amount in a modified example.

図面を参照しながら本開示の実施の形態を説明する。以下に記載する実施の形態は、技術的に好適な種々の限定を含む。本開示の範囲は、以下に例示する形態には限定されない。 The embodiments of the present disclosure will be described with reference to the drawings. The embodiments described below include various technically suitable limitations. The scope of the present disclosure is not limited to the embodiments exemplified below.

［第１実施形態］ [First embodiment]
図１は、第１実施形態における動画収録システム１００の構成を例示するブロック図である。動画収録システム１００は、配信コンテンツＣを制作するためのコンピュータシステムである。配信コンテンツＣは、端末装置２００の利用者による視聴のために端末装置２００に配信される情報である。配信コンテンツＣは、例えば複数の対戦者がビデオゲームにより対戦する競技イベント（esports）の動画および音声で構成される。 FIG. 1 is a block diagram illustrating the configuration of a video recording system 100 in the first embodiment. The video recording system 100 is a computer system for producing distribution content C. The distribution content C is information distributed to the terminal device 200 for viewing by a user of the terminal device 200. The distribution content C is composed of, for example, video and audio of a competitive event (esports) in which multiple competitors compete against each other in a video game.

端末装置２００は、例えばスマートフォン、タブレット端末またはパーソナルコンピュータ等の情報装置である。配信コンテンツＣは、例えばインターネット等の通信網を介して動画収録システム１００から端末装置２００に配信される。なお、図１においては便宜的に１個の端末装置２００のみが図示されているが、実際には複数の端末装置２００に対して配信コンテンツＣが配信される。 The terminal device 200 is an information device such as a smartphone, a tablet terminal, or a personal computer. The distribution content C is distributed from the video recording system 100 to the terminal device 200 via a communication network such as the Internet. Note that, for convenience, only one terminal device 200 is shown in FIG. 1, but in reality, the distribution content C is distributed to multiple terminal devices 200.

動画収録システム１００は、複数の収録システム２０-1～２０-3と動画合成システム３０とを具備する。各収録システム２０-n（ｎ＝１～３）は、通信網を介して動画合成システム３０と通信する。複数の収録システム２０-1～２０-3の各々は、相異なる収録スタジオＲnに設置される。各収録スタジオＲnは、相異なる現実の空間である。各収録スタジオＲnは、例えば相互に遠隔の地点に位置する。 The video recording system 100 comprises multiple recording systems 20-1 to 20-3 and a video compositing system 30. Each recording system 20-n (n = 1 to 3) communicates with the video compositing system 30 via a communication network. Each of the multiple recording systems 20-1 to 20-3 is installed in a different recording studio Rn. Each recording studio Rn is a different real space. Each recording studio Rn is located, for example, at a remote location from each other.

各収録スタジオＲnには、収録対象となる被写体Ｑnが所在する。被写体Ｑnは、例えば競技イベントの出場者または解説者等、配信コンテンツＣの出演者である。各収録スタジオＲnにおける被写体Ｑnの背景は、例えばグリーンバックまたはブルーバック等の特定色で構成される。 In each recording studio Rn, there is a subject Qn to be recorded. The subject Qn is a performer in the distribution content C, such as a contestant or commentator in a competitive event. The background of the subject Qn in each recording studio Rn is a specific color, such as a green back or a blue back.

図１に例示される通り、各収録システム２０-nは、撮像装置２１-nと収音装置２２-nと通信装置２３-nとを具備する。なお、収音装置２２-nおよび通信装置２３-nの一方または双方は、撮像装置２１-nに搭載されてもよい。 As illustrated in FIG. 1, each recording system 20-n includes an imaging device 21-n, a sound collecting device 22-n, and a communication device 23-n. Note that one or both of the sound collecting device 22-n and the communication device 23-n may be mounted on the imaging device 21-n.

撮像装置２１-nは、収録スタジオＲn内の動画Ｖnを生成するカメラである。各撮像装置２１-nは、例えば、撮影レンズ等の光学系と、光学系からの入射光を受光する撮像素子と、撮像素子による受光量に応じて動画Ｖnのデータを生成する処理回路とを具備する。なお、動画Ｖnを表すデータの形式は任意である。 The imaging device 21-n is a camera that generates a video Vn in a recording studio Rn. Each imaging device 21-n includes, for example, an optical system such as a photographing lens, an imaging element that receives incident light from the optical system, and a processing circuit that generates data for the video Vn according to the amount of light received by the imaging element. The format of the data representing the video Vn is arbitrary.

図２は、各撮像装置２１-nが生成する動画Ｖnの模式図である。撮像装置２１-1は、収録スタジオＲ1における被写体Ｑ1の撮像により動画Ｖ1を収録する。同様に、撮像装置２１-2は、収録スタジオＲ2における被写体Ｑ2の撮像により動画Ｖ2を収録し、撮像装置２１-3は、収録スタジオＲ3における被写体Ｑ3の撮像により動画Ｖ3を収録する。なお、各撮像装置２１-nは、光軸方向の広範囲にわたり実質的に合焦したパンフォーカスの状態で被写体Ｑnを撮像する。したがって、各被写体画像Ｇnには、撮像装置２１の合焦面から離間することに起因した光学的なぼけは実質的に発生しない。すなわち、各動画Ｖnにおける被写体Ｑnの画像（以下「被写体画像Ｇn」という）は、輪郭または境界が明瞭な画像である。 Figure 2 is a schematic diagram of the video Vn generated by each imaging device 21-n. The imaging device 21-1 records the video V1 by imaging the subject Q1 in the recording studio R1. Similarly, the imaging device 21-2 records the video V2 by imaging the subject Q2 in the recording studio R2, and the imaging device 21-3 records the video V3 by imaging the subject Q3 in the recording studio R3. Each imaging device 21-n captures the subject Qn in a pan-focus state in which the subject is substantially in focus over a wide range in the optical axis direction. Therefore, each subject image Gn does not substantially suffer from optical blurring due to being away from the focal plane of the imaging device 21. In other words, the image of the subject Qn in each video Vn (hereinafter referred to as the "subject image Gn") has a clear contour or boundary.

図１に例示される通り、撮像装置２１-n毎に撮像距離Ｄnは相違する。撮像距離Ｄnは、撮像装置２１-nと被写体Ｑnとの間の距離である。撮像距離Ｄnは、撮影レンズの表面または撮像素子の撮像面と、被写体Ｑnとの距離である。以下の説明においては、撮像距離Ｄ2が撮像距離Ｄ1を上回り、かつ、撮像距離Ｄ3が撮像距離Ｄ2を上回る場合を想定する（Ｄ1＜Ｄ2＜Ｄ3）。他方、例えば焦点距離または絞り値等、撮像距離Ｄn以外の撮像条件は、複数の撮像装置２１-1～２１-3において共通する。したがって、複数の被写体Ｑ1～Ｑ3の現実の身長が共通する場合でも、図２に例示される通り、各動画Ｖnにおける被写体画像Ｇnのサイズは、撮像距離Ｄnに応じて動画Ｖn毎に相違する。 As illustrated in FIG. 1, the imaging distance Dn differs for each imaging device 21-n. The imaging distance Dn is the distance between the imaging device 21-n and the subject Qn. The imaging distance Dn is the distance between the surface of the photographing lens or the imaging surface of the imaging element and the subject Qn. In the following explanation, it is assumed that the imaging distance D2 is greater than the imaging distance D1 and the imaging distance D3 is greater than the imaging distance D2 (D1<D2<D3). On the other hand, imaging conditions other than the imaging distance Dn, such as the focal length or the aperture value, are common to the multiple imaging devices 21-1 to 21-3. Therefore, even if the actual heights of the multiple subjects Q1 to Q3 are the same, the size of the subject image Gn in each video Vn differs for each video Vn according to the imaging distance Dn, as illustrated in FIG. 2.
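The dependence of the subject image size on the imaging distance can be sketched as follows. This is a minimal illustration that assumes an ideal pinhole camera model, which the source does not specify; the function name `apparent_height_px`, the focal length in pixels, and the example distances are all hypothetical values chosen for illustration.

```python
def apparent_height_px(real_height_m: float, distance_m: float,
                       focal_length_px: float) -> float:
    """Projected height of a subject under an ideal pinhole model (an assumption)."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_length_px * real_height_m / distance_m


# Hypothetical values: the same focal length for all cameras and D1 < D2 < D3.
FOCAL_PX = 1000.0
distances = {"Q1": 2.0, "Q2": 4.0, "Q3": 8.0}

# Subjects of identical real height appear smaller as the distance grows.
heights = {name: apparent_height_px(1.7, d, FOCAL_PX)
           for name, d in distances.items()}
```

Under this assumption, doubling the imaging distance halves the size of the subject image, which is why subjects of the same real height appear at different sizes in the videos V1 to V3.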

図１の収音装置２２-nは、収録スタジオＲn内の音声Ａnを収録するマイクロホンである。音声Ａnは、例えば収録スタジオＲn内の被写体Ｑnが発音する音声である。具体的には、音声Ａnの波形を表すデータが収音装置２２-nにより生成される。なお、音声Ａnを表すデータの形式は任意である。 The sound collection device 22-n in FIG. 1 is a microphone that collects sound An in a recording studio Rn. Sound An is, for example, sound produced by a subject Qn in the recording studio Rn. Specifically, data representing the waveform of sound An is generated by the sound collection device 22-n. The format of the data representing sound An is arbitrary.

通信装置２３-nは、例えばインターネット等の通信網を介して動画合成システム３０と通信する。通信装置２３-nと動画合成システム３０との間の通信の経路は、有線区間または無線区間で構成される。具体的には、通信装置２３-nは、素材データＭnを動画合成システム３０に送信する。素材データＭnは、撮像装置２１-nが撮像した動画Ｖnと収音装置２２-nが収音した音声Ａnとを表すデータである。 The communication device 23-n communicates with the video compositing system 30 via a communication network such as the Internet. The communication path between the communication device 23-n and the video compositing system 30 is composed of a wired section or a wireless section. Specifically, the communication device 23-n transmits material data Mn to the video compositing system 30. The material data Mn is data representing the video Vn captured by the imaging device 21-n and the sound An collected by the sound collection device 22-n.

図３は、動画合成システム３０の構成を例示するブロック図である。動画合成システム３０は、配信コンテンツＣを生成および配信するコンピュータシステムである。動画合成システム３０は、例えばスマートフォン、タブレット端末またはパーソナルコンピュータ等の情報装置で実現される。なお、動画合成システム３０は、以上に例示した汎用の情報装置により実現されるほか、配信コンテンツＣの生成に専用される映像装置により実現されてもよい。 Figure 3 is a block diagram illustrating the configuration of a video compositing system 30. The video compositing system 30 is a computer system that generates and distributes distribution content C. The video compositing system 30 is realized by an information device such as a smartphone, a tablet terminal, or a personal computer. Note that the video compositing system 30 may be realized by a general-purpose information device such as those exemplified above, or by a video device dedicated to generating distribution content C.

動画合成システム３０は、制御装置３１と記憶装置３２と通信装置３３と操作装置３４と再生装置３５とを具備する。なお、動画合成システム３０は、単体の装置として実現されるほか、相互に別体で構成された複数の装置でも実現される。 The video compositing system 30 includes a control device 31, a storage device 32, a communication device 33, an operation device 34, and a playback device 35. Note that the video compositing system 30 may be realized as a single device, or may be realized as multiple devices configured separately from each other.

制御装置３１は、動画合成システム３０の各要素を制御する単数または複数のプロセッサである。具体的には、例えばＣＰＵ（Central Processing Unit）、ＧＰＵ（Graphics Processing Unit）、ＳＰＵ（Sound Processing Unit）、ＤＳＰ（Digital Signal Processor）、ＦＰＧＡ（Field Programmable Gate Array）、またはＡＳＩＣ（Application Specific Integrated Circuit）等の１種類以上のプロセッサにより、制御装置３１が構成される。 The control device 31 consists of one or more processors that control each element of the video compositing system 30. Specifically, the control device 31 is configured with one or more types of processors, such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an SPU (Sound Processing Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit).

記憶装置３２は、制御装置３１が実行するプログラムと、制御装置３１が使用する各種のデータとを記憶する単数または複数のメモリである。例えば半導体記録媒体および磁気記録媒体等の公知の記録媒体、または複数種の記録媒体の組合せが、記憶装置３２として利用される。なお、例えば、動画合成システム３０に対して着脱される可搬型の記録媒体、または、制御装置３１が通信網を介してアクセス可能な記録媒体（例えばクラウドストレージ）が、記憶装置３２として利用されてもよい。 The storage device 32 consists of one or more memories that store the programs executed by the control device 31 and the various data used by the control device 31. For example, a well-known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of multiple types of recording media, is used as the storage device 32. Note that, for example, a portable recording medium that is attachable to and detachable from the video compositing system 30, or a recording medium that the control device 31 can access via a communication network (e.g., cloud storage), may be used as the storage device 32.

通信装置３３は、通信網を介して端末装置２００および各収録システム２０-nと通信する。例えば、通信装置３３は、各収録システム２０-nが送信する素材データＭnを受信する。また、通信装置３３は、端末装置２００に対して配信コンテンツＣを送信する。なお、端末装置２００に対する配信コンテンツＣの配信は、動画収録システム１００とは別個の配信システムが実行してもよい。例えば、動画合成システム３０から送信された配信コンテンツＣが配信システムに保持され、配信システムから端末装置２００に対して配信コンテンツＣが配信されてもよい。また、配信コンテンツＣは、端末装置２００に配信されるほか、記憶装置３２等の記録媒体に記録されてもよい。すなわち、端末装置２００に対する配信は省略されてよい。 The communication device 33 communicates with the terminal device 200 and each recording system 20-n via a communication network. For example, the communication device 33 receives material data Mn transmitted by each recording system 20-n. The communication device 33 also transmits distribution content C to the terminal device 200. Note that the distribution of the distribution content C to the terminal device 200 may be performed by a distribution system separate from the video recording system 100. For example, the distribution content C transmitted from the video synthesis system 30 may be held in the distribution system, and the distribution content C may be distributed from the distribution system to the terminal device 200. Furthermore, the distribution content C may be recorded in a recording medium such as the storage device 32, in addition to being distributed to the terminal device 200. In other words, distribution to the terminal device 200 may be omitted.

操作装置３４は、動画収録システム１００の利用者による指示を受付ける入力機器である。動画収録システム１００の利用者は、例えば配信コンテンツＣの制作者である。例えば、利用者が操作する操作子、または、利用者による接触を検知するタッチパネルが、操作装置３４として利用される。なお、動画合成システム３０とは別体の操作装置３４が、動画合成システム３０に対して有線または無線により接続されてもよい。 The operation device 34 is an input device that accepts instructions from a user of the video recording system 100. The user of the video recording system 100 is, for example, the creator of the distribution content C. For example, a control operated by the user or a touch panel that detects contact by the user is used as the operation device 34. Note that the operation device 34, which is separate from the video compositing system 30, may be connected to the video compositing system 30 by wire or wirelessly.

再生装置３５は、制御装置３１による制御のもとで配信コンテンツＣを再生する。各収録スタジオＲnにける収録に並行して、配信コンテンツＣが再生装置３５により再生される。動画収録システム１００の利用者は、配信コンテンツＣを確認できる。具体的には、再生装置３５は、表示装置と放音装置とを具備する。表示装置は、配信コンテンツＣの動画（後述の合成動画Ｖ）を表示する。例えば液晶表示パネルまたは有機ＥＬ（Electroluminescence）パネル等の各種の表示パネルが、表示装置として利用される。放音装置は、配信コンテンツＣの音声（後述の合成音声Ａ）を放射する。例えばスピーカまたはヘッドホンが、放音装置として利用される。なお、動画合成システム３０とは別体の再生装置３５が、動画合成システム３０に対して有線または無線により接続されてもよい。 The playback device 35 plays the distribution content C under the control of the control device 31. In parallel with the recording in each recording studio Rn, the playback device 35 plays the distribution content C. Users of the video recording system 100 can check the distribution content C. Specifically, the playback device 35 includes a display device and a sound emitting device. The display device displays the video of the distribution content C (synthetic video V described below). For example, various display panels such as a liquid crystal display panel or an organic EL (Electroluminescence) panel are used as the display device. The sound emitting device emits the sound of the distribution content C (synthetic sound A described below). For example, a speaker or a headphone is used as the sound emitting device. Note that the playback device 35, which is separate from the video compositing system 30, may be connected to the video compositing system 30 by wire or wirelessly.

図４は、動画合成システム３０の機能的な構成を例示するブロック図である。制御装置３１は、記憶装置３２に記憶されたプログラムを実行することで、配信コンテンツＣを生成するための複数の機能（動画取得部４１、画像処理部４２、音声処理部４３および出力処理部４４）を実現する。なお、相互に別体で構成された複数の装置により制御装置３１の機能が実現されもよい。制御装置３１の機能の一部または全部が専用の電子回路で実現されてもよい。 Figure 4 is a block diagram illustrating an example of the functional configuration of the video synthesis system 30. The control device 31 executes a program stored in the storage device 32 to realize multiple functions (video acquisition unit 41, image processing unit 42, audio processing unit 43, and output processing unit 44) for generating the distribution content C. Note that the functions of the control device 31 may be realized by multiple devices configured separately from each other. Some or all of the functions of the control device 31 may be realized by a dedicated electronic circuit.

動画取得部４１は、複数の素材データＭ1～Ｍ3を取得する。具体的には、動画取得部４１は、各収録システム２０-nにより収録された動画Ｖnおよび音声Ａnを含む素材データＭnを、通信装置３３により収録システム２０-nから受信する。すなわち、動画取得部４１は、複数の動画Ｖ1～Ｖ3と複数の音声Ａ1～Ａ3とを取得する。各素材データＭnは、配信コンテンツＣの素材となるデータである。 The video acquisition unit 41 acquires multiple pieces of material data M1 to M3. Specifically, the video acquisition unit 41 receives material data Mn, including video Vn and audio An recorded by each recording system 20-n, from the recording system 20-n via the communication device 33. That is, the video acquisition unit 41 acquires multiple videos V1 to V3 and multiple audios A1 to A3. Each piece of material data Mn is data that will become the material for the distribution content C.

画像処理部４２は、合成処理を実行することで合成動画Ｖを生成する。合成処理は、動画取得部４１が取得した複数の動画Ｖ1～Ｖ3を合成する処理である。すなわち、合成動画Ｖは、図８から図１０に例示される通り、動画Ｖ1の被写体画像Ｇ1と動画Ｖ2の被写体画像Ｇ2と動画Ｖ3の被写体画像Ｇ3とを含む動画である。すなわち、合成処理は、複数の被写体画像Ｇ1～Ｇ3を合成する画像処理である。 The image processing unit 42 generates a composite video V by executing a compositing process. The compositing process is a process of compositing multiple videos V1 to V3 acquired by the video acquisition unit 41. That is, as illustrated in Figures 8 to 10, the composite video V is a video that includes subject image G1 of video V1, subject image G2 of video V2, and subject image G3 of video V3. That is, the compositing process is an image process that combines multiple subject images G1 to G3.

図４の音声処理部４３は、複数の音声Ａ1～Ａ3を混合することで合成音声Ａを生成する。各音声Ａ1～Ａ3の混合比は、例えば操作装置３４に対する利用者からの指示に応じて設定される。出力処理部４４は、合成動画Ｖと合成音声Ａとを含む配信コンテンツＣを生成する。第１実施形態の出力処理部４４は、配信コンテンツＣを通信装置３３から端末装置２００に配信する。 The audio processing unit 43 in FIG. 4 generates synthetic audio A by mixing multiple audios A1 to A3. The mixing ratio of each audio A1 to A3 is set, for example, according to an instruction from the user via the operation device 34. The output processing unit 44 generates distribution content C including the synthetic video V and the synthetic audio A. The output processing unit 44 in the first embodiment distributes the distribution content C from the communication device 33 to the terminal device 200.
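The mixing performed by the audio processing unit 43 can be sketched as a weighted sum of samples. This is only an illustration: the source states that the audios are mixed at a settable mixing ratio but does not give the formula, so the weighted sum, the function name `mix_audio`, and the list-of-samples representation are assumptions.

```python
def mix_audio(tracks, ratios):
    """Mix equally sampled tracks as a weighted sum (an assumed mixing rule)."""
    if len(tracks) != len(ratios):
        raise ValueError("one mixing ratio is required per track")
    if not tracks:
        return []
    # Truncate to the shortest track so every index is valid for all tracks.
    length = min(len(track) for track in tracks)
    return [sum(ratio * track[i] for track, ratio in zip(tracks, ratios))
            for i in range(length)]


# Hypothetical samples for the audios A1 to A3 and hypothetical mixing ratios.
a1, a2, a3 = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
mixed = mix_audio([a1, a2, a3], [0.5, 0.25, 0.25])
```

In this sketch the ratios would come from the user's instruction via the operation device 34.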

図５は、画像処理部４２の具体的な構成を例示するブロック図である。図５に例示される通り、第１実施形態の画像処理部４２は、距離特定部４２１と被写体抽出部４２２と被写体選択部４２３と画像調整部４２４とを具備する。 Figure 5 is a block diagram illustrating a specific configuration of the image processing unit 42. As illustrated in Figure 5, the image processing unit 42 in the first embodiment includes a distance identification unit 421, a subject extraction unit 422, a subject selection unit 423, and an image adjustment unit 424.

距離特定部４２１は、複数の動画Ｖ1～Ｖ3の各々について撮像距離Ｄnを特定する。前述の通り、撮像距離Ｄnは、撮像装置２１-nと被写体Ｑnとの間の距離である。 The distance determination unit 421 determines the imaging distance Dn for each of the multiple videos V1 to V3. As described above, the imaging distance Dn is the distance between the imaging device 21-n and the subject Qn.

第１実施形態の距離特定部４２１は、動画Ｖn内の距離指標を検出することで撮像距離Ｄnを特定する。距離指標は、撮像距離Ｄnの特定のために各被写体Ｑnに事前に付加されたマーカーである。複数の被写体Ｑ1～Ｑ3には共通のサイズの距離指標が付加される。したがって、撮像距離Ｄnが大きいほど動画Ｖn内の距離指標のサイズは小さいという相関がある。以上の相関を利用して、距離特定部４２１は、各動画Ｖn内における距離指標のサイズに応じて撮像距離Ｄnを推定する。例えば、動画Ｖn内の距離指標のサイズが大きいほど撮像距離Ｄnが小さい数値となるように、距離特定部４２１は動画Ｖnの解析により撮像距離Ｄnを推定する。なお、距離指標が被写体Ｑnに直接的に付与される必要は必ずしもない。例えば、収録スタジオＲn内において被写体Ｑnの撮像距離Ｄnと同等の距離の地点に、距離指標が設置されてもよい。 The distance determination unit 421 in the first embodiment determines the imaging distance Dn by detecting a distance index in the video Vn. The distance index is a marker that is added to each subject Qn in advance to determine the imaging distance Dn. A distance index of a common size is added to multiple subjects Q1 to Q3. Therefore, there is a correlation in which the larger the imaging distance Dn, the smaller the size of the distance index in the video Vn. Using the above correlation, the distance determination unit 421 estimates the imaging distance Dn according to the size of the distance index in each video Vn. For example, the distance determination unit 421 estimates the imaging distance Dn by analyzing the video Vn so that the imaging distance Dn becomes a smaller value as the size of the distance index in the video Vn increases. Note that it is not necessarily necessary to directly assign the distance index to the subject Qn. For example, a distance index may be installed at a point in the recording studio Rn at a distance equivalent to the imaging distance Dn of the subject Qn.
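The estimation of the imaging distance from the size of the distance index can be sketched as below. The source states only that a larger index in the video corresponds to a smaller imaging distance; the inverse-proportional pinhole relation used here, the function name `estimate_distance_m`, and all numeric values are assumptions for illustration.

```python
def estimate_distance_m(marker_size_px: float, marker_size_m: float,
                        focal_length_px: float) -> float:
    """Estimate the imaging distance from the marker size seen in a frame.

    Assumes an ideal pinhole camera: size_px = focal_length_px * size_m / distance.
    """
    if marker_size_px <= 0:
        raise ValueError("marker_size_px must be positive")
    return focal_length_px * marker_size_m / marker_size_px


# Hypothetical values: a 0.2 m marker and a focal length of 1000 px.
near = estimate_distance_m(marker_size_px=100.0, marker_size_m=0.2,
                           focal_length_px=1000.0)
far = estimate_distance_m(marker_size_px=50.0, marker_size_m=0.2,
                          focal_length_px=1000.0)
```

Because all subjects carry a distance index of the same physical size, the same conversion can be applied to every video Vn.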

被写体抽出部４２２は、複数の動画Ｖ1～Ｖ3の各々から被写体画像Ｇnを抽出する。具体的には、被写体抽出部４２２は、各動画Ｖnから特定色の背景（例えばグリーンバックまたはブルーバック）に対応する領域を除去することで、被写体画像Ｇnを抽出する。なお、被写体画像Ｇnの抽出済の動画Ｖnを動画取得部４１が取得する形態においては、被写体抽出部４２２は省略されてよい。 The subject extraction unit 422 extracts a subject image Gn from each of the multiple videos V1 to V3. Specifically, the subject extraction unit 422 extracts the subject image Gn by removing an area corresponding to a background of a specific color (e.g., a green back or a blue back) from each video Vn. Note that in a form in which the video acquisition unit 41 acquires a video Vn from which the subject image Gn has been extracted, the subject extraction unit 422 may be omitted.
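The removal of the specific-color background can be sketched as a simple chroma key. This is a minimal illustration, not the method of the source: the Euclidean color-distance test, the tolerance, the key color, and the function name `extract_subject` are assumptions, and a frame is represented as nested lists of RGB tuples.

```python
def extract_subject(frame, key=(0, 255, 0), tolerance=60):
    """Return an RGBA frame in which pixels close to the key color are transparent.

    The key color and tolerance are hypothetical; a green background is assumed.
    """
    def is_background(pixel):
        # Squared Euclidean distance in RGB space against the key color.
        return sum((c - k) ** 2 for c, k in zip(pixel, key)) <= tolerance ** 2

    return [[(0, 0, 0, 0) if is_background(p) else (*p, 255) for p in row]
            for row in frame]


# One-row example frame: a green background pixel and a subject pixel.
frame = [[(0, 255, 0), (200, 50, 50)]]
subject = extract_subject(frame)
```

The pixels that remain opaque form the subject image Gn.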

被写体選択部４２３は、複数の被写体Ｑ1～Ｑ3の何れかを基準被写体Ｑrefとして選択する。基準被写体Ｑrefは、複数の被写体Ｑ1～Ｑ3のうち配信コンテンツＣの視聴者が注目すべき被写体Ｑnである。利用者は、再生装置３５が再生する配信コンテンツＣを視聴しながら操作装置３４を操作することで、複数の被写体Ｑ1～Ｑ3の何れかを指定する。被写体選択部４２３は、複数の被写体Ｑ1～Ｑ3のうち利用者が操作装置３４に対する操作により指定した被写体Ｑnを、基準被写体Ｑrefとして選択する。なお、被写体選択部４２３による基準被写体Ｑrefの選択は、複数の動画Ｖ1～Ｖ3の何れかの選択、または、複数の撮像装置２１-1～２１-3の何れかの選択とも換言される。 The subject selection unit 423 selects one of the multiple subjects Q1 to Q3 as the reference subject Qref. The reference subject Qref is the subject Qn, among the multiple subjects Q1 to Q3, to which viewers of the distribution content C should pay attention. The user specifies one of the multiple subjects Q1 to Q3 by operating the operation device 34 while watching the distribution content C played by the playback device 35. The subject selection unit 423 selects, as the reference subject Qref, the subject Qn that the user specified by operating the operation device 34. The selection of the reference subject Qref by the subject selection unit 423 can equivalently be described as the selection of one of the multiple videos V1 to V3, or as the selection of one of the multiple imaging devices 21-1 to 21-3.

画像調整部４２４は、調整処理を実行する。調整処理は、複数の被写体画像Ｇ1～Ｇ3の各々を撮像距離Ｄnに応じて調整する画像処理である。第１実施形態の調整処理は、ぼかし処理と重畳処理とを含む。 The image adjustment unit 424 executes an adjustment process. The adjustment process is image processing that adjusts each of the multiple subject images G1 to G3 according to the imaging distance Dn. The adjustment process in the first embodiment includes blurring and superimposition.

［ぼかし処理］ [Blurring process]
ぼかし処理は、各被写体画像Ｇnをぼかす加工処理である。すなわち、ぼかし処理により、被写体画像Ｇnの輪郭または境界は曖昧な状態に変化する。具体的には、ぼかし処理は、被写体画像Ｇnを構成する各画素の画素値を、当該画素を含む所定の範囲（以下「処理範囲」という）内における複数の画素値の平均値（例えば単純平均または加重平均）に置換するフィルタ処理である。処理範囲は、例えば置換対象となる１個の画素を中心とする矩形状の範囲である。処理範囲が大きいほど、被写体画像Ｇnがぼける程度は増加する。ぼかし処理は、撮像装置における光学系の合焦面から離間した被写体に光学的に発生するぼけを模擬する画像処理である。 The blurring process is a modification process that blurs each subject image Gn. That is, the blurring process makes the contour or boundary of the subject image Gn indistinct. Specifically, the blurring process is a filter process that replaces the pixel value of each pixel constituting the subject image Gn with the average value (e.g., a simple average or a weighted average) of the pixel values within a predetermined range (hereinafter referred to as the "processing range") that includes the pixel. The processing range is, for example, a rectangular range centered on the pixel to be replaced. The larger the processing range, the more strongly the subject image Gn is blurred. The blurring process is image processing that simulates the blur that occurs optically for a subject located away from the focal plane of the optical system of an imaging device.
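The averaging filter described above can be sketched as a box blur with a simple average. The grayscale nested-list image, the `radius` parameter (half the side of the square processing range), and the clamping of the window at the image border are assumptions made for illustration; the source does not specify border handling.

```python
def box_blur(image, radius):
    """Replace each pixel with the simple average over a square processing range.

    image  : grayscale image as a list of rows of numbers (assumed format)
    radius : half-width of the processing range; 0 leaves the image unchanged
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    height = len(image)
    width = len(image[0]) if image else 0
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clamp the processing range to the image bounds (an assumption).
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            values = [image[j][i] for j in ys for i in xs]
            row.append(sum(values) / len(values))
        blurred.append(row)
    return blurred


# A single bright pixel spreads over its neighbors as the range grows.
image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
soft = box_blur(image, 1)
```

A weighted average, such as a Gaussian kernel, could be substituted for the simple average without changing the overall structure.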

図６は、ぼかし処理の説明図である。ぼかし処理においては、ぼかし量Ｂn（Ｂ1～Ｂ3）が制御される。ぼかし量Ｂnは、被写体画像Ｇnをぼかす程度を表す画像パラメータである。具体的には、ぼかし量Ｂnは、ぼかし処理における処理範囲のサイズを指定する非負値である。図６に例示される通り、ぼかし量Ｂnが大きいほど処理範囲は拡大し、結果的に被写体画像Ｇnがぼける程度は増加する。ぼかし量Ｂnのゼロは、被写体画像Ｇnをぼかさないことを意味する。 Figure 6 is an explanatory diagram of the blurring process. In the blurring process, the blur amount Bn (B1 to B3) is controlled. The blur amount Bn is an image parameter that indicates the degree to which the subject image Gn is blurred. Specifically, the blur amount Bn is a non-negative value that specifies the size of the processing range in the blurring process. As illustrated in Figure 6, the larger the blur amount Bn, the larger the processing range becomes, and as a result, the degree to which the subject image Gn is blurred increases. A blur amount Bn of zero means that the subject image Gn is not blurred.

第１実施形態の画像調整部４２４は、被写体画像Ｇn毎にぼかし量Ｂnを個別に制御する。具体的には、画像調整部４２４は、各被写体Ｑnの撮像距離Ｄnに応じて被写体画像Ｇnのぼかし量Ｂnを調整する。以上の説明から理解される通り、ぼかし処理は、各撮像距離Ｄnに応じて各被写体Ｑnの画像パラメータ（ぼかし量Ｂn）を調整する加工処理の一例である。 The image adjustment unit 424 in the first embodiment individually controls the blur amount Bn for each subject image Gn. Specifically, the image adjustment unit 424 adjusts the blur amount Bn of the subject image Gn according to the imaging distance Dn of each subject Qn. As can be understood from the above explanation, the blurring process is an example of a modification process that adjusts an image parameter (the blur amount Bn) of each subject Qn according to the corresponding imaging distance Dn.

図７は、撮像距離Ｄnとぼかし量Ｂnとの関係を表す説明図である。図７の横軸は撮像距離Ｄnであり、縦軸はぼかし量Ｂnである。図７の基準値Ｄrefは、撮像距離Ｄnの基準となる数値である。具体的には、画像調整部４２４は、被写体選択部４２３が選択した基準被写体Ｑrefに対応する撮像距離Ｄnを基準値Ｄrefとして設定する。以上の通り、基準値Ｄrefは、複数の被写体Ｑ1～Ｑ3の何れか（基準被写体Ｑref）に対応する撮像距離Ｄnである。基準値Ｄrefは、撮像装置における光学系の合焦面に相当する。 Figure 7 is an explanatory diagram showing the relationship between the imaging distance Dn and the blur amount Bn. The horizontal axis of Figure 7 is the imaging distance Dn, and the vertical axis is the blur amount Bn. The reference value Dref in Figure 7 is a numerical value that serves as a reference for the imaging distance Dn. Specifically, the image adjustment unit 424 sets the imaging distance Dn corresponding to the reference subject Qref selected by the subject selection unit 423 as the reference value Dref. As described above, the reference value Dref is the imaging distance Dn that corresponds to one of the multiple subjects Q1 to Q3 (reference subject Qref). The reference value Dref corresponds to the focal plane of the optical system in the imaging device.

As understood from Figure 7, the image adjustment unit 424 sets the blur amount Bn to zero when the imaging distance Dn is within the focusing range P0. The focusing range P0 is a range that includes the reference value Dref. For example, a range of a predetermined width centered on the reference value Dref is set as the focusing range P0. The focusing range P0 corresponds to the depth of field within which a real imaging device can be regarded as substantially in focus.

As understood from Figure 7, outside the focusing range P0, the image adjustment unit 424 sets the blur amount Bn of the subject image Gn to a larger value as the difference |Dref-Dn| between the reference value Dref and the imaging distance Dn of each subject Qn increases. Specifically, the blur amount Bn changes linearly with respect to the imaging distance Dn.

Consider distances Da1 and Da2 as values of the imaging distance Dn. The distances Da1 and Da2 are values within a range Pa below the lower limit ra of the focusing range P0. The difference between the distance Da2 and the reference value Dref exceeds the difference between the distance Da1 and the reference value Dref (|Dref-Da2|>|Dref-Da1|). When the imaging distance Dn is the distance Da1, the image adjustment unit 424 sets the blur amount Bn of the subject image Gn to a set value Ba1. When the imaging distance Dn is the distance Da2, on the other hand, the image adjustment unit 424 sets the blur amount Bn of the subject image Gn to a set value Ba2 that exceeds the set value Ba1. Note that the distance Da1 is an example of the "first distance" and the distance Da2 is an example of the "second distance". Likewise, the set value Ba1 is an example of the "first set value" and the set value Ba2 is an example of the "second set value".

Similarly, consider distances Db1 and Db2 as values of the imaging distance Dn. The distances Db1 and Db2 are values within a range Pb above the upper limit rb of the focusing range P0. The difference between the distance Db2 and the reference value Dref exceeds the difference between the distance Db1 and the reference value Dref (|Dref-Db2|>|Dref-Db1|). When the imaging distance Dn is the distance Db1, the image adjustment unit 424 sets the blur amount Bn of the subject image Gn to a set value Bb1. When the imaging distance Dn is the distance Db2, on the other hand, the image adjustment unit 424 sets the blur amount Bn of the subject image Gn to a set value Bb2 that exceeds the set value Bb1. Note that the distance Db1 is an example of the "first distance" and the distance Db2 is an example of the "second distance". Likewise, the set value Bb1 is an example of the "first set value" and the set value Bb2 is an example of the "second set value".
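The relationship between the imaging distance Dn and the blur amount Bn described above can be sketched as a piecewise-linear function. This is only an illustration under stated assumptions: the function name `blur_amount`, the parameters `half_width` and `slope`, their default values, and the choice to start the linear ramp from zero at the limits ra and rb are hypothetical and are not specified in the disclosure.

```python
def blur_amount(distance, d_ref, half_width=0.5, slope=2.0):
    """Return a blur amount Bn for an imaging distance Dn.

    Assumptions: the focusing range P0 is [d_ref - half_width, d_ref + half_width],
    and the blur grows linearly (with the given slope) from zero at its limits.
    """
    lower = d_ref - half_width  # lower limit ra of the focusing range P0
    upper = d_ref + half_width  # upper limit rb of the focusing range P0
    if lower <= distance <= upper:
        return 0.0  # inside P0: the subject image is not blurred
    if distance < lower:
        return slope * (lower - distance)  # range Pa (nearer than P0)
    return slope * (distance - upper)      # range Pb (farther than P0)
```

For two distances on the same side of the focusing range, the one farther from the reference value yields the larger blur amount, which corresponds to the relationship between the set values Ba1 and Ba2 (or Bb1 and Bb2).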

Figures 8 to 10 are schematic diagrams of the composite video V, focusing on the blur amount Bn of each subject image Gn. Each of Figures 8 to 10 illustrates a composite video V in which the multiple subject images G1 to G3 are combined.

Figure 8 assumes a case where the subject Q2 is selected as the reference subject Qref. The imaging distance D2 of the subject Q2 is set as the reference value Dref, and as a result the blur amount B2 of the subject image G2 is set to zero. That is, the outline or boundary of the subject image G2 remains sharp. The imaging distance D1 of the subject Q1, on the other hand, is below the lower limit ra of the focusing range P0. Because the blur amount B1 corresponding to the imaging distance D1 is applied in the blurring process of the subject image G1, the subject image G1 in the composite video V is blurred relative to the subject image G2. Similarly, the imaging distance D3 of the subject Q3 exceeds the upper limit rb of the focusing range P0. Because the blur amount B3 corresponding to the imaging distance D3 is applied in the blurring process of the subject image G3, the subject image G3 in the composite video V is blurred relative to the subject image G2. As a result, the composite video V of Figure 8 is perceived as a video captured with the subject Q2 in focus. That is, foreground blur is applied to the subject Q1 and background blur is applied to the subject Q3. Viewers of the distributed content C are therefore likely to pay attention to the subject Q2.

Figure 9 assumes a case where the subject Q1 is selected as the reference subject Qref. Accordingly, the outline or boundary of the subject image G1 remains sharp, while the subject images G2 and G3 are blurred. Because the imaging distance D3 exceeds the imaging distance D2 within the range Pb, and is therefore farther from the reference value Dref, the blur amount B3 of the subject image G3 exceeds the blur amount B2 of the subject image G2. That is, the subject image G3 is more blurred than the subject image G2. Because the subject image G1 is displayed more clearly than the other subject images Gn (G2, G3), viewers of the distributed content C are likely to pay attention to the subject Q1.

Figure 10 assumes a case where the subject Q3 is selected as the reference subject Qref. Accordingly, the outline or boundary of the subject image G3 remains sharp, while the subject images G1 and G2 are blurred. Because the imaging distance D1 is less than the imaging distance D2 within the range Pa, and is therefore farther from the reference value Dref, the blur amount B1 of the subject image G1 exceeds the blur amount B2 of the subject image G2. That is, the subject image G1 is more blurred than the subject image G2. Because the subject image G3 is displayed more clearly than the other subject images Gn (G1, G2), viewers of the distributed content C are likely to pay attention to the subject Q3.

[Superimposition process]
The superimposition process is image processing that superimposes the multiple subject images G1 to G3 on one another. In the superimposition process, the image adjustment unit 424 controls the front-to-back order of the multiple subject images G1 to G3 according to each imaging distance Dn. Specifically, the order is adjusted so that the larger the imaging distance Dn, the farther back the subject image Gn is positioned.

For example, Figures 8 to 10 assume a case where the subject image G1 and the subject image G2 partially overlap, and the subject image G2 and the subject image G3 partially overlap. As described above, the imaging distance D1 is less than the imaging distance D2. Therefore, in the superimposition process, the image adjustment unit 424 superimposes the subject images Gn so that the subject image G1 is positioned in front of the subject image G2. That is, the portion of the subject image G2 that overlaps the subject image G1 is hidden behind the subject image G1.

Likewise, the imaging distance D2 is less than the imaging distance D3. Therefore, in the superimposition process, the image adjustment unit 424 superimposes the subject images Gn so that the subject image G2 is positioned in front of the subject image G3. That is, the portion of the subject image G3 that overlaps the subject image G2 is hidden behind the subject image G2. As illustrated in Figures 8 to 10, in the first embodiment the front-to-back order of the subject images Gn is controlled according to the imaging distances Dn. Therefore, a natural composite video V can be generated in which the actual position of each subject Qn is reflected in the front-to-back order of the subject images Gn.
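The front-to-back control described above can be sketched as a back-to-front overlay, in which layers are drawn in order of decreasing imaging distance so that nearer subject images hide farther ones. This is a minimal illustration, not the disclosed implementation: the function names, the one-dimensional "images", and the use of `None` for transparent pixels are assumptions.

```python
def composite_order(distances):
    """Return layer names ordered back to front (largest imaging distance first)."""
    return sorted(distances, key=distances.get, reverse=True)


def composite(layers, distances, width):
    """Overlay one-dimensional layers; None marks a transparent pixel."""
    canvas = [None] * width
    for name in composite_order(distances):
        for x, pixel in enumerate(layers[name]):
            if pixel is not None:
                # A nearer layer is drawn later and overwrites farther layers.
                canvas[x] = pixel
    return canvas


# Example corresponding to D1 < D2 < D3 with partially overlapping images.
distances = {"G1": 1.0, "G2": 2.0, "G3": 3.0}
layers = {
    "G1": ["1", "1", None, None, None],
    "G2": [None, "2", "2", "2", None],
    "G3": [None, None, None, "3", "3"],
}
result = composite(layers, distances, width=5)
```

In this example the overlapping part of G2 is hidden behind G1, and the overlapping part of G3 is hidden behind G2.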

As described above, the blurring process and the superimposition process executed by the image adjustment unit 424 are examples of an adjustment process that adjusts each subject image Gn according to each imaging distance Dn. Consider now a subject Qn1 and a subject Qn2 (n1 = 1 to 3, n2 = 1 to 3, n1 ≠ n2). The adjustment process is comprehensively expressed as a process that adjusts the subject image Gn1 and the subject image Gn2 according to the imaging distance Dn1 of the imaging device 21-n1 and the imaging distance Dn2 of the imaging device 21-n2.

The imaging device 21-n1 is an example of the "first imaging device", and the imaging distance Dn1 is an example of the "first imaging distance". The imaging device 21-n2 is an example of the "second imaging device", and the imaging distance Dn2 is an example of the "second imaging distance". The subject image Gn1 is an example of the "image of the first subject", and the subject image Gn2 is an example of the "image of the second subject". The video Vn1 is an example of the "first video", and the video Vn2 is an example of the "second video". The recording studio Rn1 is an example of the "first real space", and the recording studio Rn2 is an example of the "second real space".

Figure 11 is a flowchart of the operation of the video synthesis system 30. The operation of Figure 11 is started, for example, in response to an instruction given by a user to the operation device 34, and is thereafter repeated at a predetermined interval.

The control device 31 (video acquisition unit 41) acquires multiple pieces of material data M1 to M3 (S1). The control device 31 (image processing unit 42) generates the composite video V by executing the synthesis process S2. The control device 31 (audio processing unit 43) generates composite audio A by mixing the multiple audio A1 to A3 (S3). Note that the order of generating the composite video V (S2) and generating the composite audio A (S3) may be reversed. The control device 31 (output processing unit 44) generates distributed content C that includes the composite video V and the composite audio A (S4), and distributes the distributed content C to the terminal device 200 (S5).

Figure 12 is a flowchart of the synthesis process S2 in Figure 11. When the synthesis process S2 starts, the control device 31 (distance determination unit 421) determines the imaging distance Dn for each of the multiple videos V1 to V3 (S21). The control device 31 (subject extraction unit 422) extracts the subject image Gn from each of the multiple videos V1 to V3 (S22). Note that the order of determining the imaging distance Dn (S21) and extracting the subject image Gn (S22) may be reversed.

The control device 31 (subject selection unit 423) selects one of the multiple subjects Q1 to Q3 as the reference subject Qref (S23). Specifically, the subject Qn designated by an operation on the operation device 34 is selected as the reference subject Qref. Note that the user can designate a desired subject Qn as the reference subject Qref at any point during the synthesis process S2. Therefore, the reference subject Qref can be changed at any point during playback of the distributed content C.

The control device 31 (image adjustment unit 424) executes the adjustment process (S24, S25). Specifically, the control device 31 generates the composite video V by executing the blurring process S24 and the superimposition process S25. That is, the synthesis process S2 includes an adjustment process (S24, S25) that adjusts each subject image Gn according to each imaging distance Dn.

As described above, in the first embodiment, the synthesis process S2 that combines the multiple videos V1 to V3 includes an adjustment process that adjusts each subject image Gn according to each imaging distance Dn. That is, as illustrated in Figures 8 to 10, the relationship among the imaging distances Dn is reflected in the relationship among the subject images Gn in the composite video V. Therefore, a natural composite video V that includes the multiple subjects Q1 to Q3 located in different recording studios Rn can be generated. In the first embodiment in particular, the blur amount Bn, which is an image parameter of each subject Qn, is adjusted according to the imaging distance Dn. The effect of generating a natural composite video V that includes the multiple subjects Q1 to Q3 is therefore especially pronounced.

In the first embodiment in particular, the blurring process S24 is executed so that the blur amount Bn of each subject image Gn differs according to the imaging distance Dn. Therefore, a natural composite video V can be generated that simulates the tendency of real imaging, in which the degree of optical blur changes according to the imaging distance Dn.

In addition, the blur amount Bn applied in the blurring process S24 of the subject image Gn increases as the difference between the imaging distance Dn and the reference value Dref increases. Therefore, a natural composite video V can be generated that faithfully simulates the tendency of real imaging, in which the optical blur of a subject increases with its distance in the depth direction (front-back direction) from the plane of focus located at the point corresponding to the reference value Dref.

In the first embodiment, the blur amount Bn of each subject image Gn is set using, as the reference value Dref, the imaging distance Dn corresponding to one of the multiple subjects Q1 to Q3 (the reference subject Qref). Therefore, a natural composite video V can be generated in which the blur amount Bn of each subject image Gn is set relative to the reference subject Qref. In addition, a composite video V can be generated in which the reference subject Qref is especially likely to attract attention among the multiple subjects Q1 to Q3.

[Second embodiment]
A second embodiment will now be described. In each of the aspects exemplified below, elements whose functions are the same as in the first embodiment are denoted by the same reference signs as in the description of the first embodiment, and their detailed descriptions are omitted as appropriate.

The first embodiment illustrated a form in which the subject selection unit 423 selects the reference subject Qref in response to an instruction from a user. The subject selection unit 423 of the second embodiment selects the reference subject Qref from the multiple subjects Q1 to Q3 according to the result of analyzing the multiple pieces of material data M1 to M3 (S23). As the method by which the subject selection unit 423 selects the reference subject Qref, for example, Aspect 1 or Aspect 2 below is adopted.

[Aspect 1]
The subject selection unit 423 of Aspect 1 selects the reference subject Qref according to the result of analyzing the multiple videos V1 to V3. For example, there is a general tendency that, among the multiple subjects Q1 to Q3, a subject Qn that is moving is the one to which viewers should pay particular attention. In view of this tendency, the subject selection unit 423 selects, as the reference subject Qref, the subject Qn among the multiple subjects Q1 to Q3 that corresponds to a video Vn with a large temporal change.

Specifically, the subject selection unit 423 calculates the amount of temporal change of the image for each of the multiple videos V1 to V3, and selects, as the reference subject Qref, the subject Qn corresponding to the video Vn whose amount of change is large. In this form, the subject Qn whose movement is prominent among the multiple subjects Q1 to Q3 is selected as the reference subject Qref. Note that the amount of change may be calculated by analyzing the whole of the video Vn, or by analyzing only the subject image Gn within the video Vn.
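One possible way to calculate the amount of temporal change is simple frame differencing, sketched below. The disclosure does not fix a metric, so the mean absolute difference between consecutive frames, the rule of picking the video with the maximum change, and the function names are assumptions made for illustration. Each frame is represented as a flat list of pixel values.

```python
def temporal_change(frames):
    """Mean absolute pixel difference between consecutive frames."""
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            total += abs(b - a)
            count += 1
    return total / count if count else 0.0


def select_reference_by_motion(videos):
    """Return the key of the video with the largest temporal change.

    Assumption: "a large amount of change" is interpreted as the maximum.
    """
    return max(videos, key=lambda name: temporal_change(videos[name]))


# Hypothetical two-frame clips for the subjects Q1 to Q3.
videos = {
    "Q1": [[0, 0], [0, 0]],  # no motion
    "Q2": [[0, 0], [4, 2]],  # large change
    "Q3": [[1, 1], [2, 1]],  # small change
}
reference = select_reference_by_motion(videos)
```

Passing only the pixels of the subject image Gn instead of the full frame would correspond to the alternative noted above, in which the change is calculated from the subject image alone.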

In a scene where the multiple subjects Q1 to Q3 move one after another, the subject Qn whose movement is prominent changes over time. Therefore, the reference subject Qref may change at any point in the composite video V. For example, when a state in which the subject Qn1 is moving transitions to a state in which the subject Qn2 is moving, the reference subject Qref changes from the subject Qn1 to the subject Qn2. That is, the subject image Gn that is displayed sharply in the composite video V switches as time passes.

[Aspect 2]
The subject selection unit 423 of Aspect 2 selects the reference subject Qref according to the result of analyzing the multiple audio A1 to A3. For example, there is a general tendency that, among the multiple subjects Q1 to Q3, a subject Qn who is speaking is the one to which viewers should pay particular attention. In view of this tendency, the subject selection unit 423 selects, as the reference subject Qref, the subject Qn among the multiple subjects Q1 to Q3 that corresponds to audio An with a high volume.

Specifically, the subject selection unit 423 calculates the volume of each of the multiple audio A1 to A3, and selects, as the reference subject Qref, the subject Qn corresponding to the audio An whose volume is high. In this form, the subject Qn who is speaking among the multiple subjects Q1 to Q3 is selected as the reference subject Qref.
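The volume-based selection can be sketched as follows. The disclosure does not specify how volume is calculated, so the use of the root-mean-square (RMS) level over a short window of samples, the rule of picking the maximum, and the function names are assumptions made for illustration.

```python
import math


def rms_volume(samples):
    """Root-mean-square level of a window of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_reference_by_volume(audio):
    """Return the key of the audio with the highest RMS volume.

    Assumption: "high volume" is interpreted as the maximum over all inputs.
    """
    return max(audio, key=lambda name: rms_volume(audio[name]))


# Hypothetical sample windows for the audio A1 to A3 of subjects Q1 to Q3.
audio = {
    "Q1": [0.1, -0.1],
    "Q2": [0.0, 0.0],
    "Q3": [0.5, -0.4],
}
reference = select_reference_by_volume(audio)
```

Evaluating the volume over a sliding window at each repetition of the operation would let the reference subject follow whichever subject is currently speaking.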

In a scene where the multiple subjects Q1 to Q3 speak one after another, the subject Qn whose audio An has a high volume changes over time. Therefore, the reference subject Qref may change at any point in the composite video V. For example, when a state in which the subject Qn1 is speaking transitions to a state in which the subject Qn2 is speaking, the reference subject Qref changes from the subject Qn1 to the subject Qn2. That is, the subject image Gn that is displayed sharply in the composite video V switches as time passes.

As understood from the descriptions of Aspect 1 and Aspect 2, the subject selection unit 423 of the second embodiment is expressed as an element that selects the reference subject Qref from the multiple subjects Q1 to Q3 according to the result of analyzing the multiple pieces of material data M1 to M3 (the videos V1 to V3 or the audio A1 to A3).

The second embodiment achieves the same effects as the first embodiment. In addition, in the second embodiment, the reference subject Qref is selected from the multiple subjects Q1 to Q3 according to the result of analyzing the multiple pieces of material data M1 to M3. Therefore, an appropriate subject Qn to which viewers of the distributed content C should pay particular attention can be selected as the reference subject Qref without requiring an instruction from the user. Note that selection of the reference subject Qref according to the result of analyzing each piece of material data Mn (second embodiment) and selection of the reference subject Qref according to an instruction from the user (first embodiment) may be used together.

[Third embodiment]
Figure 13 is a block diagram of the image processing unit 42 in the third embodiment. The image processing unit 42 of the third embodiment includes an image generation unit 425 in addition to the same elements as in the first embodiment (the distance determination unit 421, the subject extraction unit 422, the subject selection unit 423, and the image adjustment unit 424). The image generation unit 425 generates a virtual video Vz.

Figure 14 is an explanatory diagram of the virtual video Vz. The virtual video Vz is a video of a virtual space Z. The virtual space Z is a virtual space in which multiple virtual objects Om (m = 1, 2) are arranged. That is, the virtual space Z is separate from real spaces such as the recording studios Rn and is generated by computer information processing. Each virtual object Om is a virtual display element placed in the virtual space Z for purposes such as staging or decoration. Figure 14 illustrates virtual creatures active in the virtual space Z as the virtual objects Om. However, inanimate elements such as buildings or natural objects may also be placed in the virtual space Z as virtual objects Om.

A virtual imaging device is installed in the virtual space Z. The virtual imaging device is a virtual camera that captures the virtual space Z. The virtual video Vz generated by the image generation unit 425 is a video of the virtual space Z captured by the virtual imaging device. Various kinds of image processing, such as 3D rendering, are used to generate the virtual video Vz. The format of the data representing the virtual video Vz is arbitrary.

A virtual imaging distance Em is set for each virtual object Om. The virtual imaging distance Em is the distance between the virtual imaging device and the virtual object Om in the virtual space Z. Figure 14 assumes a case where the virtual imaging distance E2 of the virtual object O2 exceeds the virtual imaging distance E1 of the virtual object O1 (E2>E1).

Figure 15 is a flowchart of the synthesis process S2 in the third embodiment. In the synthesis process S2 of the third embodiment, the control device 31 (image generation unit 425) generates the virtual video Vz (S26). Note that the generation of the virtual video Vz (S26) may be executed at any stage before the adjustment process (S24, S25) starts.

The synthesis process S2 of the third embodiment is image processing that generates the composite video V by combining the multiple subject images G1 to G3 with the virtual video Vz. Figure 16 is a schematic diagram of the composite video V in the third embodiment. As illustrated in Figure 16, the composite video V includes the multiple subject images G1 to G3 and the multiple virtual objects Om (O1, O2).

In the adjustment process (S24, S25) of the third embodiment, each subject image Gn is adjusted according to the imaging distance Dn as in the first embodiment, and in addition each virtual object Om is adjusted according to the virtual imaging distance Em.

Specifically, in the blurring process S24, the image adjustment unit 424 blurs each subject image Gn by the blur amount Bn corresponding to the imaging distance Dn, and also blurs each virtual object Om by a blur amount Bm corresponding to the virtual imaging distance Em. The virtual imaging distance Em is used to control the blur amount Bm in the same way that the imaging distance Dn is used to control the blur amount Bn. For example, outside the focusing range P0, the image adjustment unit 424 sets the blur amount Bm of the virtual object Om to a larger value as the difference |Dref-Em| between the reference value Dref and the virtual imaging distance Em of each virtual object Om increases. As a result of this control, each virtual object Om is blurred according to its virtual imaging distance Em, as illustrated in Figure 16.

In the superimposition process S25, the image adjustment unit 424 controls the front-to-back position of each subject image Gn in the virtual space Z according to the imaging distance Dn, and also controls the front-to-back position of each virtual object Om in the virtual space Z according to the virtual imaging distance Em. Specifically, the order is adjusted so that the larger the virtual imaging distance Em, the farther back the virtual object Om is positioned.

For example, Figure 16 assumes a case where the virtual imaging distance E1 of the virtual object O1 is a value between the imaging distance D1 of the subject image G1 and the imaging distance D2 of the subject image G2 (D1<E1<D2). Therefore, the image adjustment unit 424 places the subject image G1 in front of the virtual object O1, and places the virtual object O1 in front of the subject image G2. That is, the portion of the virtual object O1 that overlaps the subject image G1 is hidden behind the subject image G1, and the portion of the subject image G2 that overlaps the virtual object O1 is hidden behind the virtual object O1. Similarly, the subject image G2 is positioned in front of the virtual object O2, and the virtual object O2 is positioned in front of the subject image G3.
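The common handling of subject images and virtual objects can be sketched by placing both in one list of layers, each carrying a single distance value (Dn or Em), and then deriving the front-to-back order and the blur amount from that value. This is an illustration only: the `Layer` class, the function `plan_layers`, the parameters `half_width` and `slope`, and the numeric distances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    distance: float  # Dn for a subject image, Em for a virtual object


def plan_layers(layers, d_ref, half_width=0.5, slope=2.0):
    """Return (name, blur) pairs ordered front to back.

    Assumption: the blur is zero within half_width of d_ref (focusing
    range P0) and grows linearly with |d_ref - distance| outside it.
    """
    plan = []
    for layer in sorted(layers, key=lambda item: item.distance):
        gap = abs(d_ref - layer.distance)
        blur = 0.0 if gap <= half_width else slope * (gap - half_width)
        plan.append((layer.name, blur))
    return plan


# Distances chosen so that D1 < E1 < D2 < E2 < D3, with G2 as the reference.
layers = [
    Layer("G1", 1.0), Layer("G2", 3.0), Layer("G3", 5.0),
    Layer("O1", 2.0), Layer("O2", 4.0),
]
plan = plan_layers(layers, d_ref=3.0)
```

Under these assumed distances the resulting order from front to back is G1, O1, G2, O2, G3, which matches the arrangement described for Figure 16.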

The third embodiment achieves the same effects as the first embodiment. In addition, in the third embodiment, the subject images Gn captured in the recording studios Rn are superimposed with the virtual objects Om in the virtual space Z. Therefore, a wide variety of composite videos V that include not only the real subjects Qn but also the virtual objects Om can be generated. Moreover, each virtual object Om is adjusted according to the virtual imaging distance Em. Therefore, compared with a form in which the virtual imaging distance Em is not reflected, a natural composite video V can be generated in which the virtual objects Om appear to be located in the same space as the subjects Qn.

なお、被写体選択部４２３が選択する基準被写体Ｑrefの候補に仮想オブジェクトＯmが含まれてもよい。例えば、被写体選択部４２３は、複数の被写体Ｑ1～Ｑ3と複数の仮想オブジェクトＯ1，Ｏ2とを含む複数の候補（以下「候補被写体」という）から基準被写体Ｑrefを選択する。具体的には、被写体選択部４２３は、複数の候補被写体のうち利用者が操作装置３４に対する操作により指定した候補被写体を、基準被写体Ｑrefとして選択する。したがって、仮想オブジェクトＯmが基準被写体Ｑrefとして選択され、当該仮想オブジェクトＯmの仮想撮像距離Ｅmが基準値Ｄrefとして設定される場合がある。 The candidates for the reference subject Qref selected by the subject selection unit 423 may include the virtual object Om. For example, the subject selection unit 423 selects the reference subject Qref from a plurality of candidates (hereinafter referred to as "candidate subjects") including a plurality of subjects Q1 to Q3 and a plurality of virtual objects O1 and O2. Specifically, the subject selection unit 423 selects, as the reference subject Qref, a candidate subject designated by the user through an operation on the operation device 34 from among the plurality of candidate subjects. Therefore, the virtual object Om may be selected as the reference subject Qref, and the virtual imaging distance Em of the virtual object Om may be set as the reference value Dref.

また、仮想オブジェクトＯmを含む複数の候補被写体から基準被写体Ｑrefが選択される形態には、第２実施形態の態様１が適用されてもよい。具体的には、被写体選択部４２３は、複数の動画Ｖ1～Ｖ3と仮想動画Ｖzとを解析した結果に応じて基準被写体Ｑrefを選択する。例えば、被写体選択部４２３は、複数の動画Ｖ1～Ｖ3と各仮想オブジェクトＯmの動画とを含む複数の動画のうち、時間的な変化が大きい動画から基準被写体Ｑrefを選択する。したがって、仮想オブジェクトＯmの動画の変化量が各動画Ｖnの変化量を上回る場合、被写体選択部４２３は、仮想オブジェクトＯmを基準被写体Ｑrefとして選択する。 In addition, aspect 1 of the second embodiment may be applied to the form in which the reference subject Qref is selected from a plurality of candidate subjects including the virtual object Om. Specifically, the subject selection unit 423 selects the reference subject Qref according to the results of analyzing the plurality of videos V1 to V3 and the virtual video Vz. For example, the subject selection unit 423 selects the reference subject Qref from a video with a large temporal change among a plurality of videos including the plurality of videos V1 to V3 and the videos of each virtual object Om. Therefore, when the amount of change in the video of the virtual object Om exceeds the amount of change in each video Vn, the subject selection unit 423 selects the virtual object Om as the reference subject Qref.
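
As a rough sketch of selecting the reference subject Qref from the video with the largest temporal change, the following assumes that each frame is a flat list of pixel values and that the change amount is the mean absolute difference between consecutive frames. Both the measure and the names `change_amount` and `select_reference` are assumptions made for illustration; the disclosure does not specify how the temporal change is computed.

```python
def change_amount(frames):
    """Mean absolute difference between consecutive frames (assumed change measure)."""
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
    return total / (len(frames) - 1)


def select_reference(candidates):
    """candidates maps a candidate name to its list of frames; return the most active one."""
    return max(candidates, key=lambda name: change_amount(candidates[name]))


candidates = {
    "Q1": [[0, 0], [0, 0]],  # still subject
    "Q2": [[0, 0], [1, 1]],  # small motion
    "O1": [[0, 0], [4, 4]],  # virtual object with the largest change
}
print(select_reference(candidates))
```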

また、仮想オブジェクトＯmが音声を発音する形態においては、第２実施形態の態様２が適用されてもよい。具体的には、被写体選択部４２３は、複数の音声Ａ1～Ａ3と仮想動画Ｖzに対応する音声とを解析した結果に応じて基準被写体Ｑrefを選択する。例えば、被写体選択部４２３は、複数の音声Ａ1～Ａ3と各仮想オブジェクトＯmの音声とを含む複数の音声のうち、音量が大きい音声に対応する被写体Ｑnまたは仮想オブジェクトＯmを、基準被写体Ｑrefとして選択する。したがって、仮想オブジェクトＯmが音量で発音している場合、被写体選択部４２３は、仮想オブジェクトＯmを基準被写体Ｑrefとして選択する。 In addition, in a form in which the virtual object Om produces sound, Aspect 2 of the second embodiment may be applied. Specifically, the subject selection unit 423 selects the reference subject Qref according to the result of analyzing the multiple sounds A1 to A3 and the sound corresponding to the virtual video Vz. For example, the subject selection unit 423 selects, as the reference subject Qref, the subject Qn or virtual object Om corresponding to the loudest sound among the multiple sounds including the sounds A1 to A3 and the sound of each virtual object Om. Therefore, when the virtual object Om produces sound at a volume exceeding that of each of the sounds A1 to A3, the subject selection unit 423 selects the virtual object Om as the reference subject Qref.
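
The volume-based selection can be sketched as follows, assuming that volume is measured as the root mean square (RMS) of the audio samples in a short window. The RMS measure, the function names `rms` and `loudest`, and the sample values are assumptions made for illustration only.

```python
import math


def rms(samples):
    """Root-mean-square level of a window of audio samples (assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest(sounds):
    """sounds maps a subject or virtual-object name to its samples; return the loudest source."""
    return max(sounds, key=lambda name: rms(sounds[name]))


sounds = {
    "Q1": [0.1, -0.1, 0.1, -0.1],
    "Q2": [0.3, -0.3, 0.3, -0.3],
    "O1": [0.8, -0.8, 0.8, -0.8],  # virtual object producing the loudest sound
}
print(loudest(sounds))
```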

［第４実施形態］ [Fourth embodiment]
図１７は、第４実施形態における動画合成システム３０の機能的な構成を例示するブロック図である。第４実施形態の制御装置３１は、第１実施形態と同様の要素（動画取得部４１、画像処理部４２、音声処理部４３および出力処理部４４）に加えて撮像制御部４５としても機能する。撮像制御部４５は、複数の撮像装置２１-1～２１-3を制御する。 FIG. 17 is a block diagram illustrating a functional configuration of a moving image synthesizing system 30 according to the fourth embodiment. The control device 31 according to the fourth embodiment functions as an imaging control unit 45 in addition to the same elements as those in the first embodiment (a moving image acquisition unit 41, an image processing unit 42, an audio processing unit 43, and an output processing unit 44). The imaging control unit 45 controls a plurality of imaging devices 21-1 to 21-3.

第４実施形態の各撮像装置２１-nは、動画Ｖnを撮像する条件（以下「撮像条件」という）を変更可能なＰＴＺ（Panoramac-Tilt-Zoom）カメラである。撮像条件は、撮像装置２１-nが撮像する範囲を規定する条件である。例えば、撮像方向および撮像倍率が撮像条件として例示される。撮像方向は、撮影レンズの光軸の方向であり、例えば水平方向（パン）および垂直方向（チルト）に変化する。撮像倍率は、例えば焦点距離に応じた倍率（ズーム）である。 Each imaging device 21-n in the fourth embodiment is a PTZ (Pan-Tilt-Zoom) camera whose conditions for capturing the video Vn (hereinafter referred to as "imaging conditions") can be changed. The imaging conditions are conditions that define the range captured by the imaging device 21-n. Examples of the imaging conditions include the imaging direction and the imaging magnification. The imaging direction is the direction of the optical axis of the shooting lens and changes, for example, in the horizontal direction (pan) and the vertical direction (tilt). The imaging magnification is, for example, a magnification (zoom) that corresponds to the focal length.

撮像制御部４５は、操作装置３４に対する利用者からの操作に応じて各撮像装置２１-nを制御する。利用者は、再生装置３５が再生する配信コンテンツＣを視聴しながら操作装置３４を操作することで、各撮像装置２１-nの撮像条件を指示する。撮像制御部４５は、利用者が指示した撮像条件を指定する制御データＸを生成する。制御データＸは、撮像方向および撮像倍率を指定するデータである。例えば、制御データＸは、現時点の数値に対する変化量（相対値）、または所定値を基準とした絶対値として、撮像方向および撮像倍率を指定する。なお、制御データＸの時系列が記憶装置３２に事前に記憶されてもよい。 The imaging control unit 45 controls each imaging device 21-n in response to a user's operation on the operation device 34. The user operates the operation device 34 while watching the distribution content C played by the playback device 35 to specify the imaging conditions for each imaging device 21-n. The imaging control unit 45 generates control data X that specifies the imaging conditions specified by the user. The control data X is data that specifies the imaging direction and imaging magnification. For example, the control data X specifies the imaging direction and imaging magnification as the amount of change (relative value) with respect to the current numerical value, or as an absolute value based on a predetermined value. Note that the time series of the control data X may be stored in advance in the storage device 32.
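
A minimal sketch of how control data X might be applied, covering both the relative form (amount of change from the current value) and the absolute form. The field names, the `relative` flag, and the additive treatment of a relative zoom change are all assumptions; the disclosure only states that X specifies the imaging direction and imaging magnification in one of these two forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraState:
    pan_deg: float   # horizontal imaging direction
    tilt_deg: float  # vertical imaging direction
    zoom: float      # imaging magnification


@dataclass(frozen=True)
class ControlData:
    pan_deg: float
    tilt_deg: float
    zoom: float
    relative: bool   # True: change from the current value; False: absolute value


def apply_control(state, x):
    """Return the imaging conditions that result from applying control data X."""
    if x.relative:
        return CameraState(
            state.pan_deg + x.pan_deg,
            state.tilt_deg + x.tilt_deg,
            state.zoom + x.zoom,  # assumption: relative zoom is additive
        )
    return CameraState(x.pan_deg, x.tilt_deg, x.zoom)


current = CameraState(pan_deg=10.0, tilt_deg=0.0, zoom=1.0)
print(apply_control(current, ControlData(5.0, -2.0, 0.5, relative=True)))
print(apply_control(current, ControlData(30.0, 5.0, 2.0, relative=False)))
```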

撮像制御部４５は、複数の収録システム２０-1～２０-3に対して制御データＸを通信装置３３から送信する。すなわち、複数の撮像装置２１-1～２１-3に対して共通の制御データＸが供給される。各撮像装置２１-nは同機種であり、動作特性等の仕様は相互に共通する。したがって、複数の撮像装置２１-1～２１-3は、制御データＸに対して同様に動作する。すなわち、撮像制御部４５による制御データＸの供給により、複数の撮像装置２１-1～２１-3は共通の撮像条件に制御される。 The imaging control unit 45 transmits control data X from the communication device 33 to the multiple recording systems 20-1 to 20-3. That is, common control data X is supplied to the multiple imaging devices 21-1 to 21-3. Each imaging device 21-n is of the same model and has common specifications such as operating characteristics. Therefore, the multiple imaging devices 21-1 to 21-3 operate in the same way in response to the control data X. That is, the supply of control data X by the imaging control unit 45 controls the multiple imaging devices 21-1 to 21-3 to common imaging conditions.

撮像制御部４５は、複数の収録システム２０-1～２０-3に対して時間的に並列に制御データＸを送信する。すなわち、複数の撮像装置２１-1～２１-3に対して制御データＸが時間的に並列に供給される。したがって、各撮像装置２１-nの撮像条件は、制御データＸに応じて時間的に並列に変化する。 The imaging control unit 45 transmits the control data X to the multiple recording systems 20-1 to 20-3 in parallel over time. That is, the control data X is supplied to the multiple imaging devices 21-1 to 21-3 in parallel over time. Therefore, the imaging conditions of each imaging device 21-n change in parallel over time according to the control data X.

以上の説明から理解される通り、撮像制御部４５は、複数の撮像装置２１-1～２１-3を、共通の撮像条件のもとで、時間的に相互に並列に動作させる。例えば、撮像装置２１-1の撮像方向が特定の角度だけ変化する場合、撮像装置２１-1の撮像方向の変化に並行して、撮像装置２１-2および撮像装置２１-3の撮像方向も同じ角度だけ変化する。また、撮像装置２１-1の撮像倍率が所定の倍率に変化する場合、撮像装置２１-1の撮像倍率の変化に並行して、撮像装置２１-2および撮像装置２１-3の撮像倍率も同じ倍率に変化する。すなわち、複数の撮像装置２１-1～２１-3の撮像条件が相互に連動して共通の条件に変化する。 As can be understood from the above explanation, the imaging control unit 45 operates the multiple imaging devices 21-1 to 21-3 in parallel with each other in time under common imaging conditions. For example, when the imaging direction of the imaging device 21-1 changes by a specific angle, the imaging directions of the imaging devices 21-2 and 21-3 also change by the same angle in parallel with the change in the imaging direction of the imaging device 21-1. Furthermore, when the imaging magnification of the imaging device 21-1 changes to a predetermined magnification, the imaging magnifications of the imaging devices 21-2 and 21-3 also change to the same magnification in parallel with the change in the imaging magnification of the imaging device 21-1. In other words, the imaging conditions of the multiple imaging devices 21-1 to 21-3 are linked to each other and change to common conditions.
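
The linked operation can be illustrated by sending one piece of control data to every camera concurrently. The `FakeCamera` class is a hypothetical stand-in for an imaging device 21-n (a real system would transmit X through the communication device 33), and the dictionary keys are assumed names.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeCamera:
    """Hypothetical stand-in for an imaging device 21-n."""

    def __init__(self, name):
        self.name = name
        self.pan_deg = 0.0
        self.zoom = 1.0

    def apply(self, control):
        # Relative change of direction and absolute magnification (assumed encoding).
        self.pan_deg += control["pan_delta_deg"]
        self.zoom = control["zoom"]
        return self.name


def broadcast(cameras, control):
    """Send the same control data to every camera concurrently and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(cameras)) as pool:
        return list(pool.map(lambda camera: camera.apply(control), cameras))


cameras = [FakeCamera(f"21-{n}") for n in (1, 2, 3)]
broadcast(cameras, {"pan_delta_deg": 15.0, "zoom": 2.0})
print([(camera.name, camera.pan_deg, camera.zoom) for camera in cameras])
```

Because every camera receives identical control data and is assumed to be of the same model, all three end up with the same imaging direction change and the same magnification.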

第４実施形態においても第１実施形態と同様の効果が実現される。また、各撮像装置２１-nによる動画Ｖnの撮像が共通の撮像条件のもとで実行される。したがって、実際には相異なる収録スタジオＲnに所在する複数の被写体Ｑ1～Ｑ3が恰も共通の空間内に所在するかのように視聴者に知覚される自然な合成動画Ｖを生成できる。 The fourth embodiment achieves the same effects as the first embodiment. Furthermore, each imaging device 21-n captures a video Vn under common imaging conditions. Therefore, a natural composite video V can be generated that is perceived by the viewer as if multiple subjects Q1 to Q3, which are actually located in different recording studios Rn, are located in a common space.

なお、第２実施形態および第３実施形態は、第４実施形態にも同様に適用される。例えば、仮想動画Ｖzを複数の動画Ｖ1～Ｖ3に合成する第３実施形態において、画像生成部４２５は、仮想撮像装置が仮想動画Ｖzを撮像するための撮像条件を、各撮像装置２１-nの撮像条件に連動させてもよい。すなわち、複数の撮像装置２１-1～２１-3と仮想撮像装置とが、共通の撮像条件のもとで時間的に相互に並列に動作してもよい。 The second and third embodiments are also similarly applied to the fourth embodiment. For example, in the third embodiment in which the virtual video Vz is synthesized into multiple videos V1 to V3, the image generation unit 425 may link the imaging conditions for the virtual imaging device to capture the virtual video Vz to the imaging conditions of each imaging device 21-n. In other words, the multiple imaging devices 21-1 to 21-3 and the virtual imaging device may operate in parallel with each other in time under common imaging conditions.

［変形例］ [Modifications]
以上に例示した各形態は多様に変形され得る。前述の各形態に適用され得る具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２以上の態様を、相互に矛盾しない範囲で併合してもよい。 Each of the embodiments exemplified above may be modified in various ways. Specific modifications that may be applied to each of the above-mentioned embodiments are exemplified below. Two or more aspects arbitrarily selected from the following examples may be combined to the extent that they are not mutually contradictory.

（１）前述の各形態においては、距離指標を利用して撮像距離Ｄnを特定したが、各被写体Ｑnの撮像距離Ｄnを特定する方法は、以上の例示に限定されない。例えば、以下に例示する態様１または態様２により、撮像距離Ｄnが特定されてもよい。 (1) In each of the above-described embodiments, the imaging distance Dn is determined using a distance index, but the method of determining the imaging distance Dn of each subject Qn is not limited to the above examples. For example, the imaging distance Dn may be determined by Aspect 1 or Aspect 2 exemplified below.

［態様１］ [Aspect 1]
距離特定部４２１は、各被写体Ｑnに対する顔検出の結果に応じて撮像距離Ｄnを特定してもよい。例えば、距離特定部４２１は、動画Ｖnに対する顔検出の結果を利用して、被写体Ｑnの顔面の各要素に関するサイズの指標（以下「評価指標」という）を算定する。顔検出は、被写体Ｑnの顔面を検出する処理である。例えば、顔面のサイズまたは両眼間の距離等の数値が、評価指標として算定される。 The distance specification unit 421 may specify the imaging distance Dn according to the result of face detection for each subject Qn. For example, the distance specification unit 421 uses the result of face detection for the video Vn to calculate a size index (hereinafter referred to as an "evaluation index") for each element of the face of the subject Qn. Face detection is a process for detecting the face of the subject Qn. For example, a numerical value such as the size of the face or the distance between the eyes is calculated as the evaluation index.

撮像距離Ｄnが大きいほど評価指標は減少するという相関がある。以上の相関を考慮して、距離特定部４２１は、評価指標に応じて撮像距離Ｄnを特定する。例えば、距離特定部４２１は、評価指標が基準値と比較して大きいほど撮像距離Ｄnを小さい数値に設定し、評価指標が基準値と比較して小さいほど撮像距離Ｄnを大きい数値に設定する。なお、顔面の各要素に関するサイズには個人差があるから、評価指標の基準値は被写体Ｑn毎に個別に用意されることが望ましい。例えば、撮像距離Ｄnが所定値である状態で算定された評価指標が、基準値として記憶装置３２に事前に記憶される。 There is a correlation in which the evaluation index decreases as the imaging distance Dn increases. Taking the above correlation into consideration, the distance determination unit 421 determines the imaging distance Dn according to the evaluation index. For example, the distance determination unit 421 sets the imaging distance Dn to a smaller value as the evaluation index increases compared to a reference value, and sets the imaging distance Dn to a larger value as the evaluation index decreases compared to the reference value. Note that, since there are individual differences in the size of each facial element, it is desirable to prepare a reference value for the evaluation index individually for each subject Qn. For example, the evaluation index calculated when the imaging distance Dn is a predetermined value is stored in advance in the storage device 32 as a reference value.
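
One way to turn this correlation into a number is sketched below. It assumes that the evaluation index (for example, the distance between the eyes in pixels) is inversely proportional to the imaging distance, as in a pinhole camera approximation. The disclosure only states that the index decreases as the distance increases, so the exact inverse proportionality and the function name `estimate_distance` are assumptions.

```python
def estimate_distance(index, ref_index, ref_distance):
    """Estimate the imaging distance Dn from an evaluation index.

    ref_index is the per-subject reference value measured in advance at the
    known distance ref_distance. Assumes index is inversely proportional to Dn.
    """
    if index <= 0:
        raise ValueError("evaluation index must be positive")
    return ref_distance * ref_index / index


# Reference: eye spacing of 120 px was measured at 2.0 m for this subject (assumed values).
print(estimate_distance(60, ref_index=120, ref_distance=2.0))   # smaller index, farther away
print(estimate_distance(240, ref_index=120, ref_distance=2.0))  # larger index, closer
```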

なお、以上の説明においては被写体Ｑnの顔検出を例示したが、距離特定部４２１は、被写体Ｑnに対する骨格推定の結果を利用して、撮像距離Ｄnの特定のための評価指標を算定してもよい。骨格推定は、被写体Ｑnについて関節等の骨格を推定する処理である。例えば、特定の関節間の距離（例えば腕の長さ）または比率が評価指標として算定される。距離特定部４２１は、骨格に関する評価指標に応じて撮像距離Ｄnを特定する。なお、骨格には個人差があるから、評価指標の基準値は被写体Ｑn毎に個別に用意されてもよい。 In the above description, face detection of subject Qn has been exemplified, but the distance determination unit 421 may use the results of skeletal estimation for subject Qn to calculate an evaluation index for determining the imaging distance Dn. Skeletal estimation is a process of estimating the skeleton of subject Qn, such as joints. For example, the distance between specific joints (e.g., arm length) or the ratio is calculated as the evaluation index. The distance determination unit 421 determines the imaging distance Dn according to the evaluation index related to the skeleton. Since skeletons vary from person to person, the reference value of the evaluation index may be prepared individually for each subject Qn.

［態様２］ [Aspect 2]
撮像装置２１-nによる撮像時に測距装置が撮像距離Ｄnを測定する形態においては、距離特定部４２１は、測距装置が測定した撮像距離Ｄnを取得する。測距装置は、例えば赤外光または紫外光等の測距光を利用した光学的なセンサである。測距装置は、例えばＬｉＤＡＲ（Light Detection and Ranging）機能を具備する。また、撮像装置２１-nの自動焦点機能により撮像距離Ｄnが特定されてもよい。自動焦点機能は、被写体Ｑnに自動的に合焦する機能である。撮影レンズを制御した結果に応じて撮像距離Ｄnが特定される。 In a configuration in which a distance measuring device measures the imaging distance Dn when the imaging device 21-n captures an image, the distance determination unit 421 acquires the imaging distance Dn measured by the distance measuring device. The distance measuring device is an optical sensor that uses distance measuring light such as infrared light or ultraviolet light. The distance measuring device has, for example, a LiDAR (Light Detection and Ranging) function. The imaging distance Dn may also be determined by an autofocus function of the imaging device 21-n. The autofocus function is a function that automatically focuses on the subject Qn. The imaging distance Dn is determined according to the result of controlling the photographing lens.

撮像距離Ｄnの特定には、態様１および態様２以外にも任意の方法が採用される。例えば、被写体Ｑnの距離指標（マーカー）を複数の撮像装置２１-nにより撮像した結果を利用して、距離特定部４２１が撮像距離Ｄnを特定してもよい。 Any method other than Aspect 1 and Aspect 2 may also be used to determine the imaging distance Dn. For example, the distance determination unit 421 may determine the imaging distance Dn by using the results of imaging a distance index (marker) of the subject Qn with a plurality of imaging devices 21-n.

なお、制御装置３１（距離特定部４２１）による演算処理で各被写体Ｑnの撮像距離Ｄnが特定される必要はない。例えば、各収録スタジオＲnにおいて例えばメジャー等の計測器を利用して収録前に実際に測定された撮像距離Ｄnが、被写体Ｑn毎に記憶装置３２に事前に記憶されてもよい。制御装置３１（距離特定部４２１）は、合成処理Ｓ2において、各被写体Ｑnの撮像距離Ｄnを記憶装置３２から取得する（Ｓ21）。以上の説明から理解される通り、距離特定部４２１による撮像距離Ｄnの特定（Ｓ21）には、事前に記憶された撮像距離Ｄnの読出も包含される。 It should be noted that the imaging distance Dn of each subject Qn does not need to be determined by calculation processing by the control device 31 (distance determination unit 421). For example, the imaging distance Dn actually measured before recording in each recording studio Rn using a measuring device such as a tape measure may be stored in advance in the storage device 32 for each subject Qn. In the synthesis process S2, the control device 31 (distance determination unit 421) acquires the imaging distance Dn of each subject Qn from the storage device 32 (S21). As can be understood from the above explanation, the determination of the imaging distance Dn by the distance determination unit 421 (S21) also includes the reading of the imaging distance Dn that has been stored in advance.

（２）前述の各形態においては、撮像距離Ｄnに応じてぼかし量Ｂnを設定したが、ぼかし量Ｂnは、撮像距離Ｄn以外の制御パラメータに依存してもよい。例えば、画像調整部４２４は、撮像距離Ｄnとぼかし量Ｂnとの相関を、仮想的な絞り値（以下「仮想絞り値Ｆ」という）に応じて制御してもよい。画像調整部４２４は、例えば操作装置３４に対する利用者からの指示に応じて仮想絞り値Ｆを設定する。 (2) In each of the above-described embodiments, the blur amount Bn is set according to the imaging distance Dn, but the blur amount Bn may depend on a control parameter other than the imaging distance Dn. For example, the image adjustment unit 424 may control the correlation between the imaging distance Dn and the blur amount Bn according to a virtual aperture value (hereinafter referred to as "virtual aperture value F"). The image adjustment unit 424 sets the virtual aperture value F according to, for example, an instruction from the user to the operation device 34.

図１８は、本変形例における撮像距離Ｄnとぼかし量Ｂnとの関係を表すグラフである。仮想絞り値Ｆが相異なる２個の数値（Ｆ1，Ｆ2）に設定された場合のグラフが図１８には併記されている。数値Ｆ1は数値Ｆ2を下回る。 FIG. 18 is a graph showing the relationship between the imaging distance Dn and the blur amount Bn in this modified example. FIG. 18 shows, side by side, the graphs obtained when the virtual aperture value F is set to each of two different values (F1, F2). The value F1 is lower than the value F2.

画像調整部４２４は、仮想絞り値Ｆに応じて合焦範囲Ｐ0を制御する。具体的には、図１８に例示される通り、仮想絞り値Ｆが数値Ｆ1に設定された場合の合焦範囲Ｐ0は、仮想絞り値Ｆが数値Ｆ2（＞Ｆ1）に設定された場合の合焦範囲Ｐ0よりも狭い範囲に設定される。以上の制御により、撮影レンズの絞り値が小さいほど被写界深度が縮小する現実の傾向が模擬される。 The image adjustment unit 424 controls the focus range P0 according to the virtual aperture value F. Specifically, as illustrated in FIG. 18, the focus range P0 when the virtual aperture value F is set to a value F1 is set to a narrower range than the focus range P0 when the virtual aperture value F is set to a value F2 (>F1). Through the above control, the real-world tendency for the depth of field to shrink as the aperture value of the shooting lens becomes smaller is simulated.

また、画像調整部４２４は、撮像距離Ｄnに対するぼかし量Ｂnを仮想絞り値Ｆに応じて制御する。具体的には、撮像距離Ｄnが同一の数値に設定された状況でも、仮想絞り値Ｆが数値Ｆ1に設定された場合のぼかし量Ｂnは、仮想絞り値Ｆが数値Ｆ2（＞Ｆ1）に設定された場合のぼかし量Ｂnを上回る。以上の制御により、撮影レンズの絞り値が小さいほど光学的なぼけの程度が大きいという現実の傾向が模擬される。 The image adjustment unit 424 also controls the blur amount Bn for the imaging distance Dn according to the virtual aperture value F. Specifically, even when the imaging distance Dn is set to the same value, the blur amount Bn when the virtual aperture value F is set to the value F1 exceeds the blur amount Bn when the virtual aperture value F is set to the value F2 (>F1). Through the above control, the real-world tendency that the smaller the aperture value of the shooting lens, the greater the degree of optical blur is simulated.

なお、以上の説明においては仮想絞り値Ｆに着目したが、ぼかし量Ｂnに影響する制御パラメータは仮想絞り値Ｆに限定されない。例えば、画像調整部４２４は、撮像距離Ｄnとぼかし量Ｂnとの相関を、仮想的な焦点距離（以下「仮想焦点距離」という）に応じて制御してもよい。仮想焦点距離は、例えば操作装置３４に対する利用者からの指示に応じて設定される。具体的には、画像調整部４２４は、仮想焦点距離が大きいほど、合焦範囲Ｐ0を縮小し、かつ、撮像距離Ｄnに対するぼかし量Ｂnを大きい数値に設定する。以上の形態によれば、撮影レンズの焦点距離が大きいほど、被写界深度が縮小し易く、かつ、光学的なぼけが増大し易いという現実の傾向が模擬される。 In the above description, the focus has been on the virtual aperture value F, but the control parameter that affects the blur amount Bn is not limited to the virtual aperture value F. For example, the image adjustment unit 424 may control the correlation between the imaging distance Dn and the blur amount Bn according to a virtual focal length (hereinafter referred to as "virtual focal length"). The virtual focal length is set according to an instruction from the user to the operation device 34, for example. Specifically, the image adjustment unit 424 reduces the focusing range P0 as the virtual focal length increases, and sets the blur amount Bn for the imaging distance Dn to a larger value. According to the above embodiment, the actual tendency that the depth of field is more likely to be reduced and the optical blur is more likely to be increased as the focal length of the photographing lens increases is simulated.

以上の例示から理解される通り、前述の各形態に例示した撮像距離Ｄnと、本変形例において例示した仮想絞り値Ｆおよび仮想焦点距離とは、ぼかし量Ｂnを制御するための制御パラメータとして包括的に表現される。制御パラメータは、以上に例示した種類の変数に限定されない。 As can be understood from the above examples, the imaging distance Dn exemplified in each of the above-mentioned embodiments and the virtual aperture value F and virtual focal length exemplified in this modified example are collectively expressed as control parameters for controlling the blur amount Bn. The control parameters are not limited to the types of variables exemplified above.
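
The dependence of the blur amount Bn on the virtual aperture value F can be sketched with a piecewise-linear model. The specific form (a focus range whose half-width grows in proportion to F, and a slope inversely proportional to F) and the constants `range_per_f`, `gain`, and `max_blur` are assumptions chosen only to reproduce the two tendencies described above.

```python
def blur_amount(distance, d_ref, f_number, range_per_f=0.1, gain=2.0, max_blur=10.0):
    """Blur amount Bn for imaging distance Dn under a virtual aperture value F.

    Inside the focus range P0 (|Dn - Dref| <= range_per_f * F) the blur is zero.
    Outside it, the blur grows linearly with slope gain / F, capped at max_blur.
    """
    half_width = range_per_f * f_number
    excess = max(0.0, abs(distance - d_ref) - half_width)
    return min(max_blur, gain / f_number * excess)


d_ref = 2.0
print(blur_amount(3.0, d_ref, f_number=2.0))  # F1: narrower focus range, stronger blur
print(blur_amount(3.0, d_ref, f_number=4.0))  # F2 (> F1): wider focus range, weaker blur
print(blur_amount(2.1, d_ref, f_number=2.0))  # inside the focus range: no blur
```

A virtual focal length could be handled in the same way by making the half-width shrink and the slope grow as the focal length increases.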

（３）撮像距離Ｄnとぼかし量Ｂnとの関係は、図７に例示した関係に限定されない。例えば、図１９に例示される通り、撮像距離Ｄnに対してぼかし量Ｂnが曲線的に変化する形態も想定される。また、図２０に例示される通り、撮像距離Ｄnに対してぼかし量Ｂnが変化しない合焦範囲Ｐ0は、省略されてもよい。 (3) The relationship between the imaging distance Dn and the blur amount Bn is not limited to the relationship illustrated in FIG. 7. For example, as illustrated in FIG. 19, a form in which the blur amount Bn changes in a curve with respect to the imaging distance Dn is also possible. Also, as illustrated in FIG. 20, the focus range P0 in which the blur amount Bn does not change with respect to the imaging distance Dn may be omitted.

図２１に例示される通り、撮像距離Ｄnとぼかし量Ｂnとの関係が、範囲Ｐaと範囲Ｐbとで相違する形態も想定される。図２１には、撮像距離Ｄnに対するぼかし量Ｂnの勾配が、範囲Ｐaと範囲Ｐbとで相違する場合が例示されている。図２１から理解される通り、ぼかし量Ｂnの数値範囲も範囲Ｐaと範囲Ｐbとで相違する。図２１の形態によれば、基準被写体Ｑrefの手前側と奥側とで被写体Ｑnのぼけの特性を相違させることが可能である。 As shown in FIG. 21, it is also possible to assume that the relationship between the imaging distance Dn and the blur amount Bn differs between ranges Pa and Pb. FIG. 21 illustrates a case in which the gradient of the blur amount Bn with respect to the imaging distance Dn differs between ranges Pa and Pb. As can be seen from FIG. 21, the numerical range of the blur amount Bn also differs between ranges Pa and Pb. According to the embodiment in FIG. 21, it is possible to make the blur characteristics of the subject Qn different between the front side and the back side of the reference subject Qref.
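
A relationship that differs between the near side and the far side of the reference value can be sketched as below. The gradients and upper limits are arbitrary assumed values, the focus range is omitted for brevity, and the code does not assume which of the ranges Pa and Pb in FIG. 21 corresponds to which side.

```python
def blur_amount(distance, d_ref, near_gain=3.0, far_gain=1.0, near_max=6.0, far_max=10.0):
    """Blur amount with a different gradient and numerical range on each side of Dref."""
    offset = distance - d_ref
    if offset < 0:
        # Subject closer to the camera than the reference value.
        return min(near_max, near_gain * -offset)
    # Subject at or beyond the reference value.
    return min(far_max, far_gain * offset)


print(blur_amount(1.0, 2.0))  # 1.0 in front of the reference
print(blur_amount(3.0, 2.0))  # 1.0 behind the reference
```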

（４）前述の各形態においては、複数の撮像装置２１-1～２１-3において撮像条件が共通する場合を想定したが、撮像倍率等の撮像条件が撮像装置２１-n毎に相違する場合が想定される。また、各撮像装置２１-nの撮影レンズがズームレンズである形態では、撮像倍率（焦点距離）が撮像装置２１-n毎に個別に設定される場合も想定される。 (4) In each of the above embodiments, it is assumed that the imaging conditions are common to the multiple imaging devices 21-1 to 21-3, but it is also assumed that the imaging conditions, such as the imaging magnification, are different for each imaging device 21-n. Also, in an embodiment in which the shooting lens of each imaging device 21-n is a zoom lens, it is also assumed that the imaging magnification (focal length) is set individually for each imaging device 21-n.

各動画Ｖnにおける被写体画像Ｇnのサイズは、撮像距離Ｄnだけでなく撮像倍率等の撮像条件にも依存する。例えば、被写体Ｑn自体のサイズおよび撮像距離Ｄnが共通する場合でも、撮像装置２１-nの撮像倍率が大きいほど被写体画像Ｇnのサイズは増大する。複数の被写体画像Ｇ1～Ｇ3のサイズが自然な関係となるように、画像調整部４２４は、合成処理Ｓ2において撮像装置２１-n毎の撮像倍率の相違を補償する。すなわち、撮像倍率の相違に起因した被写体画像Ｇnのサイズの相違が低減される。 The size of the subject image Gn in each video Vn depends not only on the imaging distance Dn but also on imaging conditions such as imaging magnification. For example, even if the size of the subject Qn itself and the imaging distance Dn are the same, the size of the subject image Gn increases as the imaging magnification of the imaging device 21-n increases. In order to create a natural relationship between the sizes of the multiple subject images G1 to G3, the image adjustment unit 424 compensates for the differences in imaging magnification for each imaging device 21-n in the synthesis process S2. In other words, the differences in size of the subject image Gn caused by differences in imaging magnification are reduced.

具体的には、画像調整部４２４は、撮像倍率の逆比により各被写体画像Ｇnを拡大または縮小する。例えば、撮像装置２１-n1の撮像倍率が撮像装置２１-n2の撮像倍率の２倍である場合、画像調整部４２４は、被写体画像Ｇn1のサイズを変更せずに被写体画像Ｇn2のサイズを１/２倍に調整する。あるいは、画像調整部４２４は、被写体画像Ｇn2のサイズを変更せずに被写体画像Ｇn1のサイズを２倍に調整してもよい。以上の形態によれば、撮像装置２１-n毎の撮像条件の相違が補償され、結果的に自然な合成動画Ｖを生成できる。 Specifically, the image adjustment unit 424 enlarges or reduces each subject image Gn according to the inverse ratio of the imaging magnifications. For example, if the imaging magnification of the imaging device 21-n1 is twice that of the imaging device 21-n2, the image adjustment unit 424 adjusts the size of the subject image Gn2 to 1/2 of its original size without changing the size of the subject image Gn1. Alternatively, the image adjustment unit 424 may adjust the size of the subject image Gn1 to twice its original size without changing the size of the subject image Gn2. According to the above embodiment, the difference in imaging conditions for each imaging device 21-n is compensated for, and as a result, a natural composite video V can be generated.

なお、画像調整部４２４が各撮像装置２１-nの撮像倍率を取得できない形態においては、画像調整部４２４は、撮像距離Ｄnに応じて各被写体画像Ｇnのサイズを調整してもよい。具体的には、画像調整部４２４は、各被写体画像Ｇnにおける距離指標のサイズが撮像距離Ｄnの逆比となるように、各被写体画像Ｇnのサイズを拡大または縮小する。例えば、撮像距離Ｄn2が撮像距離Ｄn1の２倍である場合、被写体画像Ｇn2における距離指標のサイズが被写体画像Ｇn1における距離指標のサイズの１/２倍となるように、被写体画像Ｇn1および被写体画像Ｇn2の一方または双方のサイズが調整される。 In a configuration in which the image adjustment unit 424 cannot acquire the imaging magnification of each imaging device 21-n, the image adjustment unit 424 may adjust the size of each subject image Gn according to the imaging distance Dn. Specifically, the image adjustment unit 424 enlarges or reduces the size of each subject image Gn so that the size of the distance index in each subject image Gn is inversely proportional to the imaging distance Dn. For example, when the imaging distance Dn2 is twice the imaging distance Dn1, the size of one or both of the subject images Gn1 and Gn2 is adjusted so that the size of the distance index in the subject image Gn2 is 1/2 the size of the distance index in the subject image Gn1.
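
The distance-based size adjustment can be sketched by computing one scale factor per subject image so that the scaled size of the distance index is inversely proportional to the imaging distance Dn. The function name `scale_factors`, the `anchor` parameter (the image left unchanged), and the pixel values are assumptions for illustration.

```python
def scale_factors(marker_sizes, distances, anchor=0):
    """Return one scale factor per subject image.

    After scaling, marker_size * scale * distance is the same for every image,
    i.e. the marker size is inversely proportional to the imaging distance.
    The image at index `anchor` keeps its original size (scale 1.0).
    """
    k = marker_sizes[anchor] * distances[anchor]
    return [k / (size * dist) for size, dist in zip(marker_sizes, distances)]


# The marker appears 100 px wide in both images, but Dn2 is twice Dn1 (assumed values).
print(scale_factors(marker_sizes=[100.0, 100.0], distances=[1.0, 2.0]))
```

With these inputs the second image is scaled so that its marker becomes half the size of the marker in the first image, matching the example in the text.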

（５）前述の各形態においては、各収録スタジオＲnとは別個の位置に動画合成システム３０が設置された形態を例示したが、収録スタジオＲnに動画合成システム３０が設置されてもよい。また、撮像装置２１-nに動画合成システム３０が搭載されてもよい。 (5) In each of the above embodiments, the video compositing system 30 is installed in a location separate from each recording studio Rn. However, the video compositing system 30 may be installed in the recording studio Rn. Also, the video compositing system 30 may be mounted on the imaging device 21-n.

（６）前述の各形態においては、動画Ｖnのうち特定色の背景に対応する領域を除去することで被写体画像Ｇnを抽出したが、動画Ｖnから被写体画像Ｇnを抽出する方法は、以上の例示に限定されない。例えば、被写体抽出部４２２は、公知の物体検出処理により動画Ｖnから被写体画像Ｇnを抽出してもよい。物体検出処理としては、例えば深層ニューラルネットワーク等の推定モデルを利用した物体検出、または背景差分法等の画像処理を利用した物体検出が例示される。以上の説明から理解される通り、収録スタジオＲnの背景は特定色である必要はない。 (6) In each of the above embodiments, the subject image Gn is extracted by removing the area of the video Vn that corresponds to the background of a specific color, but the method of extracting the subject image Gn from the video Vn is not limited to the above examples. For example, the subject extraction unit 422 may extract the subject image Gn from the video Vn by a known object detection process. Examples of the object detection process include object detection using an estimation model such as a deep neural network, or object detection using image processing such as a background difference method. As can be understood from the above explanation, the background of the recording studio Rn does not need to be a specific color.

（７）前述の各形態においては、基準被写体Ｑrefに対応する撮像距離Ｄnを基準値Ｄrefとして設定したが、基準値Ｄrefを設定する方法は、以上の例示に限定されない。例えば、操作装置３４に対する操作により利用者が任意に指示した数値が、基準値Ｄrefとして設定されてもよい。基準値Ｄrefは、利用者からの指示に応じて随時に変更される。画像調整部４２４は、利用者から指示された基準値Ｄrefをぼかし処理Ｓ24に適用する。また、進行が事前に計画されたイベントの配信コンテンツＣを制作する場合を想定すると、基準値Ｄrefの時系列が記憶装置３２に事前に記憶されてもよい。画像調整部４２４は、記憶装置３２から時系列に取得した基準値Ｄrefを順次にぼかし処理Ｓ24に適用する。 (7) In each of the above-mentioned embodiments, the imaging distance Dn corresponding to the reference subject Qref is set as the reference value Dref, but the method of setting the reference value Dref is not limited to the above examples. For example, a numerical value arbitrarily specified by the user through operation of the operation device 34 may be set as the reference value Dref. The reference value Dref is changed as needed in response to instructions from the user. The image adjustment unit 424 applies the reference value Dref specified by the user to the blurring process S24. In addition, assuming a case where distribution content C is produced for an event whose progress is planned in advance, the time series of the reference value Dref may be stored in the storage device 32 in advance. The image adjustment unit 424 sequentially applies the reference values Dref acquired in chronological order from the storage device 32 to the blurring process S24.
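
Reading a pre-stored time series of the reference value Dref can be sketched as a lookup of the most recent entry at or before the current time. The schedule format, a list of (start time in seconds, Dref) pairs sorted by time, and the name `reference_at` are assumptions; the disclosure does not specify how the time series is stored.

```python
from bisect import bisect_right


def reference_at(schedule, t):
    """Return the reference value Dref in effect at time t.

    schedule is a list of (start_time_sec, d_ref) pairs sorted by start time.
    """
    times = [start for start, _ in schedule]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("t precedes the first scheduled value")
    return schedule[i][1]


schedule = [(0.0, 2.0), (30.0, 3.5), (75.0, 1.5)]
print(reference_at(schedule, 10.0))
print(reference_at(schedule, 30.0))
print(reference_at(schedule, 100.0))
```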

以上の例示から理解される通り、基準値Ｄrefは、基準被写体Ｑrefの撮像距離Ｄnに限定されない。すなわち、基準被写体Ｑrefの撮像距離Ｄnとは無関係に基準値Ｄrefが設定されてもよい。したがって、基準被写体Ｑrefの選択（被写体選択部４２３，Ｓ23）は、本開示において省略されてよい。 As can be understood from the above examples, the reference value Dref is not limited to the imaging distance Dn of the reference subject Qref. In other words, the reference value Dref may be set regardless of the imaging distance Dn of the reference subject Qref. Therefore, the selection of the reference subject Qref (subject selection unit 423, S23) may be omitted in this disclosure.

（８）前述の各形態においては、重畳処理Ｓ25において複数の被写体画像Ｇ1～Ｇ3が合成される形態を例示したが、複数の被写体画像Ｇ1～Ｇ3を合成する合成処理の過程において、調整処理が実行される段階は任意である。例えば、各被写体画像Ｇnについてぼかし処理Ｓ24や前後の調整等の調整処理が実行されてから、複数の被写体画像Ｇ1～Ｇ3が合成されてもよいし、複数の被写体画像Ｇ1～Ｇ3が合成されてから、合成動画Ｖにおける各被写体画像Ｇnについて調整処理が実行されてもよい。 (8) In each of the above embodiments, the multiple subject images G1 to G3 are synthesized in the superimposition process S25, but the stage at which the adjustment process is performed during the synthesis process of synthesizing the multiple subject images G1 to G3 is arbitrary. For example, the multiple subject images G1 to G3 may be synthesized after the blurring process S24 or adjustments such as front and rear adjustments are performed on each subject image Gn, or the multiple subject images G1 to G3 may be synthesized and then the adjustment process is performed on each subject image Gn in the synthetic video V.

（９）前述の各形態においては、被写体画像Ｇnに対するぼかし処理Ｓ24を例示したが、被写体画像Ｇnの画像パラメータを調整する加工処理は、以上に例示したぼかし処理Ｓ24に限定されない。例えば、明度（露出）、彩度、色相、コントラスト、明瞭度等の任意の画像パラメータを調整する画像処理が、「加工処理」として包括的に表現される。加工処理においては、被写体画像Ｇnに関する以上の画像パラメータが、撮像距離Ｄnに応じて調整される。 (9) In each of the above embodiments, blurring processing S24 for the subject image Gn is exemplified, but the processing for adjusting the image parameters of the subject image Gn is not limited to the blurring processing S24 exemplified above. For example, image processing for adjusting any image parameter such as brightness (exposure), saturation, hue, contrast, clarity, etc. is collectively expressed as "processing." In the processing, the above image parameters for the subject image Gn are adjusted according to the imaging distance Dn.

例えば、画像調整部４２４は、基準値Ｄrefと各被写体Ｑnの撮像距離Ｄnとの差異|Ｄref－Ｄn|が大きいほど、明度、彩度、コントラストまたは明瞭度等の画像パラメータを小さい数値に設定する。以上の形態においても、基準被写体Ｑrefが特に注目され易い合成動画Ｖを生成できる。また、基準値Ｄrefと撮像距離Ｄnとの差異|Ｄref－Ｄn|に応じて被写体画像Ｇnの色相を調整する形態も想定される。例えば、画像調整部４２４は、差異|Ｄref－Ｄn|が所定の範囲内にある場合に被写体画像Ｇnを特定の色相に調整する。 For example, the image adjustment unit 424 sets image parameters such as brightness, saturation, contrast, or clarity to smaller values as the difference |Dref-Dn| between the reference value Dref and the imaging distance Dn of each subject Qn increases. Even in the above embodiment, a composite video V can be generated in which the reference subject Qref is particularly likely to attract attention. In addition, a form in which the hue of the subject image Gn is adjusted according to the difference |Dref-Dn| between the reference value Dref and the imaging distance Dn is also envisioned. For example, the image adjustment unit 424 adjusts the subject image Gn to a specific hue when the difference |Dref-Dn| is within a predetermined range.
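
As a sketch of this kind of processing, the following lowers the brightness of a subject image as |Dref-Dn| grows. The linear falloff, the lower limit `floor`, and the per-pixel RGB scaling are assumptions; the disclosure only states that the parameter is set to a smaller value as the difference increases.

```python
def attenuation(distance, d_ref, falloff=0.2, floor=0.4):
    """Multiplier for an image parameter: 1.0 at Dref, decreasing linearly down to floor."""
    return max(floor, 1.0 - falloff * abs(d_ref - distance))


def adjust_pixel(rgb, factor):
    """Scale the brightness of one RGB pixel by factor."""
    return tuple(round(channel * factor) for channel in rgb)


print(attenuation(2.0, d_ref=2.0))   # reference subject: unchanged
print(attenuation(3.0, d_ref=2.0))   # farther from the reference value: dimmer
print(adjust_pixel((200, 100, 50), 0.5))
```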

（１０）前述の各形態においては、収録スタジオＲnを現実空間として例示したが、現実空間は収録スタジオＲn等の屋内空間に限定されない。例えば、屋外空間等の現実空間内に収録システム２０-nが設置されてもよい。以上の説明から理解される通り、現実空間は、現実世界の実在する空間として定義され、屋内／屋外は不問である。 (10) In each of the above embodiments, the recording studio Rn is exemplified as a real space, but the real space is not limited to an indoor space such as the recording studio Rn. For example, the recording system 20-n may be installed in a real space such as an outdoor space. As can be understood from the above explanation, the real space is defined as an actual space in the real world, regardless of whether it is indoors or outdoors.

（１１）前述の各形態においては、複数の動画Ｖ1～Ｖ3の合成により合成動画Ｖを生成したが、さらに他の画像が合成されてもよい。例えば、被写体Ｑnがプレイするゲームの画面が、合成動画Ｖに合成されてもよい。また、収録スタジオＲn内に設置された撮像装置２１-n以外の撮像装置により撮像された動画が、合成動画Ｖに合成されてもよい。また、前述の各形態においては、複数の音声Ａ1～Ａ3の合成により合成音声Ａを生成したが、さらに他の音声が合成されてもよい。例えば、被写体Ｑnがプレイするゲームの音声が、合成音声Ａに合成されてもよい。 (11) In each of the above-described embodiments, the composite video V is generated by combining multiple videos V1 to V3, but other images may be further combined. For example, the screen of a game played by subject Qn may be combined with the composite video V. Also, a video captured by an imaging device other than imaging device 21-n installed in recording studio Rn may be combined with the composite video V. Also, in each of the above-described embodiments, the composite sound A is generated by combining multiple sounds A1 to A3, but other sounds may be further combined. For example, the sound of a game played by subject Qn may be combined with the composite sound A.

（１２）動画合成システム３０と収録システム２０-1との間の通信遅延と、動画合成システム３０と収録システム２０-2との間の通信遅延とが相違する場合、動画合成システム３０が取得する動画Ｖ1と動画Ｖ2とが時間的に相互に同期しない可能性がある。例えば、動画Ｖ1および動画Ｖ2の一方が他方に対して遅延する状況が想定される。以上の状況において、画像処理部４２（例えば画像調整部４２４）は、複数の動画Ｖ1～Ｖ3を時間的に相互に同期させてもよい。以上の形態によれば、複数の動画Ｖ1～Ｖ3の時間的なズレが低減された自然な合成動画Ｖを生成できる。 (12) When the communication delay between the video compositing system 30 and the recording system 20-1 differs from the communication delay between the video compositing system 30 and the recording system 20-2, the videos V1 and V2 acquired by the video compositing system 30 may not be synchronized in time with each other. For example, a situation may be assumed in which one of the videos V1 and V2 is delayed relative to the other. In the above situation, the image processing unit 42 (e.g., the image adjustment unit 424) may synchronize the multiple videos V1 to V3 with each other in time. According to the above embodiment, a natural composite video V can be generated in which the time lag between the multiple videos V1 to V3 is reduced.
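One possible way to synchronize the videos is sketched below. It assumes that every frame carries a capture timestamp in milliseconds taken from a clock shared by the recording systems; the disclosure does not say how synchronization is achieved, so the timestamps, the function name `synchronize`, and the data layout are all hypothetical.

```python
from bisect import bisect_right


def synchronize(streams, frame_interval_ms):
    """Align several videos on a common timeline.

    streams: dict mapping a video name to a list of
             (capture_time_ms, frame) pairs sorted by time.
    Returns a list of (time_ms, {name: frame}) rows covering only the
    period in which every stream has data.
    """
    times = {name: [ts for ts, _ in frames] for name, frames in streams.items()}
    start = max(ts[0] for ts in times.values())   # latest first frame
    end = min(ts[-1] for ts in times.values())    # earliest last frame
    aligned = []
    for t in range(start, end + 1, frame_interval_ms):
        row = {}
        for name, frames in streams.items():
            # Latest frame captured at or before time t.
            index = bisect_right(times[name], t) - 1
            row[name] = frames[index][1]
        aligned.append((t, row))
    return aligned


# V2 starts one frame later than V1 (hypothetical data).
v1 = [(0, "V1-f0"), (33, "V1-f1"), (66, "V1-f2"), (99, "V1-f3")]
v2 = [(33, "V2-f0"), (66, "V2-f1"), (99, "V2-f2"), (132, "V2-f3")]
aligned = synchronize({"V1": v1, "V2": v2}, 33)
```

Each row of `aligned` pairs frames captured at the same moment, regardless of when they arrived at the compositing system.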

（１３）第４実施形態においては、撮像方向および撮像倍率を撮像条件として例示したが、撮像制御部４５により制御される撮像条件は、以上の例示に限定されない。例えば、焦点位置（フォーカス）、絞り値（アイリス）、露光時間（シャッタースピード）、露出値またはホワイトバランス等、撮像範囲自体には影響しない条件も「撮像条件」には包含される。 (13) In the fourth embodiment, the imaging direction and imaging magnification are exemplified as imaging conditions, but the imaging conditions controlled by the imaging control unit 45 are not limited to the above examples. For example, the "imaging conditions" also include conditions that do not affect the imaging range itself, such as the focus position, aperture value (iris), exposure time (shutter speed), exposure value, or white balance.

（１４）前述の各形態においては３個の収録システム２０-1～２０-3を例示したが、収録システム２０-nの個数は任意である。例えば、２個の収録システム２０-nが設置された構成や、４個以上の収録システム２０-nが設置された構成にも、本開示は同様に適用される。 (14) In each of the above embodiments, three recording systems 20-1 to 20-3 are illustrated, but the number of recording systems 20-n is arbitrary. For example, this disclosure is similarly applicable to a configuration in which two recording systems 20-n are installed, or a configuration in which four or more recording systems 20-n are installed.

例えば、動画収録システム１００がＮ個（Ｎは２以上の自然数）の撮像装置２１-1～２１-Nを具備する構成を想定すると、Ｎ個の撮像装置２１-1～２１-Nから選択された１個の撮像装置２１-n1（ｎ1＝１～Ｎ）が本開示における「第１撮像装置」の一例であり、他の撮像装置２１-n2（ｎ2＝１～Ｎ，ｎ2≠ｎ1）が本開示における「第２撮像装置」の一例である。 For example, assuming that the video recording system 100 is configured to include N (N is a natural number equal to or greater than 2) imaging devices 21-1 to 21-N, one imaging device 21-n1 (n1 = 1 to N) selected from the N imaging devices 21-1 to 21-N is an example of a "first imaging device" in this disclosure, and the other imaging device 21-n2 (n2 = 1 to N, n2 ≠ n1) is an example of a "second imaging device" in this disclosure.

（１５）前述の各形態に係る動画合成システム３０の機能は、前述の通り、制御装置３１を構成する単数または複数のプロセッサと、記憶装置３２に記憶されたプログラムとの協働により実現される。以上に例示したプログラムは、コンピュータが読取可能な記録媒体に格納された形態で提供されてコンピュータにインストールされ得る。記録媒体は、例えば非一過性（non-transitory）の記録媒体であり、ＣＤ-ＲＯＭ等の光学式記録媒体（光ディスク）が好例であるが、半導体記録媒体または磁気記録媒体等の公知の任意の形式の記録媒体も包含される。なお、非一過性の記録媒体とは、一過性の伝搬信号（transitory, propagating signal）を除く任意の記録媒体を含み、揮発性の記録媒体も除外されない。また、配信装置が通信網を介してプログラムを配信する構成では、当該配信装置においてプログラムを記憶する記録媒体が、前述の非一過性の記録媒体に相当する。 (15) As described above, the functions of the video synthesis system 30 according to each of the above embodiments are realized by the cooperation of one or more processors constituting the control device 31 and the program stored in the storage device 32. The above-mentioned programs can be provided in a form stored in a computer-readable recording medium and installed in the computer. The recording medium is, for example, a non-transitory recording medium, and a good example is an optical recording medium (optical disk) such as a CD-ROM, but also includes any known type of recording medium such as a semiconductor recording medium or a magnetic recording medium. Note that a non-transitory recording medium includes any recording medium except for a transient, propagating signal, and does not exclude volatile recording media. In addition, in a configuration in which a distribution device distributes a program via a communication network, the recording medium that stores the program in the distribution device corresponds to the non-transitory recording medium described above.

（１６）本開示における「第ｎ」（ｎは自然数）という記載は、各要素を表記上において区別するための形式的または便宜的な標識（ラベル）としてのみ使用され、如何なる実質的な意味も持たない。したがって、「第ｎ」という表記を根拠として、各要素の位置または処理の順序等が限定的に解釈される余地はない。 (16) In this disclosure, the term "nth" (n is a natural number) is used only as a formal or convenient label to distinguish each element in notation and does not have any substantive meaning. Therefore, there is no room for restrictive interpretation of the position of each element or the order of processing, etc., based on the term "nth".

［付記］ [Supplementary Notes]
以上の記載から、例えば以下のように本開示の好適な態様が把握される。なお、各態様の理解を容易にするために、以下では、図面の符号を便宜的に括弧書で併記するが、本開示は図示の態様に限定されない。 From the above description, preferred aspects of the present disclosure can be understood, for example, as follows. To facilitate understanding of each aspect, the reference numerals used in the drawings are given in parentheses below for convenience, but the present disclosure is not limited to the illustrated aspects.

［付記１］ [Supplementary Note 1]
本開示のひとつの態様（付記１）に係る動画合成システム（３０）は、第１現実空間（Ｒn1）内の第１撮像装置（２１-n1）が撮像した第１動画（Ｖn1）と、第２現実空間（Ｒn2）内の第２撮像装置（２１-n2）が撮像した第２動画（Ｖn2）とを取得する動画取得部（４１）と、前記第１動画（Ｖn1）における第１被写体（Ｑn1）の画像（Ｇn1）と前記第２動画（Ｖn2）における第２被写体（Ｑn2）の画像（Ｇn2）とを含む合成動画（Ｖ）を生成する合成処理（Ｓ2）を実行する画像処理部（４２）とを具備し、前記合成処理（Ｓ2）は、前記第１撮像装置（２１-n1）の第１撮像距離（Ｄn1）と前記第２撮像装置（２１-n2）の第２撮像距離（Ｄn2）とに応じて前記第１被写体（Ｑn1）の画像（Ｇn1）と前記第２被写体（Ｑn2）の画像（Ｇn2）とを調整する調整処理（Ｓ24，Ｓ25）を含む。 A video compositing system (30) according to one aspect (Supplementary Note 1) of the present disclosure includes: a video acquisition unit (41) that acquires a first video (Vn1) captured by a first imaging device (21-n1) in a first real space (Rn1) and a second video (Vn2) captured by a second imaging device (21-n2) in a second real space (Rn2); and an image processing unit (42) that executes a synthesis process (S2) for generating a composite video (V) including an image (Gn1) of a first subject (Qn1) in the first video (Vn1) and an image (Gn2) of a second subject (Qn2) in the second video (Vn2), the synthesis process (S2) including adjustment processes (S24, S25) for adjusting the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) in accordance with a first imaging distance (Dn1) of the first imaging device (21-n1) and a second imaging distance (Dn2) of the second imaging device (21-n2).

以上の態様によれば、第１撮像装置（２１-n1）が撮像した第１動画（Ｖn1）と第２撮像装置（２１-n2）が撮像した第２動画（Ｖn2）とを合成する合成処理（Ｓ2）において、第１動画（Ｖn1）における第１被写体（Ｑn1）の画像（Ｇn1）と第２動画（Ｖn2）における第２被写体（Ｑn2）の画像（Ｇn2）とを、第１撮像距離（Ｄn1）と第２撮像距離（Ｄn2）とに応じて調整する調整処理（Ｓ24，Ｓ25）が実行される。すなわち、第１撮像距離（Ｄn1）および第２撮像距離（Ｄn2）の関係が、合成動画（Ｖ）における第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との関係に反映される。したがって、相異なる現実空間（Ｒn）に所在する複数の被写体（Ｑn）を含む自然な合成動画（Ｖ）を生成できる。 According to the above aspect, in the synthesis process (S2) for synthesizing the first video (Vn1) captured by the first imaging device (21-n1) and the second video (Vn2) captured by the second imaging device (21-n2), an adjustment process (S24, S25) is executed for adjusting the image (Gn1) of the first subject (Qn1) in the first video (Vn1) and the image (Gn2) of the second subject (Qn2) in the second video (Vn2) according to the first imaging distance (Dn1) and the second imaging distance (Dn2). That is, the relationship between the first imaging distance (Dn1) and the second imaging distance (Dn2) is reflected in the relationship between the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) in the synthetic video (V). Therefore, a natural synthetic video (V) including multiple subjects (Qn) located in different real spaces (Rn) can be generated.

「（第１／第２）現実空間（Ｒn）」は、現実の世界に実在する空間であり、仮想空間（Ｚ）と対比される概念である。 "(First/Second) Real Space (Rn)" is a space that actually exists in the real world, and is a concept that contrasts with virtual space (Z).

「（第１／第２）動画」は、複数の映像の時系列により構成される動的な画像である。第１動画（Ｖn1）と第２動画（Ｖn2）とは、例えば時間的に相互に並列に撮像される。すなわち、第１動画（Ｖn1）が撮像される期間と第２動画（Ｖn2）が撮像される期間とは時間軸上で相互に重複する。ただし、第１動画（Ｖn1）と第２動画（Ｖn2）とは相互に並列に撮像されなくてもよい。すなわち、第１動画（Ｖn1）が撮像される期間と第２動画（Ｖn2）が撮像される期間とは時間軸上で相互に重複しなくてもよい。 The "(first/second) video" is a dynamic image composed of a time series of multiple images. The first video (Vn1) and the second video (Vn2) are captured, for example, in parallel with each other in time. That is, the period during which the first video (Vn1) is captured and the period during which the second video (Vn2) is captured overlap with each other on the time axis. However, the first video (Vn1) and the second video (Vn2) do not have to be captured in parallel with each other. That is, the period during which the first video (Vn1) is captured and the period during which the second video (Vn2) is captured do not have to overlap with each other on the time axis.

「合成処理（Ｓ2）」は、第１動画（Ｖn1）における第１被写体（Ｑn1）の画像（Ｇn1）と第２動画（Ｖn2）における第２被写体（Ｑn2）の画像（Ｇn2）とを含む合成動画（Ｖ）を生成する任意の画像処理である。合成処理（Ｓ2）のなかで調整処理（Ｓ24，Ｓ25）が実行される段階は任意である。例えば、第１被写体（Ｑn1）の画像（Ｇn1）および第２被写体（Ｑn2）の画像（Ｇn2）の少なくとも一方について調整処理（Ｓ24，Ｓ25）が実行されてから、第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）とが合成されてもよいし、第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）とが合成されてから、第１被写体（Ｑn1）の画像（Ｇn1）および第２被写体（Ｑn2）の画像（Ｇn2）の少なくとも一方について調整処理（Ｓ24，Ｓ25）が実行されてもよい。また、第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）とを合成する過程において調整処理（Ｓ24，Ｓ25）（例えば後述の重畳処理（Ｓ25））が実行されてもよい。 The "composite process (S2)" is an arbitrary image process that generates a composite video (V) including an image (Gn1) of a first subject (Qn1) in a first video (Vn1) and an image (Gn2) of a second subject (Qn2) in a second video (Vn2). The stage at which the adjustment processes (S24, S25) are performed in the composite process (S2) is arbitrary. For example, the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) may be combined after the adjustment process (S24, S25) is performed on at least one of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2), or the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) may be combined before the adjustment process (S24, S25) is performed on at least one of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2). In addition, the adjustment process (S24, S25) (for example, the superimposition process (S25) described later) may be performed in the process of combining the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2).

「撮像距離（Ｄn）」は、（第１／第２）撮像装置と（第１／第２）被写体との間の距離である。撮像距離（Ｄn）を特定する方法は任意である。例えば、撮像距離（Ｄn）が事前に決定された既定値であれば、当該撮像距離（Ｄn）は記憶装置に事前に保存されてもよい。 The "imaging distance (Dn)" is the distance between the (first/second) imaging device and the (first/second) subject. Any method can be used to determine the imaging distance (Dn). For example, if the imaging distance (Dn) is a preset value determined in advance, the imaging distance (Dn) may be stored in advance in a storage device.

撮像装置が撮像した動画の解析により撮像距離（Ｄn）が推定されてもよい。例えば、被写体に付加された既定の距離指標（マーカー）について動画内のサイズを解析することで、撮像距離（Ｄn）が推定される。また、既知のサイズの被写体（例えば出演者）について、顔認識または骨格認識等の認識技術により動画内のサイズを解析することで、撮像距離（Ｄn）が推定される。 The imaging distance (Dn) may be estimated by analyzing a video captured by an imaging device. For example, the imaging distance (Dn) is estimated by analyzing the size of a predetermined distance indicator (marker) added to a subject in the video. Also, the imaging distance (Dn) is estimated for a subject of known size (e.g., a performer) by analyzing the size of the subject in the video using recognition technology such as face recognition or skeletal recognition.
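The marker-based estimate can be illustrated with the pinhole camera relation, in which the apparent size of an object is inversely proportional to its distance. This is a sketch under assumptions not stated in the disclosure: the focal length is known in pixel units, lens distortion is ignored, and the function name is hypothetical.

```python
def estimate_imaging_distance(focal_length_px, real_size_m, size_in_image_px):
    """Estimate the imaging distance Dn (metres) from the apparent size of
    a marker or subject of known physical size.

    Pinhole model: size_in_image_px = focal_length_px * real_size_m / Dn
    """
    if size_in_image_px <= 0:
        raise ValueError("the marker must be visible in the frame")
    return focal_length_px * real_size_m / size_in_image_px


# A 0.2 m marker that spans 50 px under a 1000 px focal length.
d_n = estimate_imaging_distance(1000.0, 0.2, 50.0)
```

The same relation applies when the known size comes from face or skeleton recognition, for example a typical shoulder width, although the estimate is then only as good as that assumed size.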

撮像装置による撮像時に測距装置が撮像距離（Ｄn）を測定する環境では、測距装置が測定した撮像距離（Ｄn）が取得される。測距装置は、例えば赤外光または紫外光等の測距光を利用した光学的なセンサである。測距装置は、例えばＬｉＤＡＲ（Light Detection and Ranging）機能または自動焦点機能を具備する。なお、測距装置は、撮像装置に搭載されてもよいし、撮像装置とは別体に設置されてもよい。 In an environment where a ranging device measures the imaging distance (Dn) when an image is captured by an imaging device, the imaging distance (Dn) measured by the ranging device is acquired. The ranging device is an optical sensor that uses ranging light such as infrared light or ultraviolet light. The ranging device has, for example, a LiDAR (Light Detection and Ranging) function or an autofocus function. The ranging device may be mounted on the imaging device or installed separately from the imaging device.

「調整処理（Ｓ24，Ｓ25）」は、撮像距離（Ｄn）に応じて被写体の画像を調整する画像処理である。「撮像距離（Ｄn）に応じて」とは、合成画像における被写体の画像が撮像距離（Ｄn）に依存する関係を意味する。すなわち、例えば撮像距離（Ｄn）が相異なる第１値と第２値とに設定され得る場合を想定すると、撮像距離（Ｄn）が第１値である場合の調整処理（Ｓ24，Ｓ25）後の画像と、撮像距離（Ｄn）が第２値である場合の調整処理（Ｓ24，Ｓ25）後の画像とが相違することを意味する。ただし、撮像距離（Ｄn）が変化しても画像が変化しない場合はあり得る。 The "adjustment process (S24, S25)" is image processing that adjusts the image of the subject according to the imaging distance (Dn). "According to the imaging distance (Dn)" means that the image of the subject in the composite image depends on the imaging distance (Dn). For example, assuming that the imaging distance (Dn) can take a first value and a second value that differ from each other, the image after the adjustment process (S24, S25) when the imaging distance (Dn) is the first value differs from the image after the adjustment process (S24, S25) when the imaging distance (Dn) is the second value. However, there may be cases in which the image does not change even though the imaging distance (Dn) changes.

「複数の撮像装置」は、第１撮像装置（２１-n1）および第２撮像装置（２１-n2）以外の１以上の撮像装置を含んでもよい。すなわち、画像処理部（４２）は、相異なる撮像装置により撮像された３個以上の動画の合成により合成動画（Ｖ）を生成してもよい。「第１動画（Ｖn1）」は、複数の動画から選択されたひとつの動画であり、「第２動画（Ｖn2）」は、複数の動画のうち第１動画（Ｖn1）以外のひとつの動画である。すなわち、合成動画（Ｖ）は、第１動画（Ｖn1）および第２動画（Ｖn2）のみの合成により生成されてもよいし、第１動画（Ｖn1）および第２動画（Ｖn2）と他の１以上の動画との合成により生成されてもよい。 The "multiple imaging devices" may include one or more imaging devices other than the first imaging device (21-n1) and the second imaging device (21-n2). That is, the image processing unit (42) may generate a composite video (V) by combining three or more videos captured by different imaging devices. The "first video (Vn1)" is one video selected from the multiple videos, and the "second video (Vn2)" is one video among the multiple videos other than the first video (Vn1). That is, the composite video (V) may be generated by combining only the first video (Vn1) and the second video (Vn2), or may be generated by combining the first video (Vn1) and the second video (Vn2) with one or more other videos.

［付記２］ [Supplementary Note 2]
付記１の具体例（付記２）において、前記調整処理（Ｓ24，Ｓ25）は、前記第１撮像距離（Ｄn1）と前記第２撮像距離（Ｄn2）とに応じて前記第１被写体（Ｑn1）の画像パラメータと前記第２被写体（Ｑn2）の画像パラメータとを調整する加工処理を含む。以上の態様においては、第１被写体（Ｑn1）の画像パラメータと第２被写体（Ｑn2）の画像パラメータとが、第１撮像距離（Ｄn1）および第２撮像距離（Ｄn2）に応じて調整される。したがって、相異なる空間に所在する複数の被写体（Ｑn）を含む自然な合成動画（Ｖ）を生成できる。 In a specific example (Supplementary Note 2) of Supplementary Note 1, the adjustment process (S24, S25) includes a processing process for adjusting the image parameters of the first subject (Qn1) and the image parameters of the second subject (Qn2) according to the first imaging distance (Dn1) and the second imaging distance (Dn2). In the above aspect, the image parameters of the first subject (Qn1) and the image parameters of the second subject (Qn2) are adjusted according to the first imaging distance (Dn1) and the second imaging distance (Dn2). Therefore, a natural composite video (V) including multiple subjects (Qn) located in different spaces can be generated.

「加工処理」は、第１撮像距離（Ｄn1）と第２撮像距離（Ｄn2）とに応じて第１被写体（Ｑn1）の画像パラメータと第２被写体（Ｑn2）の画像パラメータとを調整する任意の画像処理である。例えば、第１撮像距離（Ｄn1）と第２撮像距離（Ｄn2）との差異が大きいほど、第１被写体（Ｑn1）の画像パラメータと第２被写体（Ｑn2）の画像パラメータとの差異を増加させるような画像処理が、「調整処理（Ｓ24，Ｓ25）」として例示される。ただし、例えば第１撮像距離（Ｄn1）と第２撮像距離（Ｄn2）とが相互に近似または一致する場合に、第１被写体（Ｑn1）の画像パラメータと第２被写体（Ｑn2）の画像パラメータとが相互に一致することはあり得る。 The "processing process" is any image processing that adjusts the image parameters of the first subject (Qn1) and the second subject (Qn2) according to the first imaging distance (Dn1) and the second imaging distance (Dn2). For example, the "adjustment process (S24, S25)" is an example of an image processing that increases the difference between the image parameters of the first subject (Qn1) and the image parameters of the second subject (Qn2) the greater the difference between the first imaging distance (Dn1) and the second imaging distance (Dn2). However, for example, when the first imaging distance (Dn1) and the second imaging distance (Dn2) are close to or coincident with each other, it is possible that the image parameters of the first subject (Qn1) and the image parameters of the second subject (Qn2) coincide with each other.

「画像パラメータ」は、画像の視覚的な特性を制御するための任意のパラメータである。例えば、ぼかし量（Ｂn）、明度（露出）、彩度、色相、コントラスト、明瞭度等の任意の特性値が「画像パラメータ」として例示される。相異なる複数のパラメータの組合せを「画像パラメータ」と解釈してもよい。 An "image parameter" is any parameter for controlling the visual characteristics of an image. For example, any characteristic value such as the amount of blur (Bn), brightness (exposure), saturation, hue, contrast, and clarity are examples of "image parameters." A combination of multiple different parameters may also be interpreted as an "image parameter."

［付記３］ [Supplementary Note 3]
付記２の具体例（付記３）において、前記加工処理は、画像をぼかすぼかし処理（Ｓ24）を含み、前記画像パラメータは、ぼかし量（Ｂn）である。以上の態様によれば、第１被写体（Ｑn1）の画像（Ｇn1）におけるぼかし量（Ｂn）と第２被写体（Ｑn2）の画像（Ｇn2）におけるぼかし量（Ｂn）とが第１撮像距離（Ｄn1）および第２撮像距離（Ｄn2）に応じて相違するように、ぼかし処理（Ｓ24）が実行される。撮像距離（Ｄn）に応じて光学的なぼけの度合が変化する現実の撮像の傾向が模擬された自然な合成動画（Ｖ）を生成できる。 In a specific example (Supplementary Note 3) of Supplementary Note 2, the processing process includes a blurring process (S24) for blurring an image, and the image parameter is a blur amount (Bn). According to the above aspect, the blurring process (S24) is executed so that the blur amount (Bn) in the image (Gn1) of the first subject (Qn1) and the blur amount (Bn) in the image (Gn2) of the second subject (Qn2) differ depending on the first imaging distance (Dn1) and the second imaging distance (Dn2). It is therefore possible to generate a natural composite video (V) that simulates the tendency of real imaging, in which the degree of optical blur changes with the imaging distance (Dn).

「ぼかし処理（Ｓ24）」は、画像をぼかす画像処理である。例えば、動画内の各画素の画素値を、当該画素を含む所定の範囲内の平均値に置換する平滑処理が、ぼかし処理（Ｓ24）の一例である。ぼかし量（Ｂn）は、ぼかし処理（Ｓ24）により画像がぼける程度を制御する画像パラメータである。例えば、画素値の平均値が算定される範囲のサイズを規定する画像パラメータが、「ぼかし量（Ｂn）」として例示される。 The "blurring process (S24)" is an image process that blurs an image. For example, a smoothing process that replaces the pixel value of each pixel in a video with the average value within a predetermined range that includes the pixel is an example of the blurring process (S24). The blur amount (Bn) is an image parameter that controls the degree to which the image is blurred by the blurring process (S24). For example, the image parameter that specifies the size of the range in which the average pixel value is calculated is exemplified as the "blur amount (Bn)".
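The smoothing process described above can be sketched as a simple box filter. In this illustration the blur amount Bn is interpreted as the radius of the averaging window, so the window covers (2·Bn+1)×(2·Bn+1) pixels; that interpretation, the grayscale list-of-lists image format, and the edge handling (averaging only over pixels inside the image) are assumptions made for the example.

```python
def box_blur(image, blur_amount):
    """Replace each pixel with the mean of the pixels inside a square
    window centred on it. `blur_amount` is the window radius in pixels;
    0 leaves the image unchanged.

    image: list of rows, each a list of numeric grayscale values.
    """
    if blur_amount <= 0:
        return [row[:] for row in image]
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            total, count = 0.0, 0
            # Clip the window at the image borders.
            for yy in range(max(0, y - blur_amount), min(height, y + blur_amount + 1)):
                for xx in range(max(0, x - blur_amount), min(width, x + blur_amount + 1)):
                    total += image[yy][xx]
                    count += 1
            out_row.append(total / count)
        result.append(out_row)
    return result


image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
blurred = box_blur(image, 1)
```

A larger `blur_amount` widens the window and therefore blurs the image more strongly, which is the role the text assigns to Bn. A production system would more likely use a Gaussian or lens-blur kernel from an image library.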

第１撮像距離（Ｄn1）と第２撮像距離（Ｄn2）とが相違する場合には、第１被写体（Ｑn1）の画像（Ｇn1）のぼかし量（Ｂn）と第２被写体（Ｑn2）の画像（Ｇn2）のぼかし量（Ｂn）とが相異なる数値に設定される。第１被写体（Ｑn1）の画像（Ｇn1）および第２被写体（Ｑn2）の画像（Ｇn2）の双方に対してぼかし処理（Ｓ24）が実行されてもよいし、第１被写体（Ｑn1）の画像（Ｇn1）および第２被写体（Ｑn2）の画像（Ｇn2）の一方のみにぼかし処理（Ｓ24）が実行されてもよい。 When the first imaging distance (Dn1) and the second imaging distance (Dn2) are different, the blur amount (Bn) of the image (Gn1) of the first subject (Qn1) and the blur amount (Bn) of the image (Gn2) of the second subject (Qn2) are set to different values. The blurring process (S24) may be performed on both the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2), or the blurring process (S24) may be performed on only one of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2).

［付記４］ [Supplementary Note 4]
付記３の具体例（付記４）において、前記第１撮像距離（Ｄn1）は、相異なる第１距離（Ｄa1，Ｄb1）および第２距離（Ｄa2，Ｄb2）に設定可能であり、前記第２距離（Ｄa2，Ｄb2）と基準値（Ｄref）との差異は、前記第１距離（Ｄa1，Ｄb1）と基準値（Ｄref）との差異を上回り、前記画像処理部（４２）は、前記第１撮像距離（Ｄn1）が前記第１距離（Ｄa1，Ｄb1）である場合に、前記第１被写体（Ｑn1）の画像（Ｇn1）に関するぼかし量（Ｂn）を第１設定値（Ｂa1，Ｂb1）に設定し、前記第１撮像距離（Ｄn1）が前記第２距離（Ｄa2，Ｄb2）である場合に、前記第１被写体（Ｑn1）の画像（Ｇn1）に関するぼかし量（Ｂn）を、前記第１設定値（Ｂa1，Ｂb1）を上回る第２設定値（Ｂa2，Ｂb2）に設定する。以上の態様によれば、第１撮像距離（Ｄn1）と基準値（Ｄref）との差異が増加するほど、第１被写体（Ｑn1）の画像（Ｇn1）のぼかし処理（Ｓ24）に適用されるぼかし量（Ｂn）が増加する。したがって、基準値（Ｄref）に対応する地点に位置する合焦面から奥行方向（前後方向）に離間するほど被写体の光学的なぼけが増加するという現実の撮像の傾向が忠実に模擬された自然な合成動画（Ｖ）を生成できる。 In a specific example (Supplementary Note 4) of Supplementary Note 3, the first imaging distance (Dn1) can be set to a first distance (Da1, Db1) and a second distance (Da2, Db2) that are different from each other, the difference between the second distance (Da2, Db2) and a reference value (Dref) exceeds the difference between the first distance (Da1, Db1) and the reference value (Dref), and the image processing unit (42) sets the blur amount (Bn) for the image (Gn1) of the first subject (Qn1) to a first set value (Ba1, Bb1) when the first imaging distance (Dn1) is the first distance (Da1, Db1), and sets the blur amount (Bn) for the image (Gn1) of the first subject (Qn1) to a second set value (Ba2, Bb2) that exceeds the first set value (Ba1, Bb1) when the first imaging distance (Dn1) is the second distance (Da2, Db2). According to the above aspect, the blur amount (Bn) applied in the blurring process (S24) of the image (Gn1) of the first subject (Qn1) increases as the difference between the first imaging distance (Dn1) and the reference value (Dref) increases. Therefore, it is possible to generate a natural composite video (V) that faithfully simulates the tendency of real imaging, in which the optical blur of a subject increases as the subject moves away in the depth direction (front-back direction) from the focal plane located at the point corresponding to the reference value (Dref).

「基準値（Ｄref）」は、撮像距離（Ｄn）がとり得る特定の数値である。例えば第２撮像距離（Ｄn2）が「基準値（Ｄref）」として設定される。以上の形態によれば、第２被写体（Ｑn2）に合焦し、かつ第１被写体（Ｑn1）はぼけた合成動画（Ｖ）を生成できる。撮像距離（Ｄn）が、基準値（Ｄref）を含む所定の範囲内の数値である場合、被写体のぼかし量（Ｂn）をゼロに設定してもよい。以上の形態によれば、被写界深度内の要素には全体に合焦するという光学的な傾向を模擬できる。 The "reference value (Dref)" is a specific numerical value that the imaging distance (Dn) can take. For example, the second imaging distance (Dn2) is set as the "reference value (Dref)". According to the above embodiment, a composite video (V) can be generated in which the second subject (Qn2) is in focus and the first subject (Qn1) is blurred. If the imaging distance (Dn) is a numerical value within a predetermined range including the reference value (Dref), the subject blur amount (Bn) may be set to zero. According to the above embodiment, the optical tendency of the elements within the depth of field to be entirely in focus can be simulated.
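The mapping from imaging distance to blur amount described above might look like the following sketch. The linear growth, the `gain` and `max_blur` constants, and the width of the in-focus range are hypothetical; the text only requires that the blur amount be zero within a predetermined range around Dref and become larger as |Dn-Dref| increases.

```python
import math


def blur_amount(d_n, d_ref, in_focus_range=0.5, gain=2.0, max_blur=10):
    """Return the blur amount Bn (a window radius in pixels) for a subject
    at imaging distance d_n, given the reference value d_ref.

    Subjects within `in_focus_range` of d_ref are treated as lying inside
    the depth of field and receive no blur.
    """
    difference = abs(d_n - d_ref)
    if difference <= in_focus_range:
        return 0
    return min(max_blur, math.ceil(gain * (difference - in_focus_range)))


d_ref = 3.0                      # e.g. the second imaging distance Dn2
b_first = blur_amount(4.0, d_ref)    # first distance, closer to Dref
b_second = blur_amount(6.0, d_ref)   # second distance, farther from Dref
```

Because the mapping depends only on the absolute difference, a subject in front of the focal plane and one behind it by the same amount are blurred equally.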

［付記５］ [Supplementary Note 5]
付記４の具体例（付記５）において、前記基準値（Ｄref）は、前記第１動画（Ｖn1）および前記第２動画（Ｖn2）を含む複数の動画にそれぞれ対応する複数の被写体（Ｑn）の何れかである基準被写体（Ｑref）に対応する撮像距離（Ｄn）である。以上の態様によれば、複数の被写体（Ｑn）の何れか（基準被写体（Ｑref））に対応する撮像距離（Ｄn）を基準値（Ｄref）として第１被写体（Ｑn1）の画像（Ｇn1）に関するぼかし量（Ｂn）が設定される。したがって、基準被写体（Ｑref）を基準として各被写体の画像のぼかし量（Ｂn）が設定された自然な合成動画（Ｖ）を生成できる。また、複数の被写体（Ｑn）のうち基準被写体（Ｑref）が特に注目され易い合成動画（Ｖ）を生成できる。 In a specific example (Supplementary Note 5) of Supplementary Note 4, the reference value (Dref) is the imaging distance (Dn) corresponding to a reference subject (Qref), which is one of a plurality of subjects (Qn) respectively corresponding to a plurality of videos including the first video (Vn1) and the second video (Vn2). According to the above aspect, the blur amount (Bn) for the image (Gn1) of the first subject (Qn1) is set using, as the reference value (Dref), the imaging distance (Dn) corresponding to one of the plurality of subjects (Qn) (the reference subject (Qref)). Therefore, a natural composite video (V) can be generated in which the blur amount (Bn) of the image of each subject is set relative to the reference subject (Qref). In addition, a composite video (V) can be generated in which the reference subject (Qref) among the plurality of subjects (Qn) is particularly likely to attract attention.

「基準被写体（Ｑref）」は、相異なる動画に対応する複数の被写体（Ｑn）から選択された１以上の被写体である。基準被写体（Ｑref）を選択するための条件は任意である。例えば、複数の被写体（Ｑn）のうち利用者が選択した被写体が基準被写体（Ｑref）とされる。なお、「複数の動画」は、現実空間（Ｒn）内で実際に撮像された動画のほか、仮想空間（Ｚ）を表す仮想動画が含まれてもよい。仮想動画は、仮想空間（Ｚ）内の仮想オブジェクト（Ｏm）を仮想撮像装置により撮像した動画である。基準被写体（Ｑref）は、現実空間（Ｒn）内の被写体と仮想空間（Ｚ）内の仮想オブジェクト（Ｏm）とを含む複数の被写体（Ｑn）から選択される。 The "reference subject (Qref)" is one or more subjects selected from a plurality of subjects (Qn) corresponding to different videos. The conditions for selecting the reference subject (Qref) are arbitrary. For example, a subject selected by a user from a plurality of subjects (Qn) is set as the reference subject (Qref). Note that the "plurality of videos" may include not only videos actually captured in the real space (Rn) but also virtual videos representing the virtual space (Z). The virtual video is a video captured by a virtual imaging device of a virtual object (Om) in the virtual space (Z). The reference subject (Qref) is selected from a plurality of subjects (Qn) including a subject in the real space (Rn) and a virtual object (Om) in the virtual space (Z).

［付記６］ [Supplementary Note 6]
付記５の具体例（付記６）において、前記画像処理部（４２）は、前記複数の動画または前記複数の動画にそれぞれ対応する複数の音声を解析した結果に応じて、前記複数の被写体（Ｑn）から前記基準被写体（Ｑref）を選択する。以上の態様によれば、各被写体の動画または音声を解析した結果に応じて複数の被写体（Ｑn）から基準被写体（Ｑref）が選択される。したがって、利用者による指示を必要とせずに、合成動画（Ｖ）において視聴者が特に注目すべき適切な被写体を基準被写体（Ｑref）として選択できる。 In a specific example (Supplementary Note 6) of Supplementary Note 5, the image processing unit (42) selects the reference subject (Qref) from the multiple subjects (Qn) according to a result of analyzing the multiple videos or the multiple sounds respectively corresponding to the multiple videos. According to the above aspect, the reference subject (Qref) is selected from the multiple subjects (Qn) according to a result of analyzing the video or sound of each subject. Therefore, an appropriate subject that the viewer should pay particular attention to in the composite video (V) can be selected as the reference subject (Qref) without the need for instructions from the user.

各動画を解析する処理の具体的な内容、および解析の結果を被写体の選択に適用する条件は任意である。例えば、ひとつの態様において、複数の被写体（Ｑn）のうち時間的な変化が大きい被写体（すなわち動作中の被写体）が基準被写体（Ｑref）として選択される。また、例えば各動画が音声を含む形態においては、複数の被写体（Ｑn）のうち音声の音量が大きい動画に対応する被写体（すなわち発言中の被写体）が基準被写体（Ｑref）として選択される。 The specific content of the process for analyzing each video and the conditions for applying the results of the analysis to the selection of the subject are arbitrary. For example, in one embodiment, a subject that changes significantly over time among multiple subjects (Qn) (i.e., a subject in motion) is selected as the reference subject (Qref). In another embodiment, in which each video includes audio, a subject that corresponds to a video with a louder audio volume among multiple subjects (Qn) (i.e., a subject speaking) is selected as the reference subject (Qref).
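The audio-based selection can be sketched by comparing the loudness of each subject's audio over a short analysis window and picking the loudest. Using the root-mean-square level as the loudness measure, and the names `rms` and `select_reference_subject`, are assumptions for illustration; the disclosure leaves the analysis method open.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_reference_subject(audio_by_subject):
    """Return the name of the subject whose audio window is loudest,
    i.e. the subject presumed to be speaking.

    audio_by_subject: dict mapping a subject name to a list of samples.
    """
    return max(audio_by_subject, key=lambda name: rms(audio_by_subject[name]))


# Hypothetical sample windows for the audio A1..A3 of subjects Q1..Q3.
window = {
    "Q1": [0.01, -0.02, 0.01, 0.00],
    "Q2": [0.40, -0.35, 0.38, -0.42],   # Q2 is speaking
    "Q3": [0.05, -0.04, 0.06, -0.05],
}
q_ref = select_reference_subject(window)
```

The motion-based variant has the same shape: replace `rms` with a score such as the mean absolute difference between consecutive frames of each video.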

［付記７］ [Supplementary Note 7]
付記１から付記６の何れかの具体例（付記７）において、前記調整処理（Ｓ24，Ｓ25）は、前記第１被写体（Ｑn1）の画像（Ｇn1）と前記第２被写体（Ｑn2）の画像（Ｇn2）とを重畳する重畳処理（Ｓ25）を含み、前記重畳処理（Ｓ25）においては、前記第１被写体（Ｑn1）の画像（Ｇn1）と前記第２被写体（Ｑn2）の画像（Ｇn2）との前後を、前記第１撮像距離（Ｄn1）および前記第２撮像距離（Ｄn2）に応じて制御する。以上の態様によれば、第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との前後が第１撮像距離（Ｄn1）および第２撮像距離（Ｄn2）に応じて制御される。したがって、第１被写体（Ｑn1）および第２被写体（Ｑn2）の現実の位置が第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との前後に反映された自然な合成動画（Ｖ）を生成できる。 In a specific example (Supplementary Note 7) of any of Supplementary Notes 1 to 6, the adjustment process (S24, S25) includes a superimposition process (S25) for superimposing the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2), and in the superimposition process (S25), the front-back order of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) is controlled according to the first imaging distance (Dn1) and the second imaging distance (Dn2). According to the above aspect, the front-back order of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) is controlled according to the first imaging distance (Dn1) and the second imaging distance (Dn2). Therefore, a natural composite video (V) can be generated in which the actual positions of the first subject (Qn1) and the second subject (Qn2) are reflected in the front-back order of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2).

「重畳処理（Ｓ25）」は、複数の被写体（Ｑn）を重ねる画像処理である。重畳処理（Ｓ25）により、第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）とが部分的に重複する。第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との「前後」は、第１被写体（Ｑn1）の画像（Ｇn1）の背後に第２被写体（Ｑn2）の画像（Ｇn2）が配置されるのか、第２被写体（Ｑn2）の画像（Ｇn2）の背後の第１被写体（Ｑn1）の画像（Ｇn1）が配置されるのかという位置関係である。 "Superimposition process (S25)" is image processing that superimposes multiple subjects (Qn). Superimposition process (S25) causes the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) to partially overlap. The "front and back" between the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) refers to the positional relationship of whether the image (Gn2) of the second subject (Qn2) is placed behind the image (Gn1) of the first subject (Qn1), or whether the image (Gn1) of the first subject (Qn1) is placed behind the image (Gn2) of the second subject (Qn2).

「第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との前後を第１撮像距離（Ｄn1）および第２撮像距離（Ｄn2）に応じて制御する」とは、合成動画（Ｖ）における第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との前後が、第１撮像距離（Ｄn1）および第２撮像距離（Ｄn2）に依存することを意味する。例えば、第１撮像距離（Ｄn1）が第２撮像距離（Ｄn2）を下回る場合には、第１被写体（Ｑn1）の画像（Ｇn1）が第２被写体（Ｑn2）の画像（Ｇn2）の手前に位置するように第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）とが重畳され、第２撮像距離（Ｄn2）が第１撮像距離（Ｄn1）を下回る場合には、第２被写体（Ｑn2）の画像（Ｇn2）が第１被写体（Ｑn1）の画像（Ｇn1）の手前に位置するように第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）とが重畳される。 "Controlling the front-back order of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) according to the first imaging distance (Dn1) and the second imaging distance (Dn2)" means that the front-back order of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) in the composite video (V) depends on the first imaging distance (Dn1) and the second imaging distance (Dn2). For example, when the first imaging distance (Dn1) is less than the second imaging distance (Dn2), the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) are superimposed so that the image (Gn1) of the first subject (Qn1) is located in front of the image (Gn2) of the second subject (Qn2), and when the second imaging distance (Dn2) is less than the first imaging distance (Dn1), the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) are superimposed so that the image (Gn2) of the second subject (Qn2) is located in front of the image (Gn1) of the first subject (Qn1).

以上の例示から理解される通り、第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との「前後」は、画像（Ｇn1）と画像（Ｇn2）との表示における優先度とも観念される。例えば、画像（Ｇn1）が画像（Ｇn2）の前方に表示された状態は、画像（Ｇn1）が画像（Ｇn2）に優先して表示された状態とも表現され、画像（Ｇn2）が画像（Ｇn1）の前方に表示された状態は、画像（Ｇn2）が画像（Ｇn1）に優先して表示された状態とも表現される。 As can be understood from the above examples, the "front and back" of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) can also be thought of as the priority in displaying the images (Gn1) and (Gn2). For example, a state in which the image (Gn1) is displayed in front of the image (Gn2) can also be expressed as a state in which the image (Gn1) is displayed in priority over the image (Gn2), and a state in which the image (Gn2) is displayed in front of the image (Gn1) can also be expressed as a state in which the image (Gn2) is displayed in priority over the image (Gn1).

また、第１被写体（Ｑn1）の画像（Ｇn1）と第２被写体（Ｑn2）の画像（Ｇn2）との「前後」は、画像（Ｇn1）および画像（Ｇn2）の各々の表示／非表示の区別とも観念される。例えば、画像（Ｇn1）が画像（Ｇn2）の前方に表示された状態は、画像（Ｇn1）のうち画像（Ｇn2）に重複する部分が表示され、かつ、画像（Ｇn2）のうち画像（Ｇn1）に重複する部分が非表示とされた状態とも表現される。同様に、例えば、画像（Ｇn2）が画像（Ｇn1）の前方に表示された状態は、画像（Ｇn2）のうち画像（Ｇn1）に重複する部分が表示され、画像（Ｇn1）のうち画像（Ｇn2）に重複する部分が非表示とされた状態とも表現される。 In addition, the "front and back" of the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) can also be thought of as a distinction between display/non-display of the image (Gn1) and the image (Gn2). For example, a state in which the image (Gn1) is displayed in front of the image (Gn2) can also be expressed as a state in which the part of the image (Gn1) that overlaps with the image (Gn2) is displayed, and the part of the image (Gn2) that overlaps with the image (Gn1) is not displayed. Similarly, for example, a state in which the image (Gn2) is displayed in front of the image (Gn1) can also be expressed as a state in which the part of the image (Gn2) that overlaps with the image (Gn1) is displayed, and the part of the image (Gn1) that overlaps with the image (Gn2) is not displayed.
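The front-back control described in the preceding paragraphs can be sketched by drawing the subject images from the largest imaging distance to the smallest, so that nearer subjects overwrite farther ones where they overlap. The sparse pixel-dictionary representation of a cut-out subject image and the function name `superimpose` are assumptions made for this example.

```python
def superimpose(layers, width, height, background=None):
    """Composite cut-out subject images onto one canvas.

    layers: list of (imaging_distance, pixels), where pixels maps
            (x, y) coordinates to a pixel value for one subject image.
    Layers are drawn far-to-near, so in overlapping regions the subject
    with the smaller imaging distance stays visible and the other is hidden.
    """
    canvas = [[background] * width for _ in range(height)]
    ordered = sorted(layers, key=lambda layer: layer[0], reverse=True)
    for _, pixels in ordered:
        for (x, y), value in pixels.items():
            if 0 <= x < width and 0 <= y < height:
                canvas[y][x] = value
    return canvas


# Two subject images overlapping at pixel (1, 0) (hypothetical data).
g1 = {(0, 0): "G1", (1, 0): "G1"}
g2 = {(1, 0): "G2", (2, 0): "G2"}

near_first = superimpose([(2.0, g1), (3.5, g2)], width=3, height=1)
near_second = superimpose([(4.0, g1), (3.5, g2)], width=3, height=1)
```

In `near_first` the overlapping pixel shows G1 because Dn1 is smaller than Dn2, and in `near_second` it shows G2. This corresponds to the display/non-display interpretation given above.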

[Appendix 8]
In a specific example (Appendix 8) of any one of Appendices 1 to 7, the image processing unit (42) generates, in the compositing process (S2), the composite video (V) including the image (Gn1) of the first subject (Qn1), the image (Gn2) of the second subject (Qn2), and a virtual object (Om) captured by a virtual imaging device in a virtual space (Z), and adjusts, in the adjustment process (S24, S25), the virtual object (Om) according to the virtual imaging distance (Em) of the virtual imaging device. According to this aspect, the images of the subjects captured in the real space (Rn), namely the first subject (Qn1) and the second subject (Qn2), are superimposed with the virtual object (Om) in the virtual space (Z). Therefore, a variety of composite videos (V) can be generated that include not only real subjects but also virtual objects (Om). Moreover, because the virtual object (Om) is adjusted according to the virtual imaging distance (Em), a natural composite video (V) can be generated in which the virtual object (Om) appears to be located in the same space as each subject.

"Virtual space (Z)" is a virtual space set up by various types of information processing such as image processing, and is a concept contrasted with real space (Rn). The virtual object (Om) is an object that exists in the virtual space (Z) and is captured by a virtual imaging device in the virtual space (Z). The "virtual imaging device" is a virtual imaging device placed in the virtual space (Z). The imaging conditions of the virtual imaging device may be either fixed or variable.

The content of the "adjustment process (S24, S25)" applied to the image of the virtual object (Om) is arbitrary. For example, the "adjustment process (S24, S25)", which includes the modification process or the superimposition process (S25) exemplified for each of the embodiments described above, is executed not only on the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) but also on the image of the virtual object (Om).
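As one illustration of applying the same distance-dependent adjustment to real subjects and to a virtual object, the sketch below computes a blur amount per layer from the gap between its imaging distance and a reference value, in line with the blur amounts recited in the claims. The linear mapping, the `gain` and `max_blur` parameters, the function name `blur_amount`, and the sample distances are assumptions chosen for illustration only.

```python
def blur_amount(distance: float, reference: float,
                gain: float = 1.5, max_blur: float = 10.0) -> float:
    """Return a blur amount that grows with |distance - reference| (assumed linear, capped)."""
    return min(max_blur, gain * abs(distance - reference))


# Imaging distances for two real subject images and one virtual object.
# For the virtual object Om, the value plays the role of the virtual imaging distance Em.
distances = {"Gn1": 2.0, "Gn2": 3.5, "Om": 6.0}

# Assume the first subject is the reference subject, so its distance is the reference value.
reference = distances["Gn1"]

blur = {name: blur_amount(d, reference) for name, d in distances.items()}
print(blur)  # {'Gn1': 0.0, 'Gn2': 2.25, 'Om': 6.0}
```

Treating the virtual object as one more layer with its own distance lets a single rule cover real and virtual images, which is one way to make the object appear to share a space with the subjects.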

[Appendix 9]
The video compositing system (30) according to a specific example (Appendix 9) of any one of Appendices 1 to 8 further includes an imaging control unit (45) that operates the first imaging device (21-n1) and the second imaging device (21-n2) in parallel in time under common imaging conditions. According to this aspect, the capture of the first video (Vn1) by the first imaging device (21-n1) and the capture of the second video (Vn2) by the second imaging device (21-n2) are performed under common imaging conditions. Therefore, it is possible to generate a natural composite video (V) that viewers can perceive as if the subject in the first real space (Rn1) and the subject in the second real space (Rn2) were located in a common space.

"Imaging conditions" are conditions relating to the capture of video by the (first or second) imaging device. Specifically, conditions that define the imaging range, such as the imaging direction or the imaging magnification, are examples of "imaging conditions". The imaging direction is, for example, the pan direction (left-right) or the tilt direction (up-down). The imaging magnification is, for example, the zoom magnification. However, "imaging conditions" also include conditions that do not affect the imaging range itself, such as the focus position, aperture value (iris), exposure value, and white balance. "Imaging conditions" can also be described as operating conditions that are subject to control by, for example, the imaging control unit (45).

"Operating under common imaging conditions" covers not only the case where the imaging conditions of the first imaging device (21-n1) and the imaging conditions of the second imaging device (21-n2) match exactly, but also the case where they substantially match. The imaging conditions "substantially match" when, for example, the imaging conditions of the first imaging device (21-n1) and the second imaging device (21-n2) are close enough that viewers do not perceive the difference, or when the difference between the imaging conditions of the first imaging device (21-n1) and those of the second imaging device (21-n2) is a minute difference of the kind attributable to differences in characteristics between the two devices (for example, differences due to manufacturing errors).

For example, when the first imaging device (21-n1) and the second imaging device (21-n2) are the same model, transmitting common control data specifying the imaging conditions to both the first imaging device (21-n1) and the second imaging device (21-n2) results in the two devices operating under common imaging conditions. When the first imaging device (21-n1) and the second imaging device (21-n2) are different models, transmitting separate control data to the first imaging device (21-n1) and the second imaging device (21-n2), so as to cancel out the difference in imaging operation between the two models, likewise results in the two devices operating under common imaging conditions.
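The same-model and different-model cases can be sketched as a conversion from one set of target imaging conditions into per-device control data. This is a hedged illustration only: the `ImagingConditions` fields, the `control_data` function, the model names, and the `ZOOM_COMMAND_PER_UNIT` calibration table are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ImagingConditions:
    """Target imaging conditions shared by both imaging devices."""
    pan_deg: float
    tilt_deg: float
    zoom: float


# Hypothetical calibration: raw zoom command each model needs per unit of zoom magnification.
ZOOM_COMMAND_PER_UNIT: Dict[str, float] = {"model_a": 100.0, "model_b": 250.0}


def control_data(conditions: ImagingConditions, model: str) -> Dict[str, float]:
    """Convert the common target conditions into control data for one device model."""
    return {
        "pan": conditions.pan_deg,
        "tilt": conditions.tilt_deg,
        "zoom_command": conditions.zoom * ZOOM_COMMAND_PER_UNIT[model],
    }


target = ImagingConditions(pan_deg=10.0, tilt_deg=-5.0, zoom=2.0)

# Same model: both devices receive identical control data.
same_model = (control_data(target, "model_a"), control_data(target, "model_a"))
print(same_model[0] == same_model[1])  # True

# Different models: the control data differs so that the effective zoom is the same.
mixed = (control_data(target, "model_a"), control_data(target, "model_b"))
print(mixed[0]["zoom_command"], mixed[1]["zoom_command"])  # 200.0 500.0
```

Keeping a single `target` object and converting it per model keeps the devices aligned on the intended conditions even when their raw commands differ.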

[Appendix 10]
A video compositing method according to one aspect (Appendix 10) of the present disclosure includes: acquiring a first video (Vn1) captured by a first imaging device (21-n1) in a first real space (Rn1) and a second video (Vn2) captured by a second imaging device (21-n2) in a second real space (Rn2); and executing a compositing process (S2) to generate a composite video (V) including an image (Gn1) of a first subject (Qn1) in the first video (Vn1) and an image (Gn2) of a second subject (Qn2) in the second video (Vn2), wherein the compositing process (S2) includes an adjustment process (S24, S25) of adjusting the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) in accordance with a first imaging distance (Dn1) of the first imaging device (21-n1) and a second imaging distance (Dn2) of the second imaging device (21-n2).

[Appendix 11]
A program according to one aspect (Appendix 11) of the present disclosure causes a computer system to function as: a video acquisition unit (41) that acquires a first video (Vn1) captured by a first imaging device (21-n1) in a first real space (Rn1) and a second video (Vn2) captured by a second imaging device (21-n2) in a second real space (Rn2); and an image processing unit (42) that executes a compositing process (S2) to generate a composite video (V) including an image (Gn1) of a first subject (Qn1) in the first video (Vn1) and an image (Gn2) of a second subject (Qn2) in the second video (Vn2), wherein the compositing process (S2) includes an adjustment process (S24, S25) of adjusting the image (Gn1) of the first subject (Qn1) and the image (Gn2) of the second subject (Qn2) in accordance with a first imaging distance (Dn1) of the first imaging device (21-n1) and a second imaging distance (Dn2) of the second imaging device (21-n2).

100...video recording system, 200...terminal device, 20-n (20-1 to 20-3)...recording system, 21-n (21-1 to 21-3)...imaging device, 22-n (22-1 to 22-3)...sound collection device, 23-n (23-1 to 23-3)...communication device, 30...video compositing system, 31...control device, 32...storage device, 33...communication device, 34...operation device, 35...playback device, 41...video acquisition unit, 42...image processing unit, 43...audio processing unit, 44...output processing unit, 45...imaging control unit, 421...distance determination unit, 422...subject extraction unit, 423...subject selection unit, 424...image adjustment unit, 425...image generation unit.

Claims

1. A video compositing system comprising:
a video acquisition unit that acquires a first video captured by a first imaging device in a first real space and a second video captured by a second imaging device in a second real space; and
an image processing unit that executes a compositing process to generate a composite video including an image of a first subject in the first video and an image of a second subject in the second video,
wherein the compositing process includes an adjustment process of adjusting the image of the first subject and the image of the second subject in accordance with a first imaging distance of the first imaging device and a second imaging distance of the second imaging device.

2. The video compositing system according to claim 1, wherein the adjustment process includes a modification process of adjusting an image parameter of the first subject and an image parameter of the second subject in accordance with the first imaging distance and the second imaging distance.

3. The video compositing system according to claim 2, wherein
the modification process includes a blurring process of blurring an image, and
the image parameter is an amount of blur.

4. The video compositing system according to claim 3, wherein
the first imaging distance is settable to a first distance and a second distance that are different from each other,
a difference between the second distance and a reference value is greater than a difference between the first distance and the reference value, and
the image processing unit sets the amount of blur for the image of the first subject to a first setting value when the first imaging distance is the first distance, and sets the amount of blur for the image of the first subject to a second setting value that is greater than the first setting value when the first imaging distance is the second distance.

5. The video compositing system according to claim 4, wherein the reference value is the imaging distance corresponding to a reference subject that is one of a plurality of subjects respectively corresponding to a plurality of videos including the first video and the second video.

6. The video compositing system according to claim 5, wherein the image processing unit selects the reference subject from the plurality of subjects according to a result of analyzing the plurality of videos or a plurality of sounds respectively corresponding to the plurality of videos.

7. The video compositing system according to claim 1, wherein
the adjustment process includes a superimposition process of superimposing the image of the first subject and the image of the second subject, and
in the superimposition process, the front-back relationship between the image of the first subject and the image of the second subject is controlled in accordance with the first imaging distance and the second imaging distance.

8. The video compositing system according to claim 1, wherein the image processing unit
generates, in the compositing process, the composite video including the image of the first subject, the image of the second subject, and a virtual object captured by a virtual imaging device in a virtual space, and
adjusts, in the adjustment process, the virtual object in accordance with a virtual imaging distance of the virtual imaging device.

9. The video compositing system according to claim 1, further comprising an imaging control unit that operates the first imaging device and the second imaging device in parallel in time under common imaging conditions.

10. A video compositing method implemented by a computer system, the method comprising:
acquiring a first video captured by a first imaging device in a first real space and a second video captured by a second imaging device in a second real space; and
executing a compositing process to generate a composite video including an image of a first subject in the first video and an image of a second subject in the second video,
wherein the compositing process includes an adjustment process of adjusting the image of the first subject and the image of the second subject in accordance with a first imaging distance of the first imaging device and a second imaging distance of the second imaging device.

11. A program for causing a computer system to function as:
a video acquisition unit that acquires a first video captured by a first imaging device in a first real space and a second video captured by a second imaging device in a second real space; and
an image processing unit that executes a compositing process to generate a composite video including an image of a first subject in the first video and an image of a second subject in the second video,
wherein the compositing process includes an adjustment process of adjusting the image of the first subject and the image of the second subject in accordance with a first imaging distance of the first imaging device and a second imaging distance of the second imaging device.