JP4971813B2 - Video generation apparatus and video generation program

Info

Publication number: JP4971813B2
Application number: JP2007021903A
Authority: JP
Inventors: 仁博冨山; 祐一岩舘
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2007-01-31
Filing date: 2007-01-31
Publication date: 2012-07-11
Anticipated expiration: 2027-01-31
Also published as: JP2008187678A

Description

The present invention relates to a video generation apparatus and a video generation program, and more particularly to a video generation apparatus and a video generation program for presenting, with high accuracy, a subject that moves over time as video.

In recent years, a technique has come into wide use in movies, commercials, and the like that relies on videos shot synchronously by multiple cameras arranged around a subject. The images of the moment the subject jumps into the air are switched and presented in the order of the camera arrangement, producing a video that looks as if a single camera had been shooting while moving around the subject suspended in midair.

Such a video display technique is called multi-view video display. It is applied, for example, to sports broadcasts and is known as a way of conveying the flow of a game and the movements of players through multi-view video (see, for example, Patent Document 1 and Non-Patent Document 1).

On the other hand, multi-motion video display is also in use. In this technique, moving subject regions are extracted frame by frame from video shot by a single camera and composited onto a background image, so that the movement of the subject over time is expressed in a single image. It is known, for example, as a way of explaining players' movements clearly in sports broadcasts (see, for example, Patent Document 2 and Non-Patent Documents 2 and 3).
Patent Document 1: JP 2006-115298 A
Non-Patent Document 1: Kenichi Isa and four others, "Latest sports broadcast technology, the world's first! Utilizing EyeVison (TM) in professional baseball broadcasts", Broadcast Technology, Kenrokukan Publishing, November 2001, pp. 96-105.
Patent Document 2: JP 2006-5452 A
Non-Patent Document 2: Lee and two others, "Gymnastic player form / trajectory display system", Journal of the Institute of Image Information and Television Engineers, Vol. 51, No. 11, pp. 1881-1887 (1997).
Non-Patent Document 3: Kato and three others, "Super Multi Motion", Journal of the Institute of Image Information and Television Engineers, Vol. 57, No. 8, pp. 936-938 (2003).

The conventional techniques described above display a subject using only one of multi-view video display and multi-motion video display. To present a subject that moves over time as video with high accuracy, however, it is preferable to present multi-view multi-motion video, in which the movement of the subject over time can be viewed from a plurality of viewpoints. There has been no precedent for such a video presentation technique.

Furthermore, to generate multi-motion video that does not look unnatural, the order in which the subject region images extracted from each video frame are composited onto a predetermined background image must be determined in consideration of the positional relationship between the shooting camera and the subject. No method for controlling this automatically has been proposed either.

The present invention has been made in view of the problems described above, and its object is to provide a video generation apparatus and a video generation program for presenting, with high accuracy, a subject that moves over time as video.

To solve the above problems, the present invention adopts means having the following features.

The invention described in claim 1 is a video generation apparatus that generates a predetermined video, based on preset conditions, from videos of a moving subject shot by a plurality of imaging means. The apparatus comprises: output video setting means for setting the content of the video to be output; subject region extraction means for extracting, based on the setting conditions obtained by the output video setting means, the subject region of a video selected from the videos shot by the plurality of imaging means; image composition means for compositing the subject region extracted by the subject region extraction means onto a predetermined background image; multi-view video generation means for generating a multi-view video by switching, in a predetermined order, videos selected from the videos shot by the plurality of imaging means based on the setting conditions obtained by the output video setting means; and output video generation means for generating an output video from the videos obtained by the image composition means and the multi-view video generation means.

According to the invention of claim 1, a subject that moves over time can be presented as video with high accuracy.

In the invention described in claim 2, the image composition means composites a plurality of subject region images extracted by the subject region extraction means onto the background image in a predetermined order.

According to the invention of claim 2, because the images are composited in a predetermined order, a highly accurate composite image can be generated efficiently.

In the invention described in claim 3, the image composition means composites the subject region images onto the background image in an order based on information on the distance from the imaging means to the subject region, which is obtained from the camera parameters of the plurality of imaging means.

According to the invention of claim 3, the distance between each imaging means and the subject can be obtained with high accuracy. Therefore, for example, when subjects overlap in the same image, a composite image can be generated with high accuracy in an order based on the distance information. In addition, the determination of the compositing order can be automated, which improves the operability of video generation.

In the invention described in claim 4, the imaging means acquires video only from those imaging means that shoot the video used for the output video set by the output video setting means.

According to the invention of claim 4, unnecessary video is not stored, so the required recording capacity can be reduced. In addition, only the necessary video can be acquired efficiently.

In the invention described in claim 5, the image composition means generates, based on the setting conditions obtained by the output video setting means, images that differ in the number of subjects composited into a single image.

According to the invention of claim 5, a wide variety of multi-view multi-motion images can be generated. This makes it possible to present novel video that expresses the movement of a subject such as an athlete in a more easily understood way, and therefore to provide more accurate output video.

The invention described in claim 6 is a video generation program for causing a computer to execute video generation processing that generates a predetermined video, based on preset conditions, from videos of a moving subject shot by a plurality of imaging means. The program causes the computer to execute: an output video setting step of setting the content of the video to be output; a subject region extraction step of extracting, based on the setting conditions obtained in the output video setting step, the subject region of a video selected from the videos shot by the plurality of imaging means; an image composition step of compositing the subject region extracted in the subject region extraction step onto a predetermined background image; a multi-view video generation step of generating a multi-view video by switching, in a predetermined order, videos selected from the videos shot by the plurality of imaging means based on the setting conditions obtained in the output video setting step; and an output video generation step of generating an output video from the videos obtained in the image composition step and the multi-view video generation step.

According to the invention of claim 6, a subject that moves over time can be presented as video with high accuracy. In addition, video generation can be realized easily by installing the program on a computer.

According to the present invention, a subject that moves over time can be presented as video with high accuracy.

<Outline of the present invention>
The present invention combines multi-view video display with multi-motion video display and performs control according to a preset form of the video to be generated, thereby presenting multi-view multi-motion video in which the movement of the subject over time can be viewed from a plurality of viewpoints. This makes it possible to present novel video that expresses the movement of a subject such as an athlete in a more easily understood way.

In addition, the present invention uses the videos of the subject shot by a plurality of cameras together with the camera parameters of those cameras to calculate the distance from each camera to the moving subject. The subject region images of the plurality of video frames extracted for each camera are then composited onto a predetermined background image in an order based on this distance information. Thus, for example, when subjects overlap in the same image, a composite image can be generated with high accuracy in an order based on the distance information. In addition, the determination of the compositing order can be automated, which improves the operability of video generation.
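As an illustration only, the distance-based ordering described above might be sketched as follows. The sketch assumes that a 3D camera position and a 3D subject position per frame are already available from the camera parameters, and that subjects are composited farthest-first so that nearer subjects end up on top. The function name `compositing_order`, the data layout, and the farthest-first rule are assumptions made for this example and are not specified in the text.

```python
import math

def compositing_order(camera_pos, subject_positions):
    """Return frame indices sorted farthest-first from the camera.

    camera_pos: (x, y, z) camera position (assumed known from camera parameters).
    subject_positions: dict mapping frame index -> (x, y, z) subject position
                       (hypothetical input; how it is estimated is not shown).
    """
    def distance(frame):
        return math.dist(camera_pos, subject_positions[frame])

    # Assumption: far subjects are composited first so nearer ones overwrite them.
    return sorted(subject_positions, key=distance, reverse=True)

camera = (0.0, 0.0, 0.0)
positions = {1: (0.0, 0.0, 5.0), 3: (1.0, 0.0, 3.0), 5: (2.0, 0.0, 8.0)}
order = compositing_order(camera, positions)
print(order)
```

Because the order is derived per camera, the same set of frames can yield a different compositing order for each viewpoint.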

<Embodiment>
Next, a preferred embodiment of the video generation apparatus and video generation program according to the present invention having the features described above will be described with reference to the drawings.

<Video generation apparatus: functional configuration example>
First, a functional configuration example of the video generation apparatus according to the present invention will be described with reference to the drawings. FIG. 1 is a diagram showing a configuration example of the video generation apparatus according to the present invention.

The video generation apparatus 10 shown in FIG. 1 includes an imaging unit 11, an input unit 12, an output unit 13, a storage unit 14, an output video setting unit 15, a subject region extraction unit 16, an image composition unit 17, a multi-view video generation unit 18, an output video generation unit 19, a transmission/reception unit 20, and a control unit 21.

The imaging unit 11 shoots a subject performing some motion and acquires video. The imaging unit 11 also stores the acquired video in the storage unit 14. Here, the imaging unit 11 is assumed to consist of, for example, a plurality of high-definition cameras or commercially available digital video cameras fixed at predetermined positions. When a plurality of cameras are used, each camera shoots the movement of the subject from a different position.

Furthermore, the imaging unit 11 acquires time information at the time of shooting, as well as camera parameters such as the shooting position, shooting direction, and focus adjustment of the camera at the time of shooting.

Note that the imaging unit 11 may, based on the setting conditions from the output video setting unit 15, acquire video only from those cameras that shoot the video used for the output video. In this way, unnecessary video is not stored and the required recording capacity can be reduced. In addition, only the necessary video can be acquired efficiently.

The input unit 12 accepts instructions such as an instruction to acquire video of a subject performing some motion from the imaging unit 11, an instruction to read predetermined data from or write it to the storage unit 14, and instructions for setting what kind of video is to be output, extracting subject regions from video, compositing images, and generating output video. The input unit 12 consists of, for example, a keyboard, a pointing device such as a mouse, and a voice input interface such as a microphone.

The output unit 13 outputs, by display and/or sound, various data such as the instructions entered through the input unit 12 and the results produced by each component in response to those instructions: output video setting results, subject region extraction results, image composition results, multi-view video generation results, output video, and subject video shot by the imaging unit 11. The output unit 13 consists of a display, a speaker, and the like.

The storage unit 14 stores various data such as the video obtained by the imaging unit 11, the output video setting results obtained by the output video setting unit 15, the subject region extraction results obtained by the subject region extraction unit 16, the composite images obtained by the image composition unit 17, and the multi-view video data obtained by the multi-view video generation unit 18.

The output video setting unit 15 sets what kind of video is to be output, based on the video obtained by the imaging unit 11. Specifically, based on the time information, camera parameters, and the like, it sets at which points in time (frames) composite images or multi-view video are used, the number of subjects to be composited in each predetermined period, the background image onto which they are composited, and so on. This makes it possible to realize multi-view multi-motion video using a plurality of cameras. In other words, because the subject can be viewed from various viewpoints using a plurality of cameras, motion information that is visually easy to understand can be presented in fields such as sports broadcasting and motion analysis.

The subject region extraction unit 16 generates subject region images by extracting the moving subject region from each of the camera videos shot by the plurality of cameras. The extraction method used by the subject region extraction unit 16 will be described later.

The image composition unit 17 uses the subject regions extracted by the subject region extraction unit 16 and a predetermined background image for each camera to generate, for the video obtained from each camera, a multi-motion image in which the subject region images of a plurality of video frames are composited onto the background image of a predetermined video frame. Which video frame is used as the background image follows the output setting conditions for the final multi-view multi-motion video set in the output video setting unit 15, and the necessary composite images are generated accordingly. A specific example of the composition processing in the image composition unit 17 will be described later. The images composited by the image composition unit 17 are temporarily stored in the storage unit 14.

The multi-view video generation unit 18 generates multi-view multi-motion video by switching, in a predetermined order, the multi-motion images obtained from the image composition unit 17, based on the setting conditions from the output video setting unit 15. The multi-view video generation unit 18 can also generate multi-view video by switching, in a predetermined order, the synchronized videos obtained from the plurality of cameras of the imaging unit 11. A specific example of multi-view multi-motion video generation in the multi-view video generation unit 18 will be described later. The video generated by the multi-view video generation unit 18 is temporarily stored in the storage unit 14.
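The camera switching described above can be pictured with a minimal sketch. It assumes that the synchronized streams are held as lists of frames indexed by camera number and that one time instant is frozen while the viewpoint sweeps across the cameras. The name `multi_view_sequence` and the string placeholders standing in for frames are hypothetical.

```python
def multi_view_sequence(streams, camera_order, frame_index):
    """Freeze one time instant and switch across cameras in the given order.

    streams: dict mapping camera id -> list of frames (assumed synchronized).
    camera_order: camera ids in the order in which the viewpoint is switched.
    frame_index: index of the frozen time instant.
    """
    return [streams[cam][frame_index] for cam in camera_order]

# Placeholder frames for four cameras with three frames each.
streams = {cam: [f"cam{cam}-frame{t}" for t in range(3)] for cam in range(1, 5)}
sweep = multi_view_sequence(streams, [1, 2, 3, 4], 2)
print(sweep)
```

Passing multi-motion images instead of raw frames in `streams` would give the multi-view multi-motion case with the same switching logic.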

The output video generation unit 19 generates the output video by joining, with the time information as the reference, the images composited by the image composition unit 17 and the multi-view multi-motion video, multi-view video, and the like obtained by the multi-view video generation unit 18, based on the setting information in the output video setting unit 15. A specific example of video generation in the output video generation unit 19 will be described later.
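A rough sketch of joining clips with time information as the reference is given below. It assumes that each clip carries a start time and a list of frames and that the clips do not overlap in time. The `(start_time, frames)` layout and the name `assemble_output` are assumptions introduced for this example.

```python
def assemble_output(clips):
    """Concatenate clips in ascending order of their start time.

    clips: list of (start_time, frames) tuples (hypothetical layout).
    """
    output = []
    for _, frames in sorted(clips, key=lambda clip: clip[0]):
        output.extend(frames)
    return output

clips = [
    (2.0, ["multi_view_0", "multi_view_1"]),   # viewpoint sweep
    (0.0, ["multi_motion_0"]),                 # composite image
    (1.0, ["plain_0", "plain_1"]),             # unprocessed camera video
]
timeline = assemble_output(clips)
print(timeline)
```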

The transmission/reception unit 20 is a communication interface for acquiring video of a moving subject and the like from other external devices via a communication network, and for transmitting to external terminals the output video obtained by the present invention (multi-view multi-motion video and the like) and the various data acquired by the components of the video generation apparatus 10 described above.

The control unit 21 controls all of the functional components of the video generation apparatus 10. Specifically, the control unit 21 performs control such as extracting subject regions from the acquired video, compositing images, generating multi-view video, and generating output video based on the information entered by the user through the input unit 12, generating and displaying a screen showing the result of each process, and storing the various data generated in each of the processes described above.

<Example of acquiring multi-view video>
Next, an example of acquiring multi-view video with the imaging unit 11 described above will be described with reference to the drawings.

FIG. 2 is a diagram showing an example of an arrangement for acquiring multi-view video. FIG. 3 is a diagram showing an example of the video frames obtained with the arrangement of FIG. 2. As shown in FIG. 2, in the present embodiment a plurality of cameras 31-1 to 31-8 are installed at predetermined positions on tripods or the like in order to shoot a moving subject 30.

Here, the videos of the subject 30, which moves over time, shot by the cameras 31-1 to 31-8 are denoted Im1 to Im8, respectively. Each of the videos shot by the cameras 31-1 to 31-8 has a predetermined number of frames per second. That is, each of the cameras 31-1 to 31-8 shown in FIG. 2 produces a plurality of video frames, as shown in FIG. 3.

The information acquired from each of the cameras 31-1 to 31-8 can include, in addition to video information consisting of the video frames and the corresponding time information, camera parameters such as the camera direction and angle of view.

<Output video setting unit 15>
Next, the setting conditions in the output video setting unit 15 will be described. The output video setting unit 15 sets, for example, at what timing composite video and multi-view video are to be output from the video frames shown in FIG. 3.

Specifically, based on the time information corresponding to the video obtained in advance from the imaging unit 11 such as cameras, the output video setting unit 15 sets in which time period a composite image is output and which position is used as its reference (background image), and in which time period multi-view video is output and in which order the cameras are switched. The output video setting unit 15 can also make settings such as selecting the frames to be output, changing the number of subjects composited as time passes, and changing the camera from which the subjects to be composited are taken as time passes. This allows the user to realize complex video editing.
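One possible way to represent such settings is a list of time segments, each stating which kind of output applies, which cameras are used, and which frames are composited. The sketch below is only an assumed data layout: the class name `Segment`, its field names, and the mode strings are hypothetical and do not come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Segment:
    start_frame: int                 # first frame of the time period
    end_frame: int                   # last frame of the time period (inclusive)
    mode: str                        # "multi_motion" or "multi_view" (assumed labels)
    camera_order: List[int]          # cameras used, in switching order
    background_frame: Optional[int] = None                    # reference frame for compositing
    subject_frames: List[int] = field(default_factory=list)   # frames whose subjects are composited

def segment_for(settings, frame):
    """Return the segment that covers the given frame, or None if there is none."""
    for seg in settings:
        if seg.start_frame <= frame <= seg.end_frame:
            return seg
    return None

settings = [
    Segment(1, 13, "multi_motion", [1], background_frame=13,
            subject_frames=[1, 3, 5, 7, 9, 11]),
    Segment(14, 14, "multi_view", [1, 2, 3, 4, 5, 6, 7, 8]),
]
print(segment_for(settings, 5).mode)
```

Changing `subject_frames` or `camera_order` from one segment to the next would correspond to varying the number of composited subjects or the source camera over time.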

<Subject region extraction unit 16>
Next, the extraction performed by the subject region extraction unit 16 will be described. Here, subject region images (SilIm1 to SilIm8) are generated by extracting the regions of the moving subject present in the videos Im1 to Im8 obtained from the cameras 31-1 to 31-8 described above.

For example, from among the video frames shot by all or some of the cameras, the subject region images of a plurality of video frames are output for each camera from the video frames preset by the output video setting unit 15 (for example, video frames 1, 3, 5, 7, 9, 11, ...). FIG. 4 is a diagram showing an example of the video signal of a subject region image. In the subject region image shown in FIG. 4, for example, each pixel in the moving subject region 32 is assigned "1" and each pixel in the remaining background region 33 is assigned "0", so that the image serves as a key signal. That is, the subject region extraction unit 16 extracts the subject region using, for example, a video signal in which a key signal such as that shown in FIG. 4 is superimposed on the original video signal.

Methods for extracting the moving subject region described above include, for example, the background subtraction method, which extracts only the moving region of a video from the difference between a background image and the target video, and the inter-frame difference method, which extracts the moving region from the difference between the target video and the video of adjacent video frames.

In the background subtraction method, an image containing no moving object (subject) is captured with a fixed camera as the background image. The background image is subtracted from an image containing the moving object, giving a difference image in which the region of the moving object has values other than "0". Next, threshold processing is applied to the pixel values of the difference image to obtain a binary image. Because the resulting binary image contains small holes and small connected components, these are removed to obtain the region of the moving object. The image of the moving object is then obtained by using this binary image to take out the pixels located in the region of the target object.
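The background subtraction steps above can be sketched in a few lines. This is a simplified illustration that assumes grayscale images stored as nested lists; the noise removal shown here only clears isolated single pixels, which is a much cruder stand-in for removing small holes and small connected components. The function names and the threshold value are chosen for the example.

```python
def background_mask(frame, background, threshold):
    """Binary image: 1 where |frame - background| exceeds the threshold."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def remove_isolated(mask):
    """Clear foreground pixels with no 4-connected foreground neighbour.

    Simplification: real clean-up would also fill holes and drop larger specks.
    """
    h, w = len(mask), len(mask[0])

    def has_neighbour(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                return True
        return False

    return [[1 if mask[y][x] and has_neighbour(y, x) else 0 for x in range(w)]
            for y in range(h)]

background = [[10] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 200   # two-pixel subject
frame[3][0] = 90                  # isolated noise pixel
key = remove_isolated(background_mask(frame, background, 50))
for row in key:
    print(row)
```

The resulting `key` plays the role of the key signal: pixels marked 1 are taken from the frame as the moving object.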

The inter-frame difference method obtains the region of a moving object using, for example, three images of the moving object shot at different times. That is, assuming that the background does not change among the three images A, B, and C, the difference images AB (between A and B) and BC (between B and C) are created and threshold processing is applied to obtain binary images. Next, a logical AND is taken of the binary images AB and BC to extract their common region, which gives the region of the moving object in image B.
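The three-frame procedure can be illustrated with a small sketch, again assuming grayscale images as nested lists and an arbitrarily chosen threshold. The function names are hypothetical.

```python
def diff_mask(img1, img2, threshold):
    """Binary image: 1 where the two images differ by more than the threshold."""
    return [[1 if abs(p - q) > threshold else 0 for p, q in zip(r1, r2)]
            for r1, r2 in zip(img1, img2)]

def three_frame_mask(a, b, c, threshold):
    """Region of the moving object in image b: AND of the AB and BC difference masks."""
    ab = diff_mask(a, b, threshold)
    bc = diff_mask(b, c, threshold)
    return [[x & y for x, y in zip(rab, rbc)] for rab, rbc in zip(ab, bc)]

# One-row example: a bright subject (200) moves right over a dark background (10).
a = [[200, 10, 10, 10]]
b = [[10, 200, 10, 10]]
c = [[10, 10, 200, 10]]
mask_b = three_frame_mask(a, b, c, 50)
print(mask_b)
```

The difference AB marks the subject's positions in both A and B, and BC marks its positions in B and C; only the position in B is common to both, which is why the AND isolates it.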

Another way of extracting the subject region is for the user to extract it manually using software such as Adobe Photoshop.

<Image composition unit 17>
Next, the image composition performed by the image composition unit 17 will be described. To generate the composite images corresponding to the final output video set by the output video setting unit 15, the image composition unit 17 selects the relevant subjects from those extracted by the subject region extraction unit 16 and composites the selected images.

That is, here the subject region images (SilIm1 to SilIm8) output from the subject region extraction unit 16 and the predetermined background images (BgIm1 to BgIm8) selected for each corresponding camera are used to generate, for each camera, a multi-motion image (MmIm1 to MmIm8) in which the subject region images (SilIm1 to SilIm8) of a plurality of video frames are composited onto the predetermined background image (BgIm1 to BgIm8).

For example, the 13th frame of each camera is used as the background image, and the subject region images of the video frames extracted by the subject region extraction unit 16 (for example, frames 1, 3, 5, 7, 9, and 11) are composited onto it. In this way, a multi-motion image (MmIm1 to MmIm8) in which the subject regions of a plurality of video frames are composited and displayed is generated for each camera.

FIG. 5 is a diagram showing an example of a composite image. FIG. 5 shows an example of a multi-motion composite image 40 made up of composited subjects: subjects 41-1 to 41-5, obtained from five video frames selected according to preset conditions from among the video frames shot by one camera installed at a predetermined position, are displayed on a background image 42.

Note that one of the subjects 41-1 to 41-5 is the subject already contained in the background image 42. That is, in the example of FIG. 5, the video frame containing the subject 41-5 is used as the background image 42, and the subjects 41-1 to 41-4 extracted by the subject region extraction means 16 are composited onto that background image 42.

One technique for compositing a subject region image onto the predetermined background image is key compositing, which composites based on the key signal of the subject region image described above. In key compositing, when two images are combined into one and it is necessary to specify which part is taken from which image, an image (mask image) that specifies the source for each pixel with "1" or "0" is created, and the images are composited according to it. This can be regarded as a kind of inter-image operation.
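The per-pixel selection described here can be illustrated with a minimal sketch. It assumes images are nested lists of pixel values and that a mask value of 1 means "take the subject pixel"; the function name `key_composite` and the sample data are hypothetical and do not appear in the specification.

```python
def key_composite(background, foreground, mask):
    """Composite two images of equal size using a binary mask image.

    For each pixel, the foreground value is taken where the mask is 1
    and the background value is kept where the mask is 0.
    """
    return [
        [fg if m == 1 else bg for bg, fg, m in zip(bg_row, fg_row, m_row)]
        for bg_row, fg_row, m_row in zip(background, foreground, mask)
    ]


# Hypothetical 2x3 grayscale images: 10 is background, 99 is the subject.
background = [[10, 10, 10], [10, 10, 10]]
foreground = [[99, 99, 99], [99, 99, 99]]
mask = [[0, 1, 0], [1, 1, 0]]

composite = key_composite(background, foreground, mask)
```

The mask alone decides the source of each pixel, which is why the same routine can be reused regardless of how the mask was produced.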

Chroma keying is a commonly used method of generating the mask image. It discards each pixel according to its color information.
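As a rough illustration of how a chroma key could yield such a mask, the sketch below marks a pixel as background (0) when every RGB channel lies within a tolerance of a key color, and as subject (1) otherwise. The tolerance test, the function name `chroma_key_mask`, and the sample pixels are assumptions made for illustration; the specification does not state how the color comparison is performed.

```python
def chroma_key_mask(image, key_color, tolerance):
    """Build a binary mask from an RGB image.

    A pixel is treated as background (0) when all of its channels are
    within `tolerance` of `key_color`; any other pixel is subject (1).
    """
    def is_key(pixel):
        return all(abs(c - k) <= tolerance for c, k in zip(pixel, key_color))

    return [[0 if is_key(pixel) else 1 for pixel in row] for row in image]


# Hypothetical 2x2 image shot against a green screen.
image = [
    [(0, 255, 0), (200, 50, 40)],
    [(5, 250, 3), (0, 255, 0)],
]
mask = chroma_key_mask(image, key_color=(0, 255, 0), tolerance=10)
```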

<Image composition: first specific example>
Here, a first specific example of image composition in the image composition means 17 will be described. The image composition means 17 generates, for each camera, a multi-motion image (MmIm1 to MmIm8) in which the subject region images (SilIm1 to SilIm8) of a plurality of video frames are composited onto the predetermined background image (BgIm1 to BgIm8). It does so based on the subject region images (SilIm1 to SilIm8) from the cameras 31-1 to 31-8 output by the subject region extraction means 16, the predetermined background image for each camera (BgIm1 to BgIm8), and the preset order, for each camera, in which the subject region images are composited onto the predetermined background image (Order1 to Order8).

For example, suppose that the predetermined background image is frame 13 of a given camera, that the subject region images of video frames 1, 3, 5, 7, 9, and 11 extracted by the subject region extraction means 16 are to be composited onto it, and that, for generating the multi-motion image (MmIm1) of the camera 31-1, the compositing order of the camera 31-1 (Order1) has been set in advance to video frames 11 → 9 → 7 → 5 → 3 → 1.

In this case, a composite image is generated by compositing the subject region image of video frame 11 of the camera 31-1 onto the background image of the camera 31-1 (frame 13) based on the key signal. The subject region image of video frame 9 of the camera 31-1 is then composited onto this composite image based on the key signal.

同様に、映像フレーム７，５，３，１の順に被写体領域画像を合成し、カメラ３１−１のマルチモーション画像（ＭｍＩｍ１）を生成する。なお、カメラ３１−１以外の他のカメラについても同様の処理を行い、カメラ台数分のマルチモーション画像（ＭｍＩｍ１〜ＭｍＩｍ８）を生成する。 Similarly, the subject area images are synthesized in the order of the video frames 7, 5, 3, and 1 to generate a multi-motion image (MmIm1) of the camera 31-1. The same processing is performed for other cameras than the camera 31-1, and multi-motion images (MmIm1 to MmIm8) corresponding to the number of cameras are generated.
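The sequential compositing in the first specific example can be sketched as a loop over the preset order, where each step composites one subject region image onto the result of the previous step. This is only an illustrative sketch: the helper names, the dictionary layout keyed by frame number, and the tiny one-row images are assumptions, not part of the specification.

```python
def key_composite(base, subject, mask):
    """Take the subject pixel where the mask is 1, else keep the base pixel."""
    return [
        [s if m == 1 else b for b, s, m in zip(b_row, s_row, m_row)]
        for b_row, s_row, m_row in zip(base, subject, mask)
    ]


def build_multi_motion(background, subjects, masks, order):
    """Composite subject region images onto the background in `order`.

    `subjects` and `masks` map a frame number to an image and its key
    signal. Frames later in `order` overwrite frames composited earlier.
    """
    result = [row[:] for row in background]  # leave the background intact
    for frame in order:
        result = key_composite(result, subjects[frame], masks[frame])
    return result


# Hypothetical 1x4 images: the subject of frame 11 overlaps that of frame 9.
background = [[0, 0, 0, 0]]
subjects = {11: [[11, 11, 11, 11]], 9: [[9, 9, 9, 9]]}
masks = {11: [[1, 1, 0, 0]], 9: [[0, 1, 1, 0]]}

mm_im = build_multi_motion(background, subjects, masks, order=[11, 9])
```

Because frame 9 is composited after frame 11, it wins in the overlapping pixel, which is why the compositing order matters.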

<Image composition: second specific example>
Next, a second specific example of image composition in the image composition means 17 will be described. In the second specific example, the image composition means 17 is provided with a distance information calculation means. Specifically, the image composition means 17 obtains the distance from each camera to the moving subject region in each of a plurality of video frames, using the videos (Im1 to Im8) of the moving subject captured by the cameras 31-1 to 31-8 and the camera parameters (Calib1 to Calib8) of those cameras. It then compares, for each camera, the distances to the moving subject region across the video frames, determines for each camera the order (Order1 to Order8) in which the subject region images (SilIm1 to SilIm8) are composited onto the predetermined background image, and performs image composition based on the result.

Here, the distance information described above is obtained by, for example, expressing the position of the subject in three-dimensional coordinates and measuring the distance from those coordinates. For example, when the distance to the moving subject in each of video frames 1, 3, 5, 7, 9, and 11 is calculated for each camera, the image coordinates (u1, v1) and (u8, v8) of a point in the subject region that appears in both images (a gazing point such as the center of the head or the center of the abdomen) are given for video frame 1 of, for example, the camera 31-1 and the camera 31-8. The three-dimensional coordinates of the gazing point are then calculated from these coordinate values and the camera parameters Calib1 and Calib8 of the cameras 31-1 and 31-8. This technique is described in, for example, Kimihiro Tomiyama (冨山仁博) and two others, "Prototype of a multi-view high-definition video generation system: use in a broadcast program at the All-Japan Gymnastics Championships", The Institute of Electronics, Information and Communication Engineers, IEICE Technical Report, 2006-12, pp. 43-45.

具体的に説明すると、３次元座標は、以下の座標変換式（１）〜（３）をカメラ毎に定義することにより求める。 Specifically, the three-dimensional coordinates are obtained by defining the following coordinate conversion formulas (1) to (3) for each camera.

Here, ω denotes the image distance, Am the intrinsic parameter matrix of camera m, a the aspect ratio of the image, Fm the focal length of camera m, γ the skew, (Cx, Cy) the intersection of the camera's optical axis with the image plane, Rm the rotation matrix of camera m, and Tm the translation vector of camera m.
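The coordinate transformation equations (1) to (3) themselves are not reproduced in this text. For orientation only, a standard pinhole-camera formulation that is consistent with the symbols defined above might read as follows. The assignment of equation numbers, the position of the aspect ratio a in the intrinsic matrix, and the image and world coordinate symbols (u, v) and (X, Y, Z) are assumptions; the original equations may differ in detail.

```latex
\begin{align}
\omega \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  &= P_m \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{1} \\
P_m &= A_m \begin{pmatrix} R_m & T_m \end{pmatrix} \tag{2} \\
A_m &= \begin{pmatrix}
  a F_m & \gamma & C_x \\
  0     & F_m    & C_y \\
  0     & 0      & 1
\end{pmatrix} \tag{3}
\end{align}
```

In this form, P_m is a 3×4 matrix that maps the homogeneous world coordinates of a point to its homogeneous image coordinates in camera m.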

In other words, the aim is to generate a multi-view video centered on a gazing point that the user selects on the subject, the target region captured by the plurality of cameras. All camera images are projectively transformed into images that have been virtually panned, tilted, and so on, so that the selected gazing point lies at the center; presenting these images in succession yields a seamless multi-view video centered on the gazing point.

For this purpose, the camera parameters of each camera required for this projective transformation are obtained by camera calibration. Here, the world coordinates of the gazing point selected by the user are obtained using the coordinate transformation equations (1) to (3) described above, and the projective transformation is performed so that the straight line connecting the three-dimensional coordinates of that central point and the optical center of each camera passes through the image center.

When the gazing point is determined, its world coordinates are first calculated so that a projective transformation placing the user-selected gazing point at the image center can be performed. First, the image of one camera is selected from among the plurality of cameras, and the gazing point (u_k, v_k) is given. Next, one of the other cameras is selected, and the image coordinates (u_1, v_1) of the same gazing point are given. From the image coordinates of the gazing point in these two cameras and the coordinate transformation equations (1) to (3), the three-dimensional coordinates (X_1, Y_1, Z_1) of the gazing point can be obtained by the least-squares method of the following equation (4).

Here, p^m_ij denotes the element in row i, column j of the transformation matrix Pm in the coordinate transformation equation (1).

Next, by calculating the distance between the computed three-dimensional coordinates of the point in the subject region and each camera, the distance from each camera to the moving subject region in video frame 1 can be obtained.
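The two steps described above, recovering the three-dimensional position of the gazing point from two views by least squares and then measuring its distance to a camera, can be sketched as follows. Since equation (4) is not reproduced in this text, the sketch uses the standard linear triangulation obtained by eliminating the scale factor from the projection with a 3×4 matrix P, which may differ from the patent's exact formulation. The function names, the sample projection matrices, and the assumption that each camera's position is known directly are all hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting.

    Assumes the system is non-singular (the two viewing rays are not parallel).
    """
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def triangulate(observations):
    """Least-squares 3-D point from (P, (u, v)) pairs, P being a 3x4 matrix.

    Each view contributes two linear equations in (X, Y, Z):
      (p1j - u*p3j) . (X, Y, Z) = u*p34 - p14
      (p2j - v*p3j) . (X, Y, Z) = v*p34 - p24
    """
    rows, rhs = [], []
    for P, (u, v) in observations:
        for coord, r in ((u, 0), (v, 1)):
            rows.append([P[r][j] - coord * P[2][j] for j in range(3)])
            rhs.append(coord * P[2][3] - P[r][3])
    # Normal equations (A^T A) x = A^T b of the overdetermined system.
    normal = [[sum(a[i] * a[j] for a in rows) for j in range(3)] for i in range(3)]
    target = [sum(a[i] * b for a, b in zip(rows, rhs)) for i in range(3)]
    return solve_linear(normal, target)


# Hypothetical setup: unit focal length, camera 1 at the origin and
# camera 2 at (1, 0, 0), both looking along +Z.
P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

point = triangulate([(P1, (0.25, 0.1)), (P2, (-0.25, 0.1))])
dist_c1_f1 = math.dist(point, (0.0, 0.0, 0.0))  # distance to camera 1
```

With more than two views, further `(P, (u, v))` pairs can be appended to the list; the normal equations then average out small inconsistencies between the views.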

Here, the image composition means 17 obtains the distances from the cameras 31-1 to 31-8 to the moving subject region in video frame 1 as dist_c1_f1 to dist_c8_f1. Similarly, it obtains the distances from the cameras 31-1 to 31-8 to the moving subject region in video frames 3, 5, 7, 9, and 11: dist_c1_f3 to dist_c8_f3, dist_c1_f5 to dist_c8_f5, dist_c1_f7 to dist_c8_f7, dist_c1_f9 to dist_c8_f9, and dist_c1_f11 to dist_c8_f11.

Next, for each camera, the distances for the video frames (1, 3, 5, 7, 9, 11) are compared, and the order in which the subject region images are composited and displayed (Order1 to Order8) is determined in descending order of distance. For example, if the distances for the camera 31-1 satisfy "dist_c1_f11 > dist_c1_f9 > dist_c1_f7 > dist_c1_f5 > dist_c1_f3 > dist_c1_f1", the order (Order1) in which the subject region images are composited onto the predetermined background image of the camera 31-1 is video frames 11 → 9 → 7 → 5 → 3 → 1.

That is, the subject region images are composited in order from the one farthest from the camera used for the background to the one nearest, so that a nearer subject is drawn over a farther one where they overlap. As a result, a subject that moves over time can be presented as video with high accuracy.
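Determining the compositing order from the distances amounts to sorting the frame numbers by distance, farthest first. A minimal sketch follows, assuming the distances for one camera are held in a dictionary keyed by frame number; the function name `compositing_order` and the distance values are hypothetical.

```python
def compositing_order(distances):
    """Return frame numbers sorted from the farthest subject to the nearest.

    `distances` maps a frame number to the distance between the camera
    and the moving subject region in that frame.
    """
    return sorted(distances, key=distances.get, reverse=True)


# Hypothetical distances for camera 31-1 (dist_c1_f1 ... dist_c1_f11).
dist_c1 = {1: 3.0, 3: 3.4, 5: 3.9, 7: 4.5, 9: 5.2, 11: 6.0}
order1 = compositing_order(dist_c1)
```

Compositing in this order means the nearest subject is drawn last, so it is never hidden by a subject that was actually behind it.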

<Image composition: third specific example>
Next, a third specific example of image composition in the image composition means 17 will be described. The image composition means 17 generates, for each camera, a multi-motion image (MmIm1 to MmIm8) in which the subject region images of a plurality of video frames are composited onto the predetermined background image (BgIm1 to BgIm8), and outputs the generated images. It uses the subject region images (SilIm1 to SilIm8) output from the subject region extraction means 16, the predetermined background image for each camera (BgIm1 to BgIm8), and the per-camera compositing order (Order1 to Order8) determined from the distance information obtained by the distance information calculation means described in the second specific example.

For example, suppose that the predetermined background image is frame 13 of each camera, that the subject region images of video frames 1, 3, 5, 7, 9, and 11 extracted by the subject region extraction means 16 are to be composited onto it, and that, for generating the multi-motion image (MmIm1) of the camera 31-1, the compositing order of the camera 31-1 (Order1) is video frames 11 → 9 → 7 → 5 → 3 → 1. In this case, a composite image is generated by compositing the subject region image of video frame 11 of the camera 31-1 onto the background image of the camera 31-1 (frame 13) based on the key signal.

Next, the subject region image of video frame 9 of the camera 31-1 is composited onto this composite image based on the key signal. Similarly, the subject region images of video frames 7, 5, 3, and 1 are composited in that order to generate the multi-motion image (MmIm1) of the camera 31-1, and the generated image is output.

また、画像合成手段１７は、出力映像設定手段１５により得られる設定条件に基づいて、必要に応じてカメラ３１−１以外の他のカメラについても同様の処理を行い、カメラ台数分のマルチモーション映像（ＭｍＩｍ１〜ＭｍＩｍ８）を生成し、生成した映像を出力する。 In addition, the image composition unit 17 performs the same processing on other cameras other than the camera 31-1 as necessary based on the setting conditions obtained by the output video setting unit 15, and multi-motion video for the number of cameras. (MmIm1 to MmIm8) are generated, and the generated video is output.

<Multi-view video generation means 18>
Based on the setting conditions in the output video setting means 15, the multi-view video generation means 18 generates a multi-view video by sequentially displaying, at predetermined time offsets, the videos captured synchronously by predetermined cameras selected from the plurality of cameras 31-1 to 31-8.

Based on the setting conditions obtained from the output video setting means 15, the multi-view video generation means 18 generates a multi-view multi-motion video (OutIm) by switching images, for example in the order in which the cameras are arranged. The source images may be, for example, the multi-motion images for all cameras (MmIm1 to MmIm8) output from the image composition means 17, selected multi-motion images, or uncomposited video obtained by the plurality of imaging means 11.

For example, when the cameras 31-1 to 31-8 are installed in that order from left to right, the multi-view multi-motion video (OutIm) is generated by switching the multi-motion images in succession in the following frame order. In this case, the multi-view video is generated based on setting conditions such as the switching order and the switching time interval set by the output video setting means 15.

That is, the multi-view video generation means 18 can output a multi-view multi-motion video (or multi-view video) in which, for example, frames 1 to 8 show MmIm1 to MmIm8, respectively.
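The camera-order switching can be sketched as building an output sequence that shows each camera's multi-motion image in turn. The sketch assumes each camera's image is shown for a fixed number of output frames; the function name `multi_view_sequence`, the `frames_per_view` parameter, and the use of strings as stand-ins for images are hypothetical.

```python
def multi_view_sequence(images_by_camera, camera_order, frames_per_view=1):
    """Return the output frames obtained by switching cameras in order.

    Each camera's image is repeated `frames_per_view` times, which stands
    in for the switching time interval set in the output conditions.
    """
    sequence = []
    for camera in camera_order:
        sequence.extend([images_by_camera[camera]] * frames_per_view)
    return sequence


# Strings stand in for the multi-motion images MmIm1 to MmIm8.
mm_images = {camera: f"MmIm{camera}" for camera in range(1, 9)}
out_im = multi_view_sequence(mm_images, camera_order=range(1, 9))
```

Passing a reversed or partial `camera_order` would correspond to a different switching order or to using only selected cameras.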

<Output video generation means 19>
Next, the generation of the output video in the output video generation means 19 will be described. The output video generation means 19 generates the output video at predetermined timing, based on the composite images obtained by the image composition means 17, the multi-view multi-motion video and the like obtained by the multi-view video generation means 18, and the output video setting conditions set by the output video setting means 15.

<Output video: first example>
Here, specific examples of the output video in the output video generation means 19 will be described with reference to the drawings. FIG. 6 is a diagram for explaining a first example of output video generation. The output video shown in FIG. 6 is output at predetermined time intervals in the order of the frames 50-1 to 50-8 (FIGS. 6(a) to 6(h)). For each of the frames 50-1 to 50-8, a background image 51-1 to 51-8 from a preset camera among the plurality of cameras is selected.

In addition, the subjects 52-1 to 52-5, obtained from five frames selected from the plurality of video frames captured by a predetermined camera, are composited onto each of the background images 51-1 to 51-8 by the image composition means 17 described above. One of the five frames is also used as the background image. The subjects of the composite image in each of the frames 50-1 to 50-8 are obtained from five frames selected from the video frames captured by the predetermined camera chosen for that frame.

このように、第１の実施例によれば、図６に示すように、高精度な多視点マルチモーション画像を実現し、ディスプレイ等の出力手段１３により表示することができる。 In this way, according to the first embodiment, as shown in FIG. 6, a highly accurate multi-view multi-motion image can be realized and displayed by the output means 13 such as a display.

<Output video: second example>
Next, a second example of output video generation in this embodiment will be described with reference to the drawings. FIG. 7 is a diagram for explaining the second example of output video generation. The output video shown in FIG. 7 is output in the order of the frames 60-1 to 60-10 (FIGS. 7(a) to 7(j)). For the purposes of explanation, the frames shown in FIG. 7 are selected and extracted from the frames of the actual output video; in practice, a continuous motion video of the subject can be displayed by the output means 13 such as a display. For each of the frames 60-1 to 60-10, a background image 61 is selected from a preset camera among the plurality of cameras, and the subject images are composited onto that background image.

ここで、図７（ａ）では、出力映像設定手段１５により予め設定された１台のカメラで撮影された複数の映像フレームのうち、設定により選択される２つのフレームからそれぞれ得られる２つの被写体６２−１ａ,６２−１ｂのみが合成され、表示されている。また、図７（ｂ）では、図７（ａ）と比較して、予め設定された１台のカメラでそれぞれ撮影された複数の映像フレームのうち、更に図７（ａ）で用いたフレームとは異なる２つのフレームから得られる被写体を含め、計４つの被写体６２−２ａ〜６２−２ｄが表示されている。 Here, in FIG. 7A, two subjects respectively obtained from two frames selected by setting among a plurality of video frames shot by one camera preset by the output video setting means 15. Only 62-1a and 62-1b are synthesized and displayed. Further, in FIG. 7B, compared with FIG. 7A, among the plurality of video frames respectively shot with one preset camera, the frame used in FIG. A total of four subjects 62-2a to 62-2d are displayed including subjects obtained from two different frames.

In FIGS. 7(c) to 7(h), for each of six cameras preset by the output video setting means 15, the subjects 62-3a to 62-3e, ..., 62-8a to 62-8e, obtained from five frames selected by setting from the plurality of video frames captured by that camera, are composited, and the multi-view video generated from them by the multi-view video generation means 18, based on the time interval preset by the output video setting means 15, is output. During playback of the images in FIGS. 7(c) to 7(h), the actual shooting time is stopped (the subject itself does not move). As shown in FIG. 7, this makes it possible to generate a video that circles around the subject from different angles, and this video can be displayed on the output means 13 such as a display.

また、図７（ｉ）では、出力映像設定手段１５により予め設定された１台のカメラで撮影された複数の映像フレームのうち、設定により選択される３つのフレームから得られる被写体６２−９ａ〜６２−９ｃが表示されている。 Also, in FIG. 7 (i), subjects 62-9a to 62-9a obtained from three frames selected by setting among a plurality of video frames photographed by one camera preset by the output video setting means 15 are used. 62-9c is displayed.

更に、図７（ｊ）では、時間が経過し、ある所定のカメラで撮影されたフレーム画像（合成のない画像）のみが出力される。なお、このような画像は、予め出力映像設定手段１５により設定された出力条件に基づいて、予め画像合成手段１７及び多視点映像生成手段１８等により生成された各種画像を蓄積手段１４に一時的に蓄積しておき、それらの画像を用いて出力映像生成手段１９において出力映像を生成する。 Further, in FIG. 7 (j), only a frame image (image without synthesis) taken with a predetermined camera is output after a lapse of time. Note that such an image is obtained by temporarily storing various images generated in advance by the image synthesizing unit 17 and the multi-viewpoint video generating unit 18 on the storage unit 14 based on the output conditions set in advance by the output video setting unit 15. The output video generation means 19 generates an output video using these images.

In this way, by using the images set by the output video setting means 15, composite images and multi-view video can be combined so that the trajectory of the subject and the flow of its motion are captured with high accuracy and displayed by the output means 13 such as a display. As a result, a dynamic video with a three-dimensional feel can be generated by switching between and displaying the images of 12 cameras.

<Output video: third example>
Next, a third example of output video generation in this embodiment will be described with reference to the drawings. In the third example, the output video is generated by combining the first and second examples described above.

例えば、上述した図６を用いて説明すると、図６（ａ）では被写体５２−１ｅのみ表示された画像を表示し、図６（ｂ）では設定により選択される２つのフレームから得られる被写体５２−２ｄ、５２−２ｅのみ表示された画像を表示する。 For example, referring to FIG. 6 described above, in FIG. 6A, an image in which only the subject 52-1e is displayed is displayed, and in FIG. 6B, the subject 52 obtained from two frames selected by setting. -2d and 52-2e are displayed.

Similarly to FIG. 6(b), FIGS. 6(c) to 6(g) display only the subjects obtained from two frames selected by setting. Specifically, FIG. 6(c) displays an image showing only the subjects 52-3d and 52-3e, FIG. 6(d) an image showing only the subjects 52-4c and 52-4d, FIG. 6(e) an image showing only the subjects 52-5b and 52-5c, FIG. 6(f) an image showing only the subjects 52-6b and 52-6c, and FIG. 6(g) an image showing only the subjects 52-7a and 52-7b.

Further, in FIG. 6(h), an image showing only the subject 52-8a is displayed by the output means 13 such as a display. Because this produces a video that circles around the subject from different angles while the subject moves, a composite image in which time appears to pass can be displayed within the multi-view video, and a more accurate output video can be provided. With the video generation apparatus 10 described above, a subject that moves over time can be presented as video with high accuracy.

<Execution program>
The video generation apparatus 10 described above can perform the video generation of the present invention using the dedicated apparatus configuration described above. Alternatively, the video generation according to the present invention can be realized by creating an execution program that causes a computer to execute the processing of each component and installing that program on, for example, a general-purpose personal computer or server.

<Hardware configuration>
Here, an example of the hardware configuration of a computer capable of executing the video generation processing of the present invention will be described with reference to the drawings. FIG. 8 is a diagram showing an example of a hardware configuration capable of realizing the video generation processing of the present invention.

The computer main body in FIG. 8 comprises an input device 71, an output device 72, a drive device 73, an auxiliary storage device 74, a memory device 75, a CPU (Central Processing Unit) 76 that performs various kinds of control, and a network connection device 77, which are interconnected by a system bus B.

The input device 71 includes a keyboard, a pointing device such as a mouse, a voice input device, and the like operated by a user, and inputs various operation signals and voice signals, such as program execution instructions from the user. The output device 72 includes a display, a speaker, and the like for presenting the various windows and data needed to operate the computer main body that performs the processing of the present invention, and can display or audibly output the progress and results of execution under the control program of the CPU 76.

ここで、本発明において、コンピュータ本体にインストールされる実行プログラムは、例えばＣＤ−ＲＯＭやＤＶＤ等の記録媒体７８等により提供される。プログラムを記録した記録媒体７８は、ドライブ装置７３にセット可能であり、記録媒体７８に含まれる実行プログラムが、記録媒体７８からドライブ装置７３を介して補助記憶装置７４にインストールされる。 Here, in the present invention, the execution program installed in the computer main body is provided by a recording medium 78 such as a CD-ROM or a DVD. The recording medium 78 on which the program is recorded can be set in the drive device 73, and the execution program included in the recording medium 78 is installed from the recording medium 78 to the auxiliary storage device 74 via the drive device 73.

また、ドライブ装置７３は、本発明に係る実行プログラムを記録媒体７８に記録することができる。これにより、その記録媒体７８を用いて、他の複数のコンピュータに容易にインストールすることができ、容易に映像生成処理を実現することができる。 Further, the drive device 73 can record the execution program according to the present invention in the recording medium 78. Thus, the recording medium 78 can be used for easy installation on a plurality of other computers, and video generation processing can be easily realized.

The auxiliary storage device 74 is a storage means such as a hard disk; it stores the execution program of the present invention, control programs provided in the computer, and the like, and performs input and output as necessary. The auxiliary storage device 74 can also be used as a storage means for the output video setting conditions described above, subject region extraction results, composite images, multi-view video, output video, camera parameters, generated display screens, and the like.

メモリ装置７５は、ＣＰＵ７６により補助記憶装置７４から読み出された実行プログラム等を格納する。なお、メモリ装置７５は、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）やＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）等からなる。 The memory device 75 stores an execution program or the like read from the auxiliary storage device 74 by the CPU 76. The memory device 75 includes a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.

ＣＰＵ７６は、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）等の制御プログラム、及び補助記憶装置７４から読み出されメモリ装置７５に格納されている実行プログラム等に基づいて、各種演算や各ハードウェア構成部とのデータの入出力等、コンピュータ全体の処理を制御して、映像生成における各処理を実現することができる。また、プログラムの実行中に必要な各種情報等は、補助記憶装置７４から取得することができ、また格納することもできる。 Based on a control program such as an OS (Operating System) and an execution program read from the auxiliary storage device 74 and stored in the memory device 75, the CPU 76 inputs various calculations and data with each hardware component. Each process in the video generation can be realized by controlling processing of the entire computer such as output. Various information necessary during the execution of the program can be acquired from the auxiliary storage device 74 and stored.

ネットワーク接続装置７７は、電話回線やＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）ケーブル等の通信ネットワーク等と接続することにより、実行プログラムを通信ネットワークに接続されている他の端末等から取得したり、プログラムを実行することで得られた実行結果又は本発明における実行プログラムを他の端末等に提供することができる。 The network connection device 77 obtains an execution program from another terminal connected to the communication network or executes the program by connecting to a communication network such as a telephone line or a LAN (Local Area Network) cable. The execution result obtained in this way or the execution program in the present invention can be provided to other terminals or the like.

上述したようなハードウェア構成により、特別な装置構成を必要とせず、低コストで映像生成処理を実現することができる。また、プログラムをインストールすることにより、容易に映像生成処理を実現することができる。次に、本発明における実行プログラムを用いた処理手順についてフローチャートを用いて説明する。 With the hardware configuration as described above, a video generation process can be realized at a low cost without requiring a special device configuration. Also, by installing the program, it is possible to easily realize the video generation process. Next, a processing procedure using the execution program in the present invention will be described using a flowchart.

<Video generation processing>
FIG. 9 is a flowchart showing an example of the video generation processing procedure of the present invention. In FIG. 9, video of a moving subject captured with cameras or the like serving as the imaging means is first acquired (S01). The processing of S01 uses video captured by a plurality of cameras installed at different positions. In addition to the video (image) data itself (the frames), S01 also acquires the capture time, the camera parameters of each camera, and so on.

Next, the output video is configured, that is, what video is ultimately to be output (S02). Specifically, various conditions are set that determine, for each time span, which camera's frames and background image are used, which cameras' extracted subjects are composited, and from which cameras multi-view video is generated, so that the resulting composite images and videos can be joined over predetermined time spans into a single video. These setting conditions can be stored, for example, in the storage unit 14, and can be changed at will.
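As an illustration only, the setting conditions of S02 could be held as a list of time spans, each naming the cameras it draws on. The following Python sketch is a minimal one under that assumption; the `Segment` type, its field names, and the `validate` helper are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One time span of the output video. All field names are hypothetical."""
    start: float                  # start of the span, in seconds
    end: float                    # end of the span, in seconds
    kind: str                     # "composite" (S04 result) or "multiview" (S05 result)
    background_camera: int = -1   # camera supplying the background image; -1 if unused
    subject_cameras: List[int] = field(default_factory=list)  # cameras whose subjects are extracted
    camera_order: List[int] = field(default_factory=list)     # switching order for multi-view video


def validate(segments: List[Segment]) -> None:
    """Check that the spans are well formed and can be joined back to back."""
    previous_end = None
    for seg in segments:
        if seg.kind not in ("composite", "multiview"):
            raise ValueError(f"unknown segment kind: {seg.kind!r}")
        if seg.end <= seg.start:
            raise ValueError("segment must have positive duration")
        if previous_end is not None and seg.start < previous_end:
            raise ValueError("segments must not overlap")
        previous_end = seg.end


# Example conditions: composite view, then a camera sweep, then composite again.
settings = [
    Segment(0.0, 2.0, "composite", background_camera=1, subject_cameras=[1]),
    Segment(2.0, 3.0, "multiview", camera_order=[1, 2, 3, 4]),
    Segment(3.0, 5.0, "composite", background_camera=4, subject_cameras=[4]),
]
validate(settings)  # raises ValueError if the conditions are inconsistent
```

Keeping the conditions as plain data makes it straightforward to store them and to change them later, as the text allows.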

Next, from the video captured by the cameras, a subject area is extracted from the video that is to be output under the conditions set in S02 (S03). The extracted subject is then combined with a background image (which itself includes the subject) obtained from the predetermined camera that corresponds to the output video settings made in S02, producing a composite image (S04).
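Steps S03 and S04 can be sketched roughly as follows. This is a minimal Python sketch that assumes the subject area is found by simple differencing against an empty reference image, which is a swapped-in technique that this passage does not specify. Images are tiny grayscale grids held as nested lists, and the function names and the threshold are illustrative.

```python
def extract_subject_mask(frame, empty_scene, threshold=30):
    """S03 (assumed method): mark pixels that differ from the empty scene."""
    return [
        [abs(f - e) > threshold for f, e in zip(frame_row, empty_row)]
        for frame_row, empty_row in zip(frame, empty_scene)
    ]


def composite(base, frame, mask):
    """S04: copy the masked subject pixels of `frame` onto a copy of `base`."""
    return [
        [f if m else b for b, f, m in zip(base_row, frame_row, mask_row)]
        for base_row, frame_row, mask_row in zip(base, frame, mask)
    ]


# One-row toy images: value 10 is scenery, value 200 is the subject.
empty_scene = [[10, 10, 10, 10]]
frame_t0 = [[200, 10, 10, 10]]   # subject on the left at time t0
frame_t1 = [[10, 10, 200, 10]]   # subject further right at time t1

mask_t0 = extract_subject_mask(frame_t0, empty_scene)
# The base image is the t1 frame, which already contains the subject.
multi_motion = composite(frame_t1, frame_t0, mask_t0)
```

Here `multi_motion` holds the subject at both positions, because the base image already contains the subject and the earlier position is pasted on top of it.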

Further, multi-view video corresponding to the output video conditions set in S02 is generated (S05). Next, using the composite image obtained in S04 and the multi-view video obtained in S05, the output video is generated according to the output video settings of S02 (S06), and the generated video is output to the screen (S07). The generated video may also be stored, for example, in the storage unit 14.
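A minimal Python sketch of S05 and S06 follows. It assumes that the multi-view video freezes a single instant and steps through the cameras in their configured order, and that the output video is a plain concatenation of segments. Frames are stand-in strings, and the function names are hypothetical.

```python
def multiview_sequence(frames_by_camera, frame_index, camera_order):
    """S05 (assumed form): the same instant seen from each camera in turn."""
    return [frames_by_camera[cam][frame_index] for cam in camera_order]


def build_output(segments):
    """S06: join the generated segments in playback order."""
    output = []
    for segment in segments:
        output.extend(segment)
    return output


# Stand-in frames: "c2f1" means camera 2, frame 1.
frames_by_camera = {
    1: ["c1f0", "c1f1"],
    2: ["c2f0", "c2f1"],
    3: ["c3f0", "c3f1"],
}

sweep = multiview_sequence(frames_by_camera, frame_index=1, camera_order=[1, 2, 3])
composites = ["comp0", "comp1"]            # stand-ins for S04 composite images
output_video = build_output([composites, sweep])
```

Because the two kinds of segment are produced independently, the order in which S04 and S05 run does not affect the joined result.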

In the processing procedure described above, S01 and S02 may be performed in reverse order. That is, only the video corresponding to preset output video settings may be acquired from the predetermined cameras. This avoids storing unneeded video and reduces the required recording capacity, and it allows only the necessary video to be acquired efficiently. The composite image generation of S04 and the multi-view video generation of S05 may likewise be performed in reverse order. The video generation processing described above makes it possible to present a subject that moves over time as video with high accuracy.
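The reversed order of S01 and S02 amounts to filtering the camera streams by the preset conditions before anything is stored. The Python sketch below assumes each condition is a dictionary with a hypothetical `"cameras"` key; the names and data layout are illustrative only.

```python
def cameras_needed(settings):
    """Collect every camera referenced by the preset output conditions."""
    needed = set()
    for condition in settings:
        needed.update(condition.get("cameras", []))
    return needed


def acquire(available_streams, settings):
    """S01 performed after S02: keep only the streams the conditions use."""
    needed = cameras_needed(settings)
    return {
        cam: stream
        for cam, stream in available_streams.items()
        if cam in needed
    }


settings = [
    {"kind": "composite", "cameras": [1]},
    {"kind": "multiview", "cameras": [1, 2, 3]},
]
available_streams = {1: "stream1", 2: "stream2", 3: "stream3", 4: "stream4", 5: "stream5"}

acquired = acquire(available_streams, settings)
```

In this example the streams of cameras 4 and 5 are never stored, which corresponds to the reduction in recording capacity described above.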

As described above, the present invention makes it possible to present a subject that moves over time as video with high accuracy. Moreover, because the movement over time of, for example, an athlete can be viewed from various viewpoints, the invention enables motion information to be presented in a visually intuitive way in sports broadcasting, motion analysis, and the like.

Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

FIG. 1 is a diagram showing a configuration example of the video generation device of the present invention.
FIG. 2 is a diagram showing an example of an arrangement for acquiring multi-view video.
FIG. 3 is a diagram showing an example of the video frames obtained in correspondence with FIG. 2.
FIG. 4 is a diagram showing an example of the video signal of a subject area image.
FIG. 5 is a diagram showing an example of a composite image.
FIG. 6 is a diagram for explaining a first example of output video generation.
FIG. 7 is a diagram for explaining a second example of output video generation.
FIG. 8 is a diagram showing an example of a hardware configuration capable of realizing the video generation processing of the present invention.
FIG. 9 is a flowchart showing an example of the video generation processing procedure of the present invention.

Explanation of symbols

10 Video generation device
11 Imaging means
12 Input means
13 Output means
14 Storage means
15 Output video setting means
16 Subject area extraction means
17 Image synthesis means
18 Multi-view video generation means
19 Output video generation means
20 Transmission/reception means
21 Control means
30, 41, 52, 62 Subject
31 Camera
32 Subject area
33 Background area
40 Composite image
42, 51, 61 Background image
50, 60 Frame
71 Input device
72 Output device
73 Drive device
74 Auxiliary storage device
75 Memory device
76 CPU
77 Network connection device
78 Recording medium

Claims

1. A video generation apparatus that generates a predetermined video, based on preset conditions, from videos of a moving subject captured by a plurality of imaging means, the apparatus comprising:
output video setting means for setting the content of the video to be output;
subject area extraction means for extracting a subject area from a video selected, based on the setting conditions obtained by the output video setting means, from the videos captured by the plurality of imaging means;
image synthesis means for synthesizing the subject area extracted by the subject area extraction means with a predetermined background image;
multi-view video generation means for generating a multi-view video by switching, in a predetermined order, videos selected, based on the setting conditions obtained by the output video setting means, from the videos captured by the plurality of imaging means; and
output video generation means for generating an output video from the videos obtained by the image synthesis means and the multi-view video generation means.

2. The video generation apparatus according to claim 1, wherein the image synthesis means combines a plurality of subject area images extracted by the subject area extraction means with the background image in a predetermined order.

3. The video generation apparatus according to claim 2, wherein the image synthesis means performs the synthesis onto the background image in an order based on distance information from the imaging means to the subject area, the distance information being obtained from the camera parameters of the plurality of imaging means.

4. The video generation apparatus according to claim 1, wherein only the video from the imaging means that capture video used for the output video set by the output video setting means is acquired.

5. The video generation apparatus according to claim 1, wherein the image synthesis means generates, based on the setting conditions obtained by the output video setting means, images that differ in the number of subjects combined with the same image.

6. A video generation program for causing a computer to execute video generation processing that generates a predetermined video, based on preset conditions, from videos of a moving subject captured by a plurality of imaging means, the program causing the computer to execute:
an output video setting step of setting the content of the video to be output;
a subject area extraction step of extracting a subject area from a video selected, based on the setting conditions obtained by the output video setting step, from the videos captured by the plurality of imaging means;
an image synthesis step of synthesizing the subject area extracted by the subject area extraction step with a predetermined background image;
a multi-view video generation step of generating a multi-view video by switching, in a predetermined order, videos selected, based on the setting conditions obtained by the output video setting step, from the videos captured by the plurality of imaging means; and
an output video generation step of generating an output video from the videos obtained by the image synthesis step and the multi-view video generation step.