JPH10164435A

JPH10164435A - Time-space integration and managing method for a plurality of video images, its device and recording medium recording its program

Info

Publication number: JPH10164435A
Application number: JP9271600A
Authority: JP
Inventors: Akito Akutsu; 明人阿久津; Yoshinobu Tonomura; 佳伸外村; Hiroshi Hamada; 洋浜田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-10-04
Filing date: 1997-10-03
Publication date: 1998-06-19
Anticipated expiration: 2017-10-03
Also published as: JP3499729B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and device for integrating a plurality of video images so that they can be managed, expressed and operated in a unified way, and so that video information can be acquired efficiently from a plurality of video images of the same photographic space. SOLUTION: A photographic state detection section 103 reads an image data string to detect camera on/off information and camera operation information. A video split section 104 splits a video image into shots based on the camera on/off information, and an object/background separation section 105 separates an object from the background based on the camera operation information. An object motion information extraction section 106 associates object information between frames, a photographic space re-compositing section 107 re-composites the photographic space from the camera operation information and the background, and an inter-shot relation calculation section 108 calculates the spatial relation among the plurality of re-composited photographic spaces. The camera on/off information, the camera operation information, the object information, the object motion information, the re-composited background information and the inter-shot relation information are managed and stored spatio-temporally. One or more objects and photographic spaces are then re-composited according to a request from the user or the like, and the result is displayed or output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a technique for creating a new video by integrating a plurality of videos, and more particularly to a method and an apparatus for the spatio-temporal integration and management of a plurality of videos, and to a recording medium on which a program therefor is recorded.

[0002]

2. Description of the Related Art As computers have become more powerful, hard disks larger in capacity, and video cameras smaller and digital, and as the prices of all of these have fallen, such equipment has spread into ordinary households. With the spread of high-performance computers and large-capacity hard disks, and with the establishment of information compression techniques, the digitization of video has become commonplace. Digitizing video has made it possible to handle video on a general-purpose personal computer and to output it to a high-resolution personal computer display. Video output was generally limited to a TV monitor (640 × 480 pixels), but output at a resolution higher than that of a TV monitor is now also possible. For example, a plurality of videos can be displayed simultaneously on a personal computer display.

[0003] With this development of video equipment, several methods for enhancing video have been reported. Michal Irani and Shmuel Peleg, in "Motion analysis for image enhancement: Resolution, Occlusion and Transparency", Journal of Visual Communication and Image Representation, Vol. 4, No. 4, December, pp. 324-335, 1993, propose, among others, a method of increasing the resolution of a video by using motion information in the video and a method of interpolating background regions occluded by a subject or the like. Laura A. Teodosio reports a method of creating a high-resolution still image from a video (Japanese Patent Application Laid-Open No. 5-304675). More recently, ORAD has announced "Digital Replay", a system that plays back video with new added value. Its functions include, for example, emphasized display, tracking and enlargement of a subject, and the display, together with the video, of graphics and of information such as their line segments, distances and speeds.

[0004] There are also reports on new user interfaces for video. M. Mill et al., in "A Magnifier Tool for Video Data", Proceedings of CHI '92, pp. 93-98 (1992), report arranging video frames in space according to temporal resolution level, which enables a new way of viewing and accessing video along the time axis, from coarse temporal resolution down to fine temporal resolution. E. Elliot and A. W. Davis, in "Motion Image Processing", Striking Possibilities, ADVANCED IMAGING, AUGUST (1992), report representing video as a three-dimensional object of video image (two dimensions) plus time (one dimension), which yields a new representation of the temporal information of video and intuitive access to that information.

[0005] In an environment where video camera input has become easy and display forms have become flexible, a demand arises to handle a plurality of videos. Here, "a plurality of videos" means, for example, videos from several cameras installed at a sports venue such as the Olympic Games, or videos shot with a single camera that each capture a different athlete. "Handling" here means viewing, comparing, searching and editing a plurality of videos at the same time.

[0006]

However, for a plurality of videos there are problems that the conventional reports above do not solve. Those reports achieve enhancement of a single shot, but they cannot enhance a plurality of videos by using the relationships among them. Nor can they integrate a plurality of videos spatio-temporally and manage them in a unified way. "Enhancement of a plurality of videos" here means establishing relationships among the original videos, and automatically extracting and visually expressing video content information (information on what actually appears in the video, that is, information on the subject). It also means relating a plurality of videos to one another and producing one or more videos in which the video content information is visually expressed. One example is producing, from several videos of different athletes, a single video in which the athletes are overlapped in a common space. "Managing spatio-temporally" means managing, in a unified way, the information extracted on the basis of the spatio-temporal structure of the video. For example, the subject and the background are separated and each is managed, together with information on their relationships in time and space. Unified management of a plurality of videos using their spatio-temporal structure is not among the aims of the conventional techniques, and it cannot be achieved merely by combining them.

[0007] Furthermore, the video user interfaces realized so far merely re-express the video; they do not actively extract the information contained in a video or the relationships among a plurality of videos in order to enhance them. Because neither the content of a video nor the relationships among videos are taken into account, they allow neither an intuitive grasp of the content within and across videos nor new enhancement of video information, in particular temporal information. Moreover, because the plurality of videos is not managed spatio-temporally, interaction that reaches into the content of the videos is impossible.

[0008] Thus, none of the previously reported work on video enhancement, user interfaces and the like can enhance a plurality of videos in a way that gives them high added value, or can manage, express and operate them in a spatio-temporally unified way. In other words, the conventional techniques leave these problems unsolved.

[0009] SUMMARY OF THE INVENTION An object of the present invention is to provide a method and an apparatus for the spatio-temporal integration and management of a plurality of videos, and a recording medium on which a program therefor is recorded, which, for a plurality of videos of the same space, can enhance the videos by integrating them spatio-temporally and can manage, express and operate them in a spatio-temporally unified way, so that each user can efficiently acquire video information in his or her own style according to his or her interests and purposes.

[0010]

In order to achieve the above object, an apparatus for the spatio-temporal integration and management of a plurality of videos according to the present invention comprises: an image data string memory unit that reads video data and stores it as a data string; a shooting state detection unit that reads the data string from the image data string memory unit and detects shooting state information including camera on/off information and camera operation information; a video dividing unit that divides the video of the data string into shots based on the camera on/off information; a subject/background separation unit that separates a subject from the background for each frame of the video using the camera operation information and physical feature quantities; a subject motion information extraction unit that associates the subject information separated for each frame between frames; a shooting space re-synthesis unit that re-synthesizes the shooting space in which the video was shot from the camera operation information and the background separated for each frame; an inter-shot relation calculation unit that calculates the spatial inter-shot relations among the plurality of shooting spaces re-synthesized by the shooting space re-synthesis unit from the respective divided shots; and a video structure information management/storage unit that manages and stores the separated subject information, the associated subject information, the shooting state information, the background information and the inter-shot relation information.
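As a rough illustration of how the managed information might be organized, the following Python sketch groups per-shot information and inter-shot relations in one store. The class names, the field names and the representation of an inter-shot relation as a two-dimensional offset are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Shot:
    """Information kept for one shot (hypothetical layout)."""
    start_frame: int
    end_frame: int  # inclusive
    # Camera operation label per frame, e.g. "pan" or "fix" (illustrative labels).
    camera_ops: List[str] = field(default_factory=list)
    # Subject id -> list of (frame, x, y) positions associated across frames.
    subject_tracks: Dict[int, List[Tuple[int, float, float]]] = field(default_factory=dict)


@dataclass
class VideoStructureStore:
    """Holds shots and the spatial relations between their shooting spaces."""
    shots: List[Shot] = field(default_factory=list)
    # (shot a, shot b) -> assumed (dx, dy) offset of b's space relative to a's.
    relations: Dict[Tuple[int, int], Tuple[float, float]] = field(default_factory=dict)

    def add_shot(self, shot: Shot) -> int:
        """Store a shot and return its index."""
        self.shots.append(shot)
        return len(self.shots) - 1

    def relate(self, a: int, b: int, offset: Tuple[float, float]) -> None:
        """Record the spatial relation from shot a to shot b."""
        self.relations[(a, b)] = offset

    def related_shots(self, a: int) -> List[int]:
        """Return the indices of shots that have a recorded relation from shot a."""
        return sorted(b for (src, b) in self.relations if src == a)
```

Keeping the relations in a separate mapping, rather than inside each shot, reflects the idea that the inter-shot relation is a property of a pair of shooting spaces.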

[0011] The above apparatus further comprises a video structured information transmission/reception unit that transmits or receives all or part of the extracted subject information, shooting state information, background information, inter-shot relation information and video data.

[0012] The above apparatus further comprises a re-synthesis unit that re-synthesizes one or more shooting spaces and one or more subjects on the basis of the information stored and managed in the video structure information management/storage unit, in accordance with one or both of conditions given in advance and a request from a user; a display unit that displays the video re-synthesized by the re-synthesis unit; and a user input unit through which the user enters a request concerning re-synthesis based on the video shown on the display unit. As needed, it further comprises an output unit that outputs the video shown on the display unit to an external device in digital or analog form.

[0013] Similarly, a method for the spatio-temporal integration and management of a plurality of videos according to the present invention comprises: an image data string storing step of reading video data and storing it as a data string; a shooting state detection step of reading the data string from the image data string memory unit and detecting shooting state information including camera on/off information and camera operation information; a video dividing step of dividing the video of the data string into shots based on the camera on/off information; a subject/background separation step of separating a subject from the background for each frame of the video using the camera operation information and physical feature quantities; a subject motion information extraction step of associating the subject information separated for each frame between frames; a shooting space re-synthesis step of re-synthesizing the shooting space in which the video was shot from the camera operation information and the background separated for each frame; an inter-shot relation calculation step of calculating the spatial inter-shot relations among the plurality of shooting spaces re-synthesized in the shooting space re-synthesis step from the respective divided shots; and a video structure information management/storage step of managing and storing the separated subject information, the associated subject information, the shooting state information, the background information and the inter-shot relation information.

[0014] The above method further comprises a video structured information transmission/reception step of transmitting or receiving all or part of the extracted subject information, shooting state information, background information, inter-shot relation information and video data.

[0015] The above method further comprises, after the video structure information management/storage step: a re-synthesis step of re-synthesizing one or more shooting spaces and one or more subjects on the basis of the stored and managed information, in accordance with one or both of conditions given in advance and a request from a user; and a display or output step of displaying or outputting the video re-synthesized in the re-synthesis step.

[0016] Similarly, a computer-readable recording medium according to the present invention records a program for the spatio-temporal integration and management of a plurality of videos, the program comprising: an image data string storing step of reading video data and storing it as a data string; a shooting state detection step of reading the data string from the image data string memory unit and detecting shooting state information including camera on/off information and camera operation information; a video dividing step of dividing the video of the data string into shots based on the camera on/off information; a subject/background separation step of separating a subject from the background for each frame of the video using the camera operation information and physical feature quantities; a subject motion information extraction step of associating the subject information separated for each frame between frames; a shooting space re-synthesis step of re-synthesizing the shooting space in which the video was shot from the camera operation information and the background separated for each frame; an inter-shot relation calculation step of calculating the spatial inter-shot relations among the plurality of shooting spaces re-synthesized in the shooting space re-synthesis step from the respective divided shots; and a video structure information management/storage step of managing and storing the separated subject information, the associated subject information, the shooting state information, the background information and the inter-shot relation information.

[0017] In the above computer-readable recording medium, the program further comprises a video structured information transmission/reception step of transmitting or receiving all or part of the extracted subject information, shooting state information, background information, inter-shot relation information and video data.

[0018] In the above computer-readable recording medium, the program further comprises, after the video structure information management/storage step: a re-synthesis step of re-synthesizing one or more shooting spaces and one or more subjects on the basis of the stored and managed information, in accordance with one or both of conditions given in advance and a request from a user; and a display or output step of displaying or outputting the video re-synthesized in the re-synthesis step.

[0019] In the present invention, video data is read; the stored image data string is read out; shooting state information including camera on/off information and camera operation information is detected; the video is divided into shots based on the camera on/off information; the subject and the background are separated for each frame using the camera operation information and physical feature quantities; the separated subject information is associated between frames to extract subject motion information; the shooting space is re-synthesized from the camera operation information and the background of each frame; the spatial inter-shot relations among the plurality of shooting spaces re-synthesized from the respective shots are calculated; and the information thus obtained is managed and stored. This realizes a spatio-temporal integration of a plurality of videos that enables their spatio-temporal enhancement and their spatio-temporally unified management, expression and operation.
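The step of dividing the video into shots can be illustrated with a short Python sketch. It assumes that the camera on/off information is available as a list of frame indices at which an on/off event was detected and that each such index starts a new shot; the function name and the half-open (start, end) representation are illustrative choices, not part of the specification.

```python
from typing import List, Tuple


def split_into_shots(num_frames: int, cut_frames: List[int]) -> List[Tuple[int, int]]:
    """Split frames 0..num_frames-1 into shots at the given cut frames.

    Each returned pair (start, end) is half-open: the shot covers
    frames start, start + 1, ..., end - 1.
    """
    # Ignore duplicates and cuts that fall outside the interior of the video.
    boundaries = sorted({c for c in cut_frames if 0 < c < num_frames})
    starts = [0] + boundaries
    ends = boundaries + [num_frames]
    return list(zip(starts, ends))
```

For example, `split_into_shots(10, [4, 7])` yields three shots covering frames 0 to 3, 4 to 6 and 7 to 9.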

[0020] Furthermore, by transmitting or receiving all or part of the extracted subject information and the like, a user can freely select, search for and retrieve that information at any location, and can send it to any user at any location. Likewise, any user can send such information in from any location.

[0021] Furthermore, one or more shooting spaces and one or more subjects are re-synthesized on the basis of the managed and stored information, in accordance with conditions given in advance or a request from the user, and the result is displayed or output externally in analog or digital form. For a plurality of videos of the same space, from which it has conventionally been difficult to acquire information simultaneously in both time and space, this allows each user to acquire the information of the plurality of videos simultaneously, intuitively and efficiently, in his or her own style according to his or her interests and purposes.

[0022]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described in detail below with reference to the drawings.

[0023] FIG. 1 shows the configuration of an apparatus according to one embodiment of the present invention. Each component and technique is described below with reference to this configuration diagram.

[0024] The video signal input from the video signal input unit 101 in FIG. 1 is temporarily stored in the image data string memory unit 102. The stored video signal is then processed by the shooting state detection unit 103, which extracts the camera on/off information and the camera operation information.

[0025] FIG. 2 shows the configuration of the shooting state detection unit 103, and FIG. 3 shows the flow of processing performed by its components 201 to 204. The configuration and processing flow of each component are described in detail below with reference to these drawings.

[0026] The several frames of the video signal read from the image data string memory unit 102 are shown as 301 in FIG. 3; this 301 is usually called a spatio-temporal image. For this spatio-temporal image, the linear component calculation unit 201 calculates the horizontal and vertical linear components of each frame image. The vertical linear component calculation is 302 in FIG. 3 and the horizontal linear component calculation is 303. The results are called the vertical linear component spatio-temporal image 304 and the horizontal linear component spatio-temporal image 305.
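The specification does not state which operator computes the linear components. As one possible reading, the Python sketch below treats the vertical linear component as the absolute horizontal intensity difference (which responds to vertical lines) and the horizontal linear component as the absolute vertical intensity difference. The function name, the central-difference operator and the zero-valued border are assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # image[y][x], grayscale intensity


def line_components(frame: Image) -> Tuple[Image, Image]:
    """Return (vertical, horizontal) linear component images of one frame."""
    h, w = len(frame), len(frame[0])
    vertical = [[0] * w for _ in range(h)]
    horizontal = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if 0 < x < w - 1:
                # A vertical line changes intensity along x.
                vertical[y][x] = abs(frame[y][x + 1] - frame[y][x - 1])
            if 0 < y < h - 1:
                # A horizontal line changes intensity along y.
                horizontal[y][x] = abs(frame[y + 1][x] - frame[y - 1][x])
    return vertical, horizontal
```

Applying this function to every frame and stacking the results over time would give the two linear component spatio-temporal images.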

[0027] Next, the image data rearranging unit 202 uses image data string rearranging means to rearrange the vertical and horizontal linear component spatio-temporal images 304 and 305 in preparation for the subsequent filtering. This rearrangement corresponds to 306 in FIG. 3 and consists of cutting the spatio-temporal image along planes that contain the screen normal. The screen normal is the direction perpendicular to the x and y coordinate axes of the frame image. A cut plane that contains the time axis is generally called a spatio-temporal cross-sectional image. One example, used in the field of computer vision, is the epipolar plane image: the cross section obtained when the spatio-temporal image is cut by the plane containing the camera's direction of travel and the screen normal. The three-dimensional position of a subject is estimated from this cross-sectional image. This works because, on the epipolar plane image, the trajectory of an object feature point is a straight line whose slope gives the magnitude of the feature point's motion [R. C. Bolles, H. Baker, and D. H. Marimont, "Epipolar-plane image analysis: An approach to determining structure from motion", IJCV, 1, 1, pp. 7-55, June 1989]. A cross-sectional image cut so as to contain the x and t coordinate axes is specifically called an x-t spatio-temporal image, and likewise one containing the y and t coordinate axes is called a y-t spatio-temporal image. An x-t spatio-temporal image can be cut out at any value of y; a set of such images is called an x-t spatio-temporal image sequence. The same applies to the y-t spatio-temporal image sequence.
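The cutting operation can be illustrated with a short Python sketch, assuming the spatio-temporal image is held as a nested list indexed as `video[t][y][x]`. The function names and the storage layout are illustrative assumptions.

```python
from typing import List

Volume = List[List[List[int]]]  # video[t][y][x]
Image = List[List[int]]


def xt_slice(video: Volume, y: int) -> Image:
    """Cut the volume at a fixed y; the result is indexed as [t][x]."""
    return [list(frame[y]) for frame in video]


def yt_slice(video: Volume, x: int) -> Image:
    """Cut the volume at a fixed x; the result is indexed as [t][y]."""
    return [[row[x] for row in frame] for frame in video]


def xt_sequence(video: Volume) -> List[Image]:
    """All x-t slices, one per value of y."""
    return [xt_slice(video, y) for y in range(len(video[0]))]
```

In such a slice, a feature that moves horizontally at constant speed traces a slanted straight line, which is what the later filtering and summation steps rely on.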

[0028] Next, the video information filter processing unit 203 applies a filter (first derivative, second derivative, or the like) to the cross sections of the vertical and horizontal linear component spatio-temporal images cut by the image data rearranging unit 202. This processing is performed by line segment detecting means and is intended to detect edges or lines. It corresponds to 307 in FIG. 3. The filter processing unit 203 calculates the strength of the edges or lines. The flow patterns that run along the time axis in a cross-sectional image are produced by motion in the video, and the direction of the flow corresponds to the magnitude of the motion. The edge detection described above detects the edges or straight lines that indicate the direction of this flow, and thus emphasizes only the motion information in the image. The resulting edge-detected cross-sectional image sequences are called the vertical and horizontal spatio-temporal edge image sequences.
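A first-derivative filter of the kind mentioned above can be sketched in Python as follows. The sketch assumes a slice indexed as `slice_img[t][x]` and uses the sum of absolute central differences along x and t as the edge strength; the specific operator and the zero-valued border are assumptions, since the specification only names the filter class.

```python
from typing import List

Image = List[List[int]]  # slice_img[t][x]


def edge_strength(slice_img: Image) -> Image:
    """Edge strength of a spatio-temporal slice from first differences."""
    n_t, n_x = len(slice_img), len(slice_img[0])
    out = [[0] * n_x for _ in range(n_t)]
    for t in range(1, n_t - 1):
        for x in range(1, n_x - 1):
            dx = slice_img[t][x + 1] - slice_img[t][x - 1]  # spatial difference
            dt = slice_img[t + 1][x] - slice_img[t - 1][x]  # temporal difference
            out[t][x] = abs(dx) + abs(dt)
    return out
```

Uniform regions give zero response, so only the streaks produced by moving structure remain in the output.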

[0029] Subsequently, in the feature extraction unit 204, the integrating means sums the vertical and horizontal spatiotemporal edge image sequences along the normal direction of the edge images. This corresponds to 308 in FIG. 3, where the summation is performed along the broken line 308a. The purpose of the summation is to further emphasize the motion already emphasized by the preceding filtering. That is, if the motion of the feature points of objects is due to global motion, the contributions reinforce one another under summation and are prominently reflected in the result. Conversely, if the motion of a feature point is local, summation weakens it and it is hardly reflected in the result. Moreover, unlike difference processing, summation is robust to noise, which means that motion information can be extracted even from video containing a lot of noise. The image obtained by this summation is called a spatiotemporal projection image. The x-t spatiotemporal projection image 309 is obtained from the x-t vertical spatiotemporal image sequence, and the y-t spatiotemporal projection image 310 is obtained from the y-t horizontal spatiotemporal image sequence. The flow pattern along the time axis of the x-t spatiotemporal projection image represents horizontal (left-right) motion in the video, and the pattern of the y-t spatiotemporal projection image represents vertical (up-down) motion.
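The filter-then-sum step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: frames are small lists of rows, the edge filter is a first difference along x, and the function name `xt_projection` is hypothetical rather than anything named in the source.

```python
def xt_projection(frames):
    """Build an x-t projection image from a list of frames.

    frames: list of T frames, each a list of H rows of W intensity values.
    Returns T rows of W-1 values: for each time t, the horizontal edge
    strength |I(y, x+1) - I(y, x)| summed over all rows y.
    """
    projection = []
    for frame in frames:
        width = len(frame[0])
        row_sum = [0] * (width - 1)
        for row in frame:
            for x in range(width - 1):
                # First-difference edge filter along x, accumulated along y.
                row_sum[x] += abs(row[x + 1] - row[x])
        projection.append(row_sum)
    return projection


# A vertical edge that moves one pixel to the right per frame.
frames = [
    [[0, 9, 9, 9, 9], [0, 9, 9, 9, 9]],
    [[0, 0, 9, 9, 9], [0, 0, 9, 9, 9]],
    [[0, 0, 0, 9, 9], [0, 0, 0, 9, 9]],
]
projection = xt_projection(frames)
```

In the resulting image the non-zero entry moves one column per row, which is the slanted flow pattern along the time axis that the text attributes to horizontal motion.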

[0030] Subsequently, in the feature statistical analysis unit 205, the camera on/off detection means first extracts camera on/off information by detecting straight lines perpendicular to the time axis in the extracted features, which are expressed as two-dimensional images having a time axis and a space axis. Specifically, the x-t spatiotemporal projection image is denoted F(x, t) and the y-t spatiotemporal projection image is denoted F(y, t). If the value C calculated from the following evaluation equation is equal to or greater than a predetermined threshold, a camera on/off event is assumed to have occurred at that time t.

[0031]

[Equation 1]
$$C(t) = \sum_{x} F(x, t)\,dx + \sum_{y} F(y, t)\,dy \qquad (1)$$

Next, camera operation information is extracted. FIG. 4 shows the camera operation information to be extracted. Camera operations consist of seven basic operations and their combinations. The basic operations are fix (camera held stationary), pan 401 (swinging the camera left and right), zoom 402 (enlarging or reducing the subject by changing the angle of view), tilt 403 (swinging the camera up and down), track 404 (moving the camera left and right), boom 405 (moving the camera up and down), and dolly 406 (moving the camera forward and backward). That is, fix is a stationary camera; pan and tilt change the direction of the optical axis while the camera projection center stays fixed; zoom changes the angle of view; and track, boom, and dolly involve a change in the position of the camera projection center. Because the projection center moves, track, boom, and dolly are operations whose image motion contains the three-dimensional arrangement of the subject. In video shot with track, boom, or dolly, the image motion is fast when the subject is relatively close to the camera and slow when the subject is far from it.
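As an illustration of the camera on/off test of Equation (1), the following Python sketch sums both projection images at each time and compares the result with a threshold. It assumes unit sample spacing (dx = dy = 1) and projection images stored as lists of rows indexed by t; the function name and the sample values are hypothetical.

```python
def detect_camera_on_off(F_xt, F_yt, threshold):
    """Return the time indices t at which C(t) >= threshold.

    F_xt[t] is the row of the x-t projection image at time t,
    F_yt[t] is the row of the y-t projection image at time t.
    """
    cuts = []
    for t, (row_x, row_y) in enumerate(zip(F_xt, F_yt)):
        c = sum(row_x) + sum(row_y)  # Equation (1) with unit sample spacing
        if c >= threshold:
            cuts.append(t)
    return cuts


F_xt = [[1, 1], [9, 9], [1, 0]]
F_yt = [[0, 1], [8, 9], [1, 1]]
cuts = detect_camera_on_off(F_xt, F_yt, threshold=20)
```

A line perpendicular to the time axis shows up as a single time index at which every spatial position is strong, so the sum over space is large only at that index.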

[0032] The x-t spatiotemporal projection image calculated by the above method is denoted F(x, t); it is a function of space x and time t. The spatial distribution of the x-t spatiotemporal projection image at time t₀ is denoted F(x, t₀), and likewise its spatial distribution at time t₁ is denoted F(x, t₁). The global motion parameters calculated below are denoted a, b, and c, where, in terms of camera operations, a is the zoom parameter, b is the pan parameter, and c is the tilt parameter. The method for obtaining these camera operation parameters is described below.

[0033] If global motion is present in the created x-t spatiotemporal projection image, the following relationship holds between F(x, t₀) and F(x, t₁).

[0034]

[Equation 2]
$$F(x', t_1) = F(a x + b, t_0)$$

Similarly, for the y-t spatiotemporal projection image, the following relationship holds.

[Equation 3]
$$F(y', t_1) = F(a y + c, t_0)$$

First, the associating means establishes the correspondence between x' and x and between y' and y. This is illustrated in FIG. 5.

[0035] In FIG. 5, 501 denotes the x-t spatiotemporal projection image, 502 the spatial distribution F(x, T) at time T, and 503 the spatial distribution F(x, T-1) at time T-1. The coordinate value 504 is associated as shown in the figure, and the corresponding coordinate 505 is calculated. As an alternative to this method, the correspondence can also be established by calculating a correlation function over each small range. The associated coordinates 504 and 505 stand for arbitrary coordinate values, and the relationship between such coordinates is the straight line shown at 506. The slope of this line represents the zoom parameter a, and its intercept represents the pan parameter b.

[0036] Subsequently, the camera operation parameter calculation means calculates the camera operation parameters from the spatial coordinate values associated above. Specifically, to obtain a and b of the straight line 506, the associated spatial coordinate values are projected (voted) into the parameter space using the relational expression below, the maximum 507 of the projection space is extracted, and the parameters a and b are calculated from it. This transformation is generally known as the Hough transform [P. V. C. Hough, "Method and Means for Recognizing Complex Patterns", U.S. Patent No. 3,069,654, 1962]. When an arbitrary pair of associated coordinates is denoted x' and x, the relationship between a and b is given by the following equation.

[0037]

[Equation 4]
$$b = x' \cos(a) + x \sin(a)$$

The Hough transform is a well-established method for estimating, from a set of points, the straight line on which those points lie. One point in the image space corresponds to one curve in the Hough space (projection space), and the coordinates of the intersection 507 of the projected curves give the slope and intercept of the line to be extracted.

[0038] On a computer, the line is estimated by voting in the projection space: the slope and intercept of the line to be extracted are taken from the coordinates that receive the largest number of votes. Each of the associated coordinate pairs is voted into the projection space to calculate the parameters.
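The voting idea can be sketched as follows. Note the substitution: for brevity this sketch votes in a discretised slope-intercept grid using the linear relation x' = a·x + b, not the trigonometric parametrisation given in the text. The function name `estimate_zoom_pan`, the candidate list, and the bin width `b_step` are hypothetical.

```python
from collections import Counter


def estimate_zoom_pan(pairs, a_candidates, b_step=1.0):
    """Estimate (a, b) in x' = a*x + b by voting in a discretised (a, b) grid.

    pairs: list of (x, x_prime) corresponding coordinates.
    a_candidates: candidate zoom values to try.
    b_step: bin width used to quantise the pan value b.
    """
    votes = Counter()
    for x, x_prime in pairs:
        for a in a_candidates:
            b = x_prime - a * x        # the b that makes this pair fit for this a
            b_bin = round(b / b_step)  # quantise so nearby values share a cell
            votes[(a, b_bin)] += 1
    (a_best, b_bin_best), _ = votes.most_common(1)[0]
    return a_best, b_bin_best * b_step


# Four pairs on x' = 1.0 * x + 3 plus one outlier, e.g. from a moving subject.
pairs = [(0, 3), (5, 8), (10, 13), (20, 23), (7, 40)]
a, b = estimate_zoom_pan(pairs, a_candidates=[0.5, 1.0, 1.5, 2.0])
```

Because only the cell with the most votes is kept, pairs that do not follow the global motion contribute scattered single votes and have little influence on the estimate.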

[0039] The tilt parameter c can be calculated in the same way from the y-t spatiotemporal projection image. Even a spatiotemporal projection (integral) image obtained under an operation that includes three-dimensional information is, microscopically (locally), equivalent to an image obtained under an operation that does not include three-dimensional information, so such an image can also be handled by applying the above processing locally (block by block). This completes the description of the configuration and processing flow of the shooting state detection unit 103.

[0040] Returning to FIG. 1, the video dividing unit 104 divides the video into shots based on the camera on/off information calculated by the shooting state detection unit 103. Within a shot delimited by camera on/off events in this way, the images can be regarded as carrying continuous information about the same space.
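Shot division from the detected on/off times amounts to cutting the frame range at those times. A minimal sketch, assuming each detected frame index starts a new shot (the source does not state which side of the event the frame belongs to) and using the hypothetical name `split_into_shots`:

```python
def split_into_shots(num_frames, cut_times):
    """Split frame indices 0..num_frames-1 into shots.

    cut_times: frame indices at which a camera on/off event was detected;
    each such frame is assumed to start a new shot.
    Returns a list of (start, end) pairs with end exclusive.
    """
    boundaries = sorted(set(t for t in cut_times if 0 < t < num_frames))
    starts = [0] + boundaries
    ends = boundaries + [num_frames]
    return list(zip(starts, ends))


shots = split_into_shots(10, [4, 7])
```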

[0041] Next, the subject/background separation unit 105 separates the subject from the background. FIG. 6 shows the configuration of the subject/background separation unit 105. The configuration of its components 601 to 604 and the flow of processing through them are described in detail below.

[0042] First, in the camera operation canceling unit 601, the video frame deforming means cancels the camera operation from the image data based on the camera operation information. Each image is deformed or displaced by the amount of change or displacement that the camera operation produced between adjacent images. Let the adjacent images be F(x, y, t) and F(x, y, t+1). Using the camera operation A (where A is a matrix), the following relationship holds between adjacent images.

[0043]

[Equation 5]
$$F(x, y, t+1) = A\,F(x, y, t)$$

Cancellation of the camera operation can be expressed by the following equation.

[0044]

[Equation 6]
$$F(x, y, t) = A^{-1} F(x, y, t+1)$$

Subsequently, in the image data comparison unit 602, the difference processing means compares adjacent images from which the camera operation has been canceled as above. The comparison is between camera-operation-canceled images, and what is calculated is, for example, the absolute value of the difference in luminance, color, or similar information between the images. Through this comparison the background is subtracted out, and only the change due to the motion of the subject is extracted as a difference from the background.
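For the simplest camera operation, a horizontal pan by a whole number of pixels, cancelling the operation and differencing can be sketched as below. The general case uses the matrix A; here an integer shift stands in for it as an assumption, and `cancel_pan` and `abs_difference` are hypothetical names.

```python
def cancel_pan(frame, shift):
    """Undo a pan that displaced image content `shift` pixels to the right.

    frame: list of rows. Output pixel x is taken from input pixel x + shift;
    positions with no source pixel are filled with None.
    """
    width = len(frame[0])
    out = []
    for row in frame:
        new_row = [None] * width
        for x in range(width):
            src = x + shift
            if 0 <= src < width:
                new_row[x] = row[src]
        out.append(new_row)
    return out


def abs_difference(frame_a, frame_b):
    """Pixel-wise absolute difference; 0 where either pixel is undefined."""
    return [
        [0 if a is None or b is None else abs(a - b) for a, b in zip(ra, rb)]
        for ra, rb in zip(frame_a, frame_b)
    ]


prev_frame = [[10, 20, 30, 40, 50]]
next_frame = [[0, 10, 20, 99, 40]]  # background moved right by 1; 99 is a moving subject
aligned = cancel_pan(next_frame, 1)
difference = abs_difference(prev_frame, aligned)
```

After alignment the background pixels cancel to zero and only the pixel covered by the moving subject remains in the difference image.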

[0045] Subsequently, the comparison image data comparison unit 603 compares adjacent comparison images. The comparison performed here is an operation such as taking the product of the two images, or comparing them pixel by pixel and taking the smaller value as the value of the result. Through this series of processes, the region of the subject (moving object) in the middle image of three consecutive images is emphasized.
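The pixel-wise minimum variant of this comparison can be sketched as follows; the function name `emphasize_middle_frame` and the sample values are hypothetical.

```python
def emphasize_middle_frame(diff_prev, diff_next):
    """Combine two adjacent difference images by a pixel-wise minimum."""
    return [
        [min(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(diff_prev, diff_next)
    ]


# A subject moves from x=1 (frame 0) to x=2 (frame 1) to x=3 (frame 2).
diff_01 = [[0, 50, 50, 0, 0]]  # difference of frames 0 and 1
diff_12 = [[0, 0, 50, 50, 0]]  # difference of frames 1 and 2
emphasized = emphasize_middle_frame(diff_01, diff_12)
```

Each difference image responds at both the old and the new subject position; only the position in the middle frame is present in both, so the minimum keeps that position alone.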

[0046] Subsequently, in the region extraction unit 604, the binarization means first binarizes the emphasized subject region. Binarization is performed with a predetermined threshold S under the following condition, where f(x, y) denotes the comparison data image and F(x, y) the binarized image.

[0047]

[Equation 7]
$$F(x, y) = \begin{cases} 1 & \text{if } f(x, y) \ge S \\ 0 & \text{if } f(x, y) < S \end{cases}$$

FIG. 7 shows an example of binarization: 701 is the comparison data image and 702 is the binarized image, for a threshold of 9.
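The thresholding rule translates directly into code. The sketch below uses the threshold 9 mentioned for FIG. 7, but the pixel values are made up for illustration and are not those of the figure.

```python
def binarize(f, S):
    """Return the binarized image: 1 where f(x, y) >= S, otherwise 0."""
    return [[1 if value >= S else 0 for value in row] for row in f]


comparison_image = [
    [3, 9, 12],
    [10, 8, 0],
]
binary_image = binarize(comparison_image, S=9)
```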

[0048] Subsequently, the labeling means labels the binarized image F(x, y). FIG. 8 shows the labeling algorithm. The binarized image F(x, y) is written F = {f_{i,j}} and the label image L = {l_{i,j}}, where l_{i,j} is a positive integer giving the label of each connected component. In addition, l is a variable holding the connected component number, and T(i) is the label table. In the initialization 801, l is set to 1 and the label scan starts from pixel (2, 2). With the current pixel at (i, j), a decision is made at 802: if f_{i,j} = 1, processing proceeds to 803; if f_{i,j} = 0, l_{i,j} is set to 0 and processing proceeds to 807. Using the scanning order shown in FIG. 9, the current pixel x₀ = (i, j) and its already-scanned neighbors are denoted as shown in FIG. 10, and the label of x_p (the value of image L) is denoted l_p (p = 1, 2, 3, 4). At 803, suppose there are n distinct positive values in {T(l_p) : l_p ≠ 0, p = 1, 2, 3, 4}, and call them L₁, L₂, ..., L_n in ascending order. If n = 0, processing proceeds to 804; if n = 1, to 805; if n = 2, to 806. After each of these, processing proceeds to 807, which checks whether all pixels have been processed; if so, steps 808 and 809 are carried out and labeling is complete.
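A raster-scan labeling with a label table can be sketched as below. This is a generic two-pass scheme in the spirit of the description, not a transcription of the flowchart in FIG. 8: it assumes that the four already-scanned neighbors are the left, upper-left, upper, and upper-right pixels, it handles image borders by bounds checks instead of starting at pixel (2, 2), and the function name `label_components` is hypothetical.

```python
def label_components(F):
    """Label 8-connected components of a binary image F (list of rows of 0/1).

    Returns an image of the same size with 0 for background and
    1, 2, ... for the connected components.
    """
    height, width = len(F), len(F[0])
    L = [[0] * width for _ in range(height)]
    T = [0]  # label table: T[k] points toward the representative of label k

    def find(k):
        while T[k] != k:
            k = T[k]
        return k

    for i in range(height):
        for j in range(width):
            if F[i][j] == 0:
                continue
            # Already-scanned neighbours: left, upper-left, upper, upper-right.
            neighbours = [(i, j - 1), (i - 1, j - 1), (i - 1, j), (i - 1, j + 1)]
            reps = sorted({
                find(L[p][q])
                for p, q in neighbours
                if 0 <= p < height and 0 <= q < width and L[p][q] != 0
            })
            if not reps:          # no labelled neighbour: open a new label
                T.append(len(T))
                L[i][j] = len(T) - 1
            else:                 # reuse the smallest label, merge the others into it
                L[i][j] = reps[0]
                for r in reps[1:]:
                    T[r] = reps[0]

    # Second pass: replace provisional labels by compact final component numbers.
    final = {}
    for i in range(height):
        for j in range(width):
            if L[i][j]:
                root = find(L[i][j])
                final.setdefault(root, len(final) + 1)
                L[i][j] = final[root]
    return L


u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
labels = label_components(u_shape)
```

The U-shaped example first receives two provisional labels for its two arms; they are merged through the label table when the scan reaches the bottom row, which is the role the table T plays in the description.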

[0049] Next, the physical feature calculating means calculates physical features for each labeled region. The physical features calculated here include luminance, color distribution, and texture. Then the matching means compares the calculated features of each labeled region against features given in advance and determines the subject region. Finally, the background extraction means separates the background region by subtracting the subject region extracted above from the frame image.

[0050] This completes the description of the configuration and processing flow of the subject/background separation unit 105.

[0051] Returning to FIG. 1, in the subject motion information extraction unit 106, the matching means compares, between adjacent frames, the physical features calculated for the regions extracted from each frame image, and checks the comparison result against a condition given in advance. Regions that satisfy the condition are regarded as regions with similar physical features, that is, regions belonging to the same subject, and are linked in time. This linkage information is taken as the subject's motion information.
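One way to picture this frame-to-frame linking is nearest-feature matching with a distance limit. The sketch below is an assumption-laden illustration: the source does not specify the comparison measure or the condition, so Euclidean distance and `max_distance` are stand-ins, and `link_regions` is a hypothetical name.

```python
def link_regions(prev_regions, next_regions, max_distance):
    """Link regions in adjacent frames whose feature vectors are similar.

    prev_regions / next_regions: dict mapping region id -> feature tuple
    (for example mean luminance and a colour statistic).
    Returns a dict mapping each previous id to its best-matching next id,
    keeping only matches with Euclidean feature distance <= max_distance.
    """
    links = {}
    for pid, pfeat in prev_regions.items():
        best_id, best_dist = None, None
        for nid, nfeat in next_regions.items():
            dist = sum((a - b) ** 2 for a, b in zip(pfeat, nfeat)) ** 0.5
            if best_dist is None or dist < best_dist:
                best_id, best_dist = nid, dist
        if best_id is not None and best_dist <= max_distance:
            links[pid] = best_id
    return links


prev_regions = {1: (100, 20), 2: (30, 200)}
next_regions = {7: (32, 198), 8: (101, 22), 9: (250, 250)}
links = link_regions(prev_regions, next_regions, max_distance=10)
```

Chaining such links over successive frame pairs gives the temporal track of each subject.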

[0052] Meanwhile, in the shooting space re-synthesis unit 107, the spatial superimposing means deforms and displaces the image frames according to the camera operation information calculated by the shooting state detection unit 103, superimposes the backgrounds separated by the subject/background separation unit 105 as one continuous space, and re-synthesizes them into a wide shooting space that extends beyond a single frame. FIG. 11 shows the re-synthesis. In FIG. 11(a), 1101 and 1102 are temporally consecutive frame images. If the video was shot with a pan operation, 1102 is shifted relative to 1101 by 1103 (the pan amount per frame) and composited. Likewise, for a tilt, the frame is shifted by 1104 (the tilt amount per frame) and composited. For a zoom operation, as shown in FIG. 11(b), the image is enlarged or reduced according to the zoom amount 1105 and composited. The background image produced by this compositing method is what is generally called a panoramic space, and it has the distortion characteristic of panoramic spaces. This distortion is equivalent to the distortion produced when an image is subjected to a cylindrical transformation.
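For the pan-only case, the compositing can be sketched as pasting each frame onto a wider canvas at its accumulated pan offset. The sketch assumes integer, non-negative pan amounts and lets later frames overwrite earlier ones; the source does not say how overlapping pixels are combined, and `composite_panorama` is a hypothetical name.

```python
def composite_panorama(frames, pan_per_frame):
    """Overlay frames on one canvas, shifting each by the accumulated pan.

    frames: list of frames (lists of rows) of equal size.
    pan_per_frame: pan_per_frame[k] is the non-negative integer pan, in
    pixels, between frame k and frame k+1 (length len(frames) - 1).
    Later frames overwrite earlier ones where they overlap.
    """
    height, width = len(frames[0]), len(frames[0][0])
    offsets = [0]
    for pan in pan_per_frame:
        offsets.append(offsets[-1] + pan)
    canvas = [[None] * (width + offsets[-1]) for _ in range(height)]
    for frame, offset in zip(frames, offsets):
        for y in range(height):
            for x in range(width):
                canvas[y][offset + x] = frame[y][x]
    return canvas


frames = [[[1, 2, 3]], [[2, 3, 4]], [[4, 5, 6]]]
panorama = composite_panorama(frames, pan_per_frame=[1, 2])
```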

[0053] Next, in the inter-shot relationship calculation unit 108, the shooting space conversion means converts the re-synthesized shooting spaces of the individual shots so that their sizes and positions agree with one another, and inter-shot relationship information is then calculated. The information calculated here concerns the space of each shot: the spatial arrangement of the spaces recorded in the shots being related is taken as the inter-shot relationship information. This relationship is calculated by comparing the panoramic spaces re-synthesized by the shooting space re-synthesis unit 107. FIG. 12 illustrates the cylindrical transformation. A line 1201 in three-dimensional space is transformed into 1202 on the cylinder 1203. Point O in FIG. 12 is the projection center of the camera, and the surface of the cylinder 1203 is the image plane. The size of the cylinder is uniquely determined by the focal length f at the time of shooting. Consequently, panoramic spaces created from video shot at the same focal length have the same cylindrical distortion. The relationship between such panoramic spaces can be obtained by calculating the spatial translation between the images, which can be done by matching with the cross-correlation coefficient, using one panoramic image as a template. Matching can be made stable by defining a new evaluation function from the correlation of the overlapping portion and the overlapping region. To relate images with different focal lengths, matching must take the cylindrical distortion into account; the relationship can be calculated by taking one image as the reference and matching while varying the focal length f little by little. For video shot without any camera operation, the relationship can be calculated by matching while enlarging or reducing the image size.
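Finding the translation between two panoramas by template matching can be sketched in one dimension. This is a simplified illustration: it uses the normalised cross-correlation coefficient as the score, it only considers positions where the template lies entirely inside the reference (so it does not model the overlap-dependent evaluation function mentioned in the text), and `best_translation` is a hypothetical name.

```python
def best_translation(reference, template):
    """Slide `template` over `reference` (both 1-D lists) and return the
    (shift, score) with the highest normalised cross-correlation."""

    def ncc(a, b):
        n = len(a)
        mean_a, mean_b = sum(a) / n, sum(b) / n
        da = [v - mean_a for v in a]
        db = [v - mean_b for v in b]
        denom = (sum(v * v for v in da) * sum(v * v for v in db)) ** 0.5
        if denom == 0:
            return 0.0  # flat window: correlation undefined, treat as no match
        return sum(p * q for p, q in zip(da, db)) / denom

    best_shift, best_score = 0, -2.0
    for shift in range(len(reference) - len(template) + 1):
        window = reference[shift:shift + len(template)]
        score = ncc(window, template)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


shift, score = best_translation([0, 0, 1, 5, 2, 0, 0, 0], [1, 5, 2])
```

For panoramas with different focal lengths, the same search would be repeated over candidate rescalings of one image, as the paragraph above describes.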

[0054] The calculated camera on/off information, camera operation information, subject information, subject motion information, re-synthesized background information, and inter-shot relationship information are managed and stored spatiotemporally, for a plurality of videos, in the video structure information management/storage unit 109. Conventionally, video has been managed in storage devices as files or as time-varying RGB signals. Management at the signal level was effective for mechanical processing, transmission, storage, and display. In the multimedia era, in which video can be handled by computers, the handling of video is shifting from simple signal processing, storage, and display toward more advanced uses such as retrieval from large stored video databases, editing, and processing. To enable such advanced handling, video must be managed using information at a level that reaches further into its content than the signal level does. The information extracted above is at this level; representing and managing video with it enables far more advanced handling than is possible for video represented only as time-varying RGB signals. Unlike a mere temporal change of a signal, this representation reflects the content of the video and can be regarded as a semantic representation of the video in time and space. Storing and managing this new representation spatiotemporally makes it possible to handle video in advanced ways that are intuitive and easy for people to understand.

[0055] The extracted video structure information and the video data are compressed before storage. Compression makes efficient use of storage space and enables the data to be sent and received over a network. Within the video structure information, information that varies with time as a variable is compressed with lossless coding such as Huffman coding. Spatial information that takes the form of images (for example, extracted subject images and re-synthesized background images) is compressed as still images with lossy coding, of which JPEG is a representative method. An image re-synthesized with the present invention from 320 × 240 video shot while rotating the camera horizontally through 360 degrees contains about 5 megabytes of information; with JPEG coding a compression ratio of about one tenth can be expected, so it can be compressed to about 500 kilobytes. Over the present-day Internet, transmitting about 5 megabytes is impractical because of the time it takes, whereas transmitting about 500 kilobytes is practical. Time-varying subject images and the like can be compressed to roughly one tenth to one twentieth by using lossy video coding such as H.261 or MPEG. When structured video is sent over a thin line (a line with a low transmission rate) such as the Internet, only the minimum information needed to answer the user's request is sent, for example a background image JPEG-compressed as a still image, together with subject information JPEG-compressed as a still image in which time information has been expanded into space (for example, a strobe image in which the subject is laid out over the background). This makes efficient use of network capacity and time and gives good interactive response.
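To make the transmission-time argument concrete, the short calculation below compares the two sizes over a slow link. The link rate of 28,800 bit/s is purely an assumption chosen for illustration (the source gives no rate), sizes are taken in decimal units, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


ASSUMED_LINK_RATE = 28_800           # bit/s, illustrative assumption only
uncompressed = 5_000_000             # about 5 megabytes
compressed = uncompressed // 10      # one tenth after JPEG, about 500 kilobytes

t_uncompressed = transfer_seconds(uncompressed, ASSUMED_LINK_RATE)
t_compressed = transfer_seconds(compressed, ASSUMED_LINK_RATE)
```

Under that assumed rate the uncompressed panorama would take roughly 23 minutes and the compressed one a little over 2 minutes, which is the kind of gap the paragraph above relies on.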

[0056] Next, the re-synthesis unit 110 re-synthesizes the information managed in the video structure information management/storage unit 109 in response to a request from the user input unit 112, according to conditions given in advance, or according to both. Semantic filtering of video is also possible: for example, video containing only the background or only the subject can be created. It is also possible to create an image in which the time information of spatiotemporal video is expanded into space and expressed as purely spatial information. One example is the image mentioned above, in which the subject is rendered strobe-fashion on the panoramic space. Whereas the prior art could only produce strobe representations sampled in time, strobe representations sampled in space are also possible.

[0057] FIG. 13 shows strobe representations with temporal and spatial sampling. 1301 is the re-synthesized panoramic space and 1302 is the subject. 1303 is the spatially sampled strobe representation, and 1304 is the temporally sampled strobe representation. In 1304 the spacing of the subject images also conveys the subject's speed, whereas 1303 conveys how the subject changes across space. Using the inter-shot relationship information, subjects that appear in different shots can also be composited onto a single background. For example, the subject filmed in shot A (player A) can be rendered as a strobe representation on the panoramic space, and the subject filmed in shot B (player B) can be overlaid on that strobe representation and displayed as a moving image. Here shot A and shot B share a common space. This kind of video enhancement makes it easy for the user to grasp visually and intuitively, for example, the difference in form between a skilled player and an unskilled one.
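The difference between the two sampling schemes can be shown by choosing which frames of a subject track to draw on the panorama. The sketch assumes a one-dimensional subject position per frame in panorama coordinates; the function names and the sample track are hypothetical.

```python
def temporal_strobe(positions, every_n_frames):
    """Indices of frames taken at a fixed time interval."""
    return list(range(0, len(positions), every_n_frames))


def spatial_strobe(positions, min_spacing):
    """Indices of frames taken each time the subject has moved at least
    `min_spacing` since the last kept frame (positions must be non-empty)."""
    kept = [0]
    for k in range(1, len(positions)):
        if abs(positions[k] - positions[kept[-1]]) >= min_spacing:
            kept.append(k)
    return kept


# Subject position per frame: slow, then fast, then slow again.
positions = [0, 1, 2, 4, 8, 16, 17, 18]
by_time = temporal_strobe(positions, every_n_frames=2)
by_space = spatial_strobe(positions, min_spacing=4)
```

With temporal sampling the drawn positions (0, 2, 8, 17) are unevenly spaced, so the spacing reveals the speed; with spatial sampling the subject is drawn only after it has covered a minimum distance, so the copies do not pile up where it moves slowly.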

[0058] The re-synthesis unit 110 can produce a variety of representations in response to user requests from the user input unit 112. Viewing the video displayed on the display unit 111, the user can give feedback on the representation through the user input unit 112. The output unit 113 supports digital and analog output: digital output goes to an external printer, personal computer, or the like, and analog output is a video signal to a monitor or the like. The output unit 113 may be provided as needed.

[0059] The present invention has been described above in concrete terms on the basis of one embodiment. It goes without saying, however, that the invention is not limited to that embodiment and that various modifications are possible without departing from the gist of the invention.

[0060]

[Effects of the Invention] As described above, according to the present invention, video data is read in; the stored image data sequence is read out; shooting state information, including camera on/off information and camera operation information, is detected; the video is divided into shots based on the camera on/off information; the subject and background are separated frame by frame using the camera operation information and physical features; the separated subject information is associated between frames to extract subject motion information; the shooting space is re-synthesized from the camera operation information and the per-frame backgrounds; the spatial inter-shot relationships among the shooting spaces re-synthesized from the respective shots are calculated; and the information so obtained is managed and stored. This makes it possible to achieve spatiotemporal integration of a plurality of videos, enabling their spatiotemporal enhancement and their spatiotemporal, unified management, representation, and manipulation.

[0061] Furthermore, based on the managed and stored information, one or more shooting spaces and one or more subjects are re-synthesized according to conditions given in advance or requests from the user, and the result is displayed or output externally in analog or digital form. As a result, for a plurality of videos shot in the same space, each user can obtain information from those videos simultaneously, intuitively, and efficiently, in a style suited to his or her own interests and purposes.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating the configuration and processing flow of one embodiment of the present invention.

FIG. 2 is a diagram illustrating the configuration and processing flow of the shooting state detection unit in the above embodiment.

FIG. 3 is a flowchart of the processing performed by the shooting state detection unit in the above embodiment.

FIG. 4 is a diagram illustrating the camera operation information in the above embodiment.

FIG. 5 is a diagram illustrating the algorithm for extracting camera operation information in the above embodiment.

FIG. 6 is a diagram illustrating the configuration and processing flow of the subject/background separation unit in the above embodiment.

FIG. 7 is a diagram showing an example of the binarization processing in the above embodiment.

FIG. 8 is a flowchart of the labeling processing in the above embodiment.

FIG. 9 is a diagram showing the image scanning order in the above embodiment.

FIG. 10 is a diagram showing the target pixel and the already-scanned pixels in the above embodiment.

FIG. 11 is a diagram illustrating the method of re-synthesizing the shooting space based on camera scanning in the above embodiment.

FIG. 12 is a diagram illustrating the cylindrical transformation.

FIG. 13 is a diagram illustrating the stroboscopic representation obtained by temporal and spatial sampling in the above embodiment.

[Explanation of symbols]

101 Video signal input unit
102 Image data sequence memory unit
103 Shooting state detection unit
104 Video division unit
105 Subject/background separation unit
106 Subject motion information extraction unit
107 Shooting space re-synthesis unit
108 Inter-shot relation calculation unit
109 Video structure information management/storage unit
110 Re-synthesis unit
111 Display unit
112 User input unit
113 Output unit
114 Video structured information transmission/reception unit
201 Linear component calculation unit
202 Image data rearrangement unit
203 Video information filter unit
204 Feature extraction unit
205 Feature statistical analysis unit
301 Spatio-temporal image
302 Vertical straight line calculation
303 Horizontal straight line calculation
304 Vertical-direction linear component spatio-temporal image
305 Horizontal-direction linear component spatio-temporal image
306 Rearrangement processing
307 Filter processing
308 Addition of edge images in the normal direction
309 x-t spatio-temporal projection image
310 y-t spatio-temporal projection image
601 Camera operation canceling unit
602 Image data comparison unit
603 Comparison image data comparison unit
604 Region extraction unit

Claims

1. An apparatus for spatio-temporally integrating and managing a plurality of videos, comprising: an image data sequence memory unit that reads video data and stores it as a data sequence; a shooting state detection unit that reads the data sequence from the image data sequence memory unit and detects shooting state information including camera on/off information and camera operation information; a video division unit that divides the video of the data sequence into shots based on the camera on/off information; a subject/background separation unit that separates a subject and a background for each frame of the video using the camera operation information and physical feature quantities; a subject motion information extraction unit that associates the subject information separated for each frame between frames; a shooting space re-synthesis unit that re-synthesizes the shooting space in which the video was shot from the camera operation information and the background separated for each frame; an inter-shot relation calculation unit that calculates spatial inter-shot relations among a plurality of shooting spaces re-synthesized by the shooting space re-synthesis unit from a plurality of divided shots; and a video structure information management/storage unit that manages and stores the information on the separated subject, the information on the associated subject, the shooting state information, the background information, and the inter-shot relation information.

2. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 1, further comprising a video structured information transmission/reception unit that transmits or receives all or part of the extracted subject information, shooting state information, background information, inter-shot relation information, and video data.

3. One or a plurality of photographing spaces and one or more photographing spaces based on information stored and managed in the video structure information management and storage unit according to one or both of a condition given in advance and a request from a user. Or a re-synthesizing unit for re-synthesizing a plurality of subjects; a display unit for displaying an image re-synthesized by the re-synthesizing unit; and a request from the user for re-synthesis based on the image displayed on the display unit. A user input unit for inputting, and an output unit for outputting an image displayed on the display unit in a digital or analog format to an external device as necessary, further comprising: A spatio-temporal integration and management device for a plurality of videos as described in.

4. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 1, 2, or 3, wherein the shooting state detection unit comprises: a linear component calculation unit that calculates horizontal and vertical linear components of each image frame read from the image data sequence memory unit; an image data rearrangement unit that rearranges the temporal and spatial arrangement of the images containing the calculated horizontal and vertical linear components; a video information filter unit that applies filtering to the rearranged image data; a feature extraction unit that extracts features from the result of the filtering; and a feature statistical analysis unit that statistically analyzes the extracted features to detect the camera on/off information and the camera operation information.

5. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 4, wherein the image data rearrangement unit includes image data sequence rearrangement means that rearranges the image data sequence into a plurality of spatio-temporal sectional images, each containing a normal line of the image and the time axis.

6. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 4, wherein the video information filter unit includes line segment detection means that detects edges or lines in the video information of the rearranged image data.

7. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 6, wherein the feature extraction unit includes integration means that sums the information on the detected edges or lines along the normal direction of the image.

8. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 4, wherein the feature statistical analysis unit includes camera on/off detection means that detects straight lines perpendicular to the time axis in the extracted features, which are expressed as a two-dimensional image having a time axis and a space axis, and thereby calculates the camera on/off information.
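As an illustration of the camera on/off detection recited in claims 5 to 8, the following minimal Python sketch builds an x-t projection image by summing edge strength along the vertical direction of each frame, and then looks for times at which the projection changes at most x positions, which is how a line perpendicular to the time axis would appear. The function names, the simple horizontal-gradient edge detector, and the two thresholds are assumptions made for this example and are not taken from the specification.

```python
def xt_projection(frames):
    """Collapse each frame (a list of pixel rows) into one row indexed by x.

    Edge strength is summed along y, so the result is a 2D image
    indexed [t][x], with one time axis and one space axis.
    """
    projection = []
    for frame in frames:
        width = len(frame[0])
        row = [0] * width
        for line in frame:
            for x in range(1, width):
                # Horizontal gradient used as a stand-in edge detector (assumption).
                row[x] += abs(line[x] - line[x - 1])
        projection.append(row)
    return projection


def detect_camera_on_off(projection, diff_threshold=10, ratio=0.5):
    """Return the time indices at which a new shot is assumed to start.

    A time t is reported when at least `ratio` of the x positions change
    by more than `diff_threshold` between t-1 and t, i.e. when the change
    forms a line perpendicular to the time axis.
    """
    cuts = []
    for t in range(1, len(projection)):
        previous, current = projection[t - 1], projection[t]
        changed = sum(1 for a, b in zip(previous, current)
                      if abs(a - b) > diff_threshold)
        if changed >= ratio * len(current):
            cuts.append(t)
    return cuts
```

The video division of claim 1 could then split the frame sequence at the returned indices. A y-t projection would be obtained the same way by summing along x instead of y.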

9. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 4, wherein the feature statistical analysis unit includes: association means that compares the spatial distributions, at any two times, of the extracted features expressed as a two-dimensional image having a time axis and a space axis, and associates spatial coordinates with each other; and camera operation parameter calculation means that statistically processes the associated spatial coordinates to calculate camera operation parameters.
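The association and statistical processing of claim 9 can be sketched as follows for the simplest case of a horizontal pan. Peaks of the projection profile at two times are paired by nearest position, and the median of the resulting displacements is taken as the pan estimate. Peak matching and the median are choices made for this example only; the names `peak_positions` and `estimate_pan` and the parameters `min_height` and `max_shift` are hypothetical.

```python
from statistics import median


def peak_positions(profile, min_height):
    """Return x positions of local maxima that are at least min_height tall."""
    peaks = []
    for x in range(1, len(profile) - 1):
        if (profile[x] >= min_height
                and profile[x] > profile[x - 1]
                and profile[x] >= profile[x + 1]):
            peaks.append(x)
    return peaks


def estimate_pan(profile_t0, profile_t1, min_height=1, max_shift=5):
    """Estimate the horizontal displacement (in pixels) between two times.

    Each peak at the earlier time is associated with the nearest peak at
    the later time, and the median displacement is returned. None is
    returned when no peak can be associated.
    """
    peaks_t0 = peak_positions(profile_t0, min_height)
    peaks_t1 = peak_positions(profile_t1, min_height)
    shifts = []
    for x0 in peaks_t0:
        candidates = [x1 for x1 in peaks_t1 if abs(x1 - x0) <= max_shift]
        if candidates:
            nearest = min(candidates, key=lambda x1: abs(x1 - x0))
            shifts.append(nearest - x0)
    return median(shifts) if shifts else None
```

Using the median rather than the mean keeps the estimate stable when a few peaks belong to a moving subject instead of the background.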

10. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 1, 2, or 3, wherein the subject/background separation unit comprises: a camera operation canceling unit that cancels the camera operation between adjacent image data based on the camera operation information detected by the shooting state detection unit; an image data comparison unit that compares the adjacent image data from which the camera operation has been canceled; a comparison image data comparison unit that compares adjacent comparison image data; and a region extraction unit that extracts a subject region from the comparison data calculated by the comparison image data comparison unit.

11. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 10, wherein the camera operation canceling unit includes video frame deformation means that deforms and displaces adjacent image frames, based on the camera operation information, so as to cancel the deformation and displacement caused by the camera operation.

12. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 10, wherein the image data comparison unit includes difference processing means that takes differences in luminance and color information between adjacent image data from which the camera operation has been canceled.
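A minimal sketch of the cancellation and difference processing of claims 11 and 12 is given below, under the simplifying assumptions that the camera operation is a pure horizontal displacement of `dx` pixels and that only luminance is compared. The function name and the convention that content at column x of the previous frame appears at column x + dx of the current frame are assumptions for this example.

```python
def difference_after_cancel(prev_frame, cur_frame, dx):
    """Cancel a horizontal camera displacement, then take the absolute difference.

    prev_frame and cur_frame are lists of rows of luminance values.
    Pixels of cur_frame that have no counterpart in the shifted
    prev_frame are set to 0 in the result.
    """
    height, width = len(cur_frame), len(cur_frame[0])
    diff = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src = x - dx  # column of prev_frame that maps onto column x
            if 0 <= src < width:
                diff[y][x] = abs(cur_frame[y][x] - prev_frame[y][src])
    return diff
```

After cancellation the static background differences are close to zero, so the remaining large values indicate candidate subject pixels.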

13. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 10, wherein the region extraction unit comprises: binarization means that binarizes the comparison data; labeling means that labels the binarized data; physical feature calculation means that calculates physical features of each labeled region; and collation means that collates the calculated physical features against predetermined conditions to extract a subject region.
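The four stages of claim 13 (binarization, labeling, physical feature calculation, collation) can be sketched as below. This example labels 4-connected regions with a flood fill rather than the raster-scan labeling referred to in FIGS. 8 to 10, uses area, centroid, and bounding box as the physical features, and uses a minimum area as the collation condition. All of these choices, and every function and parameter name, are assumptions for illustration.

```python
def binarize(diff_image, threshold):
    """Map each difference value to 1 (above threshold) or 0."""
    return [[1 if v > threshold else 0 for v in row] for row in diff_image]


def label_regions(binary):
    """Label 4-connected regions of 1-pixels; return (label image, label count)."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


def region_features(labels, count):
    """Compute area, centroid (y, x) and bounding box for labels 1..count."""
    pixels = {k: [] for k in range(1, count + 1)}
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                pixels[k].append((y, x))
    features = []
    for k in range(1, count + 1):
        ys = [p[0] for p in pixels[k]]
        xs = [p[1] for p in pixels[k]]
        features.append({
            "label": k,
            "area": len(ys),
            "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
            "bbox": (min(ys), min(xs), max(ys), max(xs)),
        })
    return features


def extract_subject_regions(diff_image, threshold=30, min_area=3):
    """Keep only the labeled regions whose area satisfies the given condition."""
    labels, count = label_regions(binarize(diff_image, threshold))
    return [f for f in region_features(labels, count) if f["area"] >= min_area]
```

The minimum-area condition discards isolated noise pixels; other conditions on shape or position could be collated in the same place.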

14. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 13, wherein the subject/background separation unit includes background extraction means that extracts the background by subtracting, from the frame image, the subject region extracted for each frame by the region extraction unit.

15. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 13, wherein the subject motion information extraction unit includes collation means that compares the physical features of temporally adjacent subject regions extracted for each frame by the region extraction unit, collates the comparison amount against a condition given in advance, and thereby associates subject information between frames.
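One way to picture the inter-frame association of claim 15 is the following sketch, which compares the centroid distance and area ratio of regions in two temporally adjacent frames and pairs those that satisfy preset limits. The region format (a dict with `"area"` and `"centroid"` keys), the greedy nearest-match strategy, and the two limits are assumptions for this example.

```python
from math import hypot


def associate_subjects(prev_regions, cur_regions,
                       max_distance=5.0, max_area_ratio=1.5):
    """Pair regions of adjacent frames; return a list of (prev_index, cur_index).

    Each region is a dict with a positive "area" and a "centroid" (y, x).
    A pair is accepted when the centroid distance and the area ratio both
    satisfy the given conditions; the closest candidate wins.
    """
    pairs = []
    used = set()
    for i, prev in enumerate(prev_regions):
        best, best_distance = None, None
        for j, cur in enumerate(cur_regions):
            if j in used:
                continue
            distance = hypot(cur["centroid"][0] - prev["centroid"][0],
                             cur["centroid"][1] - prev["centroid"][1])
            ratio = (max(prev["area"], cur["area"])
                     / min(prev["area"], cur["area"]))
            if (distance <= max_distance and ratio <= max_area_ratio
                    and (best is None or distance < best_distance)):
                best, best_distance = j, distance
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs
```

Chaining the pairs over successive frames gives a per-subject trajectory, which is one possible form of the subject motion information.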

16. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 1, 2, or 3, wherein the shooting space re-synthesis unit includes means that deforms and displaces adjacent image frames based on the camera operation information detected by the shooting state detection unit and superimposes them into one continuous shooting space.
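The superimposition of claim 16 can be illustrated with a translation-only sketch: each frame is placed on a shared canvas at an offset accumulated from the camera operation information, and later frames overwrite earlier ones where they overlap. Deformation (for example for zoom) is omitted, and the function name, the offset format, and the overwrite policy are assumptions for this example.

```python
def resynthesize_space(frames, offsets):
    """Paste frames onto one canvas to form a continuous shooting space.

    frames  : list of 2D lists (rows of pixel values)
    offsets : list of (dx, dy) giving the top-left position of each frame
              in the shared space, derived from the camera operation
    Cells never covered by any frame remain None.
    """
    min_x = min(dx for dx, _ in offsets)
    min_y = min(dy for _, dy in offsets)
    max_x = max(dx + len(frame[0]) for frame, (dx, _) in zip(frames, offsets))
    max_y = max(dy + len(frame) for frame, (_, dy) in zip(frames, offsets))
    canvas = [[None] * (max_x - min_x) for _ in range(max_y - min_y)]
    for frame, (dx, dy) in zip(frames, offsets):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                # Later frames overwrite earlier ones in overlapping areas.
                canvas[dy - min_y + y][dx - min_x + x] = value
    return canvas
```

Feeding in the per-frame backgrounds (frames with the subject regions removed) instead of the raw frames would yield a background-only shooting space.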

17. The inter-shot relation calculating unit sets the photographing space to each photographing space created by the photographing space re-synthesizing unit so that the photographing space has the same size and position between the photographing spaces. 4. The apparatus according to claim 1, further comprising a photographing space converting means for performing conversion.

18. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 1, wherein the video structure information management/storage unit includes means that compresses, and spatio-temporally manages and stores, all or part of: the extracted subject information, shooting state information, background information, inter-shot relation information, and video data; and the shooting spaces of the plurality of videos re-synthesized using the shooting state information, background information, inter-shot relation information, and video data.

19. The apparatus for spatio-temporal integration and management of a plurality of videos according to claim 18, wherein the video structure information management/storage unit includes means that expands the extracted subject information in space and compresses it as a still image, and that compresses the shooting spaces of the re-synthesized plurality of videos as still images.

20. A method for spatio-temporally integrating and managing a plurality of videos, comprising: an image data sequence storing step of reading video data and storing it as a data sequence; a shooting state detection step of reading the data sequence from the image data sequence memory unit and detecting shooting state information including camera on/off information and camera operation information; a video division step of dividing the video of the data sequence into shots based on the camera on/off information; a subject/background separation step of separating a subject and a background for each frame of the video using the camera operation information and physical feature quantities; a subject motion information extraction step of associating the subject information separated for each frame between frames; a shooting space re-synthesis step of re-synthesizing the shooting space in which the video was shot from the camera operation information and the background separated for each frame; an inter-shot relation calculation step of calculating spatial inter-shot relations among a plurality of shooting spaces re-synthesized in the shooting space re-synthesis step from a plurality of divided shots; and a video structure information management/storage step of managing and storing the information on the separated subject, the information on the associated subject, the shooting state information, the background information, and the inter-shot relation information.

21. The method for spatio-temporal integration and management of a plurality of videos according to claim 20, further comprising a video structured information transmission/reception step of transmitting or receiving all or part of the extracted subject information, shooting state information, background information, inter-shot relation information, and video data.

22. After the video structure information managing / accumulating process, one or a plurality of photographing spaces based on the information accumulated / managed according to one or both of a condition given in advance and a request from a user. 22. A re-synthesizing step of re-synthesizing one or a plurality of subjects, and a display or output step of displaying or outputting a video re-synthesized in the re-synthesizing step. Method for integrating and managing spatio-temporal images as described in 1.

23. The method for spatio-temporal integration and management of a plurality of videos according to claim 20, 21, or 22, wherein the shooting state detection step comprises: a linear component calculation step of calculating horizontal and vertical linear components of each image frame read from the stored image data sequence; an image data rearrangement step of rearranging the temporal and spatial arrangement of the images containing the horizontal and vertical linear components; a video information filtering step of applying filtering to the rearranged image data; a feature extraction step of extracting features from the result of the filtering; and a feature statistical analysis step of statistically analyzing the extracted features to detect the camera on/off information and the camera operation information.

24. The image data arrangement rearranging process, wherein the image data sequence is rearranged and rearranged into a plurality of spatiotemporal sectional images including a normal line and a time axis of the image. Method for integrating and managing spatio-temporal images as described in 1.

25. The method for spatio-temporal integration and management of a plurality of videos according to claim 23, wherein, in the video information filtering step, edges or lines in the video information of the rearranged image data are detected.

26. The method for spatio-temporal integration and management of a plurality of videos according to claim 25, wherein, in the feature extraction step, the information on the detected edges or lines is summed along the normal direction of the image to extract features.

27. The method for spatio-temporal integration and management of a plurality of videos according to claim 23, wherein, in the feature statistical analysis step, straight lines perpendicular to the time axis are detected in the extracted features, which are expressed as a two-dimensional image having a time axis and a space axis, and the camera on/off information is thereby calculated.

28. The method for spatio-temporal integration and management of a plurality of videos according to claim 23, wherein the feature statistical analysis step comprises: an association step of comparing the spatial distributions, at any two times, of the extracted features expressed as a two-dimensional image having a time axis and a space axis, and associating spatial coordinates with each other; and a camera operation parameter calculation step of statistically processing the associated spatial coordinates to calculate camera operation parameters.

29. The method for spatio-temporal integration and management of a plurality of videos according to claim 20, 21, or 22, wherein the subject/background separation step comprises: a camera operation canceling step of canceling the camera operation between adjacent image data based on the camera operation information detected in the shooting state detection step; an image data comparison step of comparing the adjacent image data from which the camera operation has been canceled; a comparison image data comparison step of comparing adjacent comparison image data; and a region extraction step of extracting a subject region from the comparison data calculated in the comparison image data comparison step.

30. The method for spatio-temporal integration and management of a plurality of videos according to claim 29, wherein, in the camera operation canceling step, adjacent image frames are deformed and displaced, based on the camera operation information, so as to cancel the deformation and displacement caused by the camera operation.

31. The method for spatio-temporal integration and management of a plurality of videos according to claim 29, wherein, in the image data comparison step, differences in luminance and color information are taken between adjacent image data from which the camera operation has been canceled.

32. The method for spatio-temporal integration and management of a plurality of videos according to claim 29, wherein the region extraction step comprises: a binarization step of binarizing the comparison data; a labeling step of labeling the binarized data; a physical feature calculation step of calculating physical features of each labeled region; and a collation step of collating the calculated physical features against predetermined conditions to extract a subject region.

33. The method for spatio-temporal integration and management of a plurality of videos according to claim 32, wherein, in the subject/background separation step, the background is extracted by subtracting, from the frame image, the subject region extracted for each frame in the region extraction step.

34. The method for spatio-temporal integration and management of a plurality of videos according to claim 32, wherein, in the subject motion information extraction step, the physical features of temporally adjacent subject regions extracted for each frame in the region extraction step are compared, the comparison amount is collated against a condition given in advance, and subject information is thereby associated between frames.

35. The method for spatio-temporal integration and management of a plurality of videos according to claim 20, 21, or 22, wherein, in the shooting space re-synthesis step, adjacent image frames are deformed and displaced based on the camera operation information detected in the shooting state detection step and are superimposed into one continuous shooting space.

36. The method for spatio-temporal integration and management of a plurality of videos according to claim 20, 21, or 22, wherein, in the inter-shot relation calculation step, the shooting space created for each shot in the shooting space re-synthesis step is converted so that the shooting spaces have equal size and position.

37. The method for spatio-temporal integration and management of a plurality of videos according to claim 20, wherein, in the video structure information management/storage step, all or part of the following is managed and stored spatio-temporally: the extracted subject information, shooting state information, background information, inter-shot relation information, and video data; and the shooting spaces of the plurality of videos re-synthesized using the shooting state information, background information, inter-shot relation information, and video data.

38. The method for spatio-temporal integration and management of a plurality of videos according to claim 37, wherein, in the video structure information management/storage step, the extracted subject information is expanded in space and compressed as a still image, and the shooting spaces of the re-synthesized plurality of videos are compressed as still images.

39. A computer-readable recording medium on which a program for spatio-temporally integrating and managing a plurality of videos is recorded, the program comprising: an image data sequence storing step of reading video data and storing it as a data sequence; a shooting state detection step of reading the data sequence from the memory unit and detecting shooting state information including camera on/off information and camera operation information; a video division step of dividing the video of the data sequence into shots based on the camera on/off information; a subject/background separation step of separating a subject and a background for each frame of the video using the camera operation information and physical feature quantities; a subject motion information extraction step of associating the subject information separated for each frame between frames; a shooting space re-synthesis step of re-synthesizing the shooting space in which the video was shot from the camera operation information and the background separated for each frame; an inter-shot relation calculation step of calculating spatial inter-shot relations among a plurality of shooting spaces re-synthesized in the shooting space re-synthesis step from a plurality of divided shots; and a video structure information management/storage step of managing and storing the information on the separated subject, the information on the associated subject, the shooting state information, the background information, and the inter-shot relation information.

40. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 39, wherein the program further comprises a video structured information transmission/reception step of transmitting or receiving all or part of the extracted subject information, shooting state information, background information, inter-shot relation information, and video data.

41. After the video structure information managing / accumulating process, one or a plurality of photographing spaces based on the information accumulated / managed according to one or both of a condition given in advance and a request from a user. 41. A re-synthesizing step of re-synthesizing one or a plurality of subjects, and a display or output step of displaying or outputting a video re-synthesized in the re-synthesizing step. A computer-readable recording medium on which a spatio-temporal integration of a plurality of videos and a management program are recorded.

42. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 39, 40, or 41, wherein the shooting state detection step comprises: a linear component calculation step of calculating horizontal and vertical linear components of each image frame read from the stored image data sequence; an image data rearrangement step of rearranging the temporal and spatial arrangement of the images containing the horizontal and vertical linear components; a video information filtering step of applying filtering to the rearranged image data; a feature extraction step of extracting features from the result of the filtering; and a feature statistical analysis step of statistically analyzing the extracted features to detect the camera on/off information and the camera operation information.

43. The image data arrangement rearranging step, wherein the image data sequence is rearranged and rearranged into a plurality of spatiotemporal sectional images including a normal line and a time axis of the image. A computer-readable recording medium on which a spatio-temporal integration of a plurality of videos and a management program are recorded.

44. The video information filtering process, wherein an edge or a line of video information of the image data obtained by the rearrangement is detected.
A computer-readable recording medium on which a spatio-temporal integration of a plurality of videos and a management program are recorded.

45. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 44, wherein, in the feature extraction step, the information on the detected edges or lines is summed along the normal direction of the image to extract features.

46. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 42, wherein, in the feature statistical analysis step, straight lines perpendicular to the time axis are detected in the extracted features, which are expressed as a two-dimensional image having a time axis and a space axis, and the camera on/off information is thereby calculated.

47. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 42, wherein the feature statistical analysis step comprises: an association step of comparing the spatial distributions, at any two times, of the extracted features expressed as a two-dimensional image having a time axis and a space axis, and associating spatial coordinates with each other; and a camera operation parameter calculation step of statistically processing the associated spatial coordinates to calculate camera operation parameters.

48. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 39, 40, or 41, wherein the subject/background separation step comprises: a camera operation canceling step of canceling the camera operation between adjacent image data based on the camera operation information detected in the shooting state detection step; an image data comparison step of comparing the adjacent image data from which the camera operation has been canceled; a comparison image data comparison step of comparing adjacent comparison image data; and a region extraction step of extracting a subject region from the comparison data calculated in the comparison image data comparison step.

49. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 48, wherein, in the camera operation canceling step, adjacent image frames are deformed and displaced, based on the camera operation information, so as to cancel the deformation and displacement caused by the camera operation.

50. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 48, wherein, in the image data comparison step, differences in luminance and color information are taken between adjacent image data from which the camera operation has been canceled.

51. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 48, wherein the region extraction step comprises: a binarization step of binarizing the comparison data; a labeling step of labeling the binarized data; a physical feature calculation step of calculating physical features of each labeled region; and a collation step of collating the calculated physical features against predetermined conditions to extract a subject region.

52. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 51, wherein, in the subject/background separation step, the background is extracted by subtracting, from the frame image, the subject region extracted for each frame in the region extraction step.

53. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 51, wherein, in the subject motion information extraction step, the physical features of temporally adjacent subject regions extracted for each frame in the region extraction step are compared, the comparison amount is collated against a condition given in advance, and subject information is thereby associated between frames.

54. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 39, 40, or 41, wherein, in the shooting space re-synthesis step, adjacent image frames are deformed and displaced based on the camera operation information detected in the shooting state detection step and are superimposed into one continuous shooting space.

55. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 39, 40, or 41, wherein, in the inter-shot relation calculation step, the shooting space created for each shot in the shooting space re-synthesis step is converted so that the shooting spaces have equal size and position.

56. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 39, wherein, in the video structure information management/storage step, all or part of the following is compressed and is managed and stored spatio-temporally: the extracted subject information, shooting state information, background information, inter-shot relation information, and video data; and the shooting spaces of the plurality of videos re-synthesized using the shooting state information, background information, inter-shot relation information, and video data.

57. The computer-readable recording medium recording the program for spatio-temporal integration and management of a plurality of videos according to claim 56, wherein, in the video structure information management/storage step, the extracted subject information is expanded in space and compressed as a still image, and the shooting spaces of the re-synthesized plurality of videos are compressed as still images.