JP4168940B2

JP4168940B2 - Video display system

Info

Publication number: JP4168940B2
Application number: JP2004016774A
Authority: JP
Inventors: 和也佐藤; 哲司羽下; 一裕阿部; 淑彦秦; 俊治野沢
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2004-01-26
Filing date: 2004-01-26
Publication date: 2008-10-22
Anticipated expiration: 2024-01-26
Also published as: JP2005210573A

Description

The present invention relates to a video display system, and more particularly to a video display system that recognizes objects appearing in video data and displays those objects explicitly.

A conventional video display system, as disclosed for example in Patent Document 1, divides video data into a plurality of periods, classifies each period by computing its features from the images, selects a representative image for the video of each period, and displays the representative images by classification type, with the aim of making it easier to grasp the video content and to search for a specific scene.
In Patent Document 2, for example, the video data is divided by comparing luminance differences, the scene type of each divided period is classified as a still-image scene, a camera-motion scene, an object-motion scene, or the like, and a representative frame suited to each type is detected.

However, these conventional video display systems assume that the video is divided into periods at fixed intervals or by detecting cut changes; the division is not based on the presence of individual objects. Likewise, the representative image of each period is assumed to be simply the first image of the period, or the image with the maximum motion information or the maximum screen similarity. It is therefore difficult to grasp what kind of object, with what features, appears in the video of that period, and such systems are inadequate for purposes such as searching for an object characterized by a certain behavior.

Patent Document 3, on the other hand, manages video content with respect to the objects appearing in the video data and aims to retrieve a desired video portion on that basis. As its abstract states, moving objects in video from a surveillance camera are detected in the video sequence by a motion segmentation device using a motion segmentation method. The objects are tracked through the segmented data by an object tracking device. A symbolic representation of the video is generated in the form of an annotated graph describing the objects and their movement. A motion analysis device analyzes the tracking results and annotates the motion in the graph with indices describing several events. The graph is then indexed using a rule-based classification scheme that identifies events of interest such as appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. Video clips defined by spatio-temporal queries and by event-based and object-based queries are recalled to show the desired video.
However, it has no particular mechanism for extracting a representative image that explicitly shows what features the captured object had during that period.

Patent Document 1: Japanese Unexamined Patent Publication No. 2003-283968 (page 3, FIG. 1)
Patent Document 2: Japanese Unexamined Patent Publication No. 9-233422 (page 1, FIGS. 1 and 2)
Patent Document 3: Japanese Unexamined Patent Publication No. 10-84525 (page 1, FIG. 5)

In conventional video display systems, the video is not, strictly speaking, divided in units of the individual objects that appear in it, and the displayed representative image does not explicitly express the features of the captured object during the divided video period. The representative image therefore cannot adequately convey the content of the video in that period, which makes it difficult to grasp what appears in each video period. Consequently, when searching for a video portion in which an object with a certain feature appears, for example, it is also very difficult to narrow down the desired portion while confirming that something matching the search condition is shown.

The present invention has been made to solve the above problems of the prior art, and its object is to provide a video display system that makes it possible to grasp video content more easily and in a shorter time.

The video display system according to the present invention comprises: a video input unit that inputs video data; a video storage unit that stores the input video data; an object processing unit that detects individual objects appearing in the video data, recognizes the period during which the same object appears across a plurality of image frames and treats it as one video period unit, associates the series of appearances of that same object within the video period unit with the video period unit as one object, and extracts the features of this object as metadata; a representative image processing unit that, for each video period unit, extracts from the video data stored in the video storage unit, on the basis of the metadata, at least one image frame satisfying a predetermined criterion as a representative image base for that video period unit, and processes the representative image base so that the object associated with the video period unit is emphasized, thereby generating a representative image for that video period unit; and a display unit that displays the representative image.

According to the present invention, the video is divided into periods on the basis of the individual objects appearing in it, and the features of each such object during its period are emphasized in the display. Video periods can therefore be handled in a meaningful unit, namely the span over which an individual object appears continuously, and the reason why the video was divided into that unit is clearly shown. As a result, video content can be grasped more easily and in a shorter time on the basis of what appears in the video, and searching the video data for a desired video also becomes easier.

Embodiment 1.
FIG. 1 shows the functional block configuration of a video display system according to Embodiment 1 of the present invention and serves to explain the components and operating procedure of the system. The configuration and operation of the video display system according to this embodiment are described below with reference to this figure. The system is built in an environment in which processing is executed by a computer, such as a personal computer or a workstation.

The video input unit 10 accepts video data from a video camera or a video recording and playback device, or video data transmitted over a network such as a LAN (Local Area Network) or a public line, and passes the input video data to the video storage unit 20 and the object processing unit 30 in the subsequent stage.
When video data is input from a device such as an analog video camera, the video input unit 10 has an A/D (analog-to-digital) conversion function. When already digitized video data is input via a LAN or the like, no A/D conversion function is needed; instead, the unit handles the corresponding communication protocols as a physical and logical interface and extracts only the video data portion.

The video storage unit 20 stores the input video data and is implemented with electronic media such as an HDD (Hard Disk Drive) or RAM (Random Access Memory) so that the representative image processing unit 50 can read the video data out later.

The object processing unit 30 consists of an object extraction unit 31 and a metadata extraction unit 32, whose functions are described below.
First, the object extraction unit 31 cuts out each individual object appearing in the video data passed from the video input unit 10 from the background image. It then recognizes the period during which the same object appears across a plurality of image frames (hereinafter an image frame may be called simply a frame), gathers that period into one video period unit, and associates the series of appearances recognized as the same object within the video period unit as one object. Hereinafter, an object associated with a video period unit in this way may be called the target object, or the object corresponding to the video period unit.
Cutting objects out of the video and recognizing that the same object appears across a plurality of frames are generally assumed to be performed automatically using image processing techniques such as image difference extraction and template search, as seen for example in Japanese Patent Application Laid-Open No. 2001-076156, "Image Monitoring Device". Manual processing may also be combined with this where necessary.
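The grouping of per-frame detections into video period units can be illustrated with a minimal Python sketch. It assumes an upstream tracker has already given the same physical object a consistent `object_id`; the names `VideoPeriodUnit`, `group_into_period_units`, and the `max_gap` rule are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class VideoPeriodUnit:
    """One object and the span of frames in which it appears."""
    object_id: int
    first_frame: int
    last_frame: int


def group_into_period_units(detections, max_gap=1):
    """Group (frame_number, object_id) pairs into video period units.

    A new unit is started when an object reappears after more than
    `max_gap` frames (an assumed rule chosen for this sketch).
    """
    open_units = {}  # object_id -> unit that is still being extended
    finished = []
    for frame, obj in sorted(detections):
        unit = open_units.get(obj)
        if unit is not None and frame - unit.last_frame <= max_gap:
            unit.last_frame = frame  # same appearance continues
        else:
            if unit is not None:
                finished.append(unit)  # previous appearance has ended
            open_units[obj] = VideoPeriodUnit(obj, frame, frame)
    finished.extend(open_units.values())
    return sorted(finished, key=lambda u: (u.first_frame, u.object_id))
```

With this rule, an object that leaves the scene and returns later yields two separate video period units, each of which would receive its own representative image.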

Next, the metadata extraction unit 32 extracts the features of the target object in this video period unit in the form of metadata. Features such as the area, position, and movement trajectory of the object are generally assumed to be extracted automatically using image processing techniques such as those in the aforementioned Japanese Patent Application Laid-Open No. 2001-076156. Here too, manual processing may be combined where necessary.
The extracted metadata is held in a form that makes subsequent processing easy. For example, when the system is implemented on a computer with a file system, the metadata may be saved as a metadata file.

The metadata storage unit 40 stores the metadata extracted by the metadata extraction unit 32. FIG. 2 shows, in the form of a table, an example of the metadata managed in the metadata storage unit 40. Examples of extracted metadata include first frame information, marking the point at which an object begins to appear in the video (frame information here is pointer information identifying the frame in question, expressed as a frame number, a time stamp, or the like); last frame information, marking the last frame in which the object appears; and information on a specific representative frame that meets a condition such as the frame in which the object appears with the largest area within the video period unit or, if the object is a person, the frame in which the face is shown best. Features of the object itself may also be recorded, such as the color information of the object during the period, its average motion vector magnitude, trajectory information indicating which positions it passed through, and dwell time information at the places it passed. More detailed information may include, for each frame in which the object appears, the position coordinates of the object on the screen (position here refers, for example, to the centroid, to a closed curve such as an ellipse or rectangle circumscribing the object, or to a closed curve such as an ellipse or rectangle whose interior contains a certain proportion of the object's area) and bitmap information of the object region, that is, of the person-shaped region when the object is a person.
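As an illustration of the per-frame position metadata listed above, the following Python sketch derives the area, centroid, and circumscribing rectangle from a bitmap of the object region. The function name `describe_mask` and the dictionary keys are hypothetical, and the bitmap is assumed to be a list of rows of 0/1 values.

```python
def describe_mask(mask):
    """Return area, centroid and circumscribing rectangle of a binary mask.

    `mask` is a list of rows of 0/1 values, where 1 marks an object pixel.
    Returns None when the mask contains no object pixels.
    """
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    area = len(points)
    return {
        "area": area,
        "centroid": (sum(xs) / area, sum(ys) / area),
        # (left, top, right, bottom) in inclusive pixel coordinates
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }
```

Storing these values per frame is one way to let later stages choose a representative frame or draw an emphasis frame without reprocessing the video.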

In contrast to feature quantities derived directly from image processing as described above, the following are examples of metadata that are indirect feature quantities. Based on the shape and motion characteristics of an object extracted from the video, these could include a type classification such as whether it is a person or a car; a state such as whether a person is standing or lying down; and, for a person, information such as race, sex, and age inferred from body shape, face shape, and the like. Further, when an individual registered in the system in advance can be identified from features such as face shape, or when information identifying the individual is obtained separately from the video data, from an ID card, tag, or the like that carries a personal ID, the personal ID or name of the target object may be held as metadata. Extracting these features normally requires a dedicated image processing algorithm in the metadata extraction unit 32, but manual processing may be combined in some cases.

The representative image processing unit 50 consists of a representative image base selection unit 51 and an object enhancement unit 52, and generates a representative image that concisely expresses the features of each video period unit containing an object. Note that a representative image in the present invention is not restricted to a single still image; any form of expression, including one with the character of a moving image that changes over time, is also called a representative image.
Its functions are described below.

First, the representative image base selection unit 51 selects the frame that serves as the base for generating the representative image. That is, for each video period unit, it extracts from the video data stored in the video storage unit 20, on the basis of the metadata, at least one image frame satisfying a predetermined criterion as the representative image base for that video period unit.
As an example of satisfying a predetermined criterion, if the frame in which the object's on-screen area is largest was extracted in advance as one of the metadata items during metadata extraction, that frame can be chosen as the representative image base. The frame to be used as the representative image base need not have been identified in the metadata in advance in this way. If, for example, the metadata stores the area of the object in each frame, the representative image base selection unit 51 may find the frame with the largest area in the metadata and then determine the representative image base.
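The largest-area criterion can be sketched in Python as follows. The sketch assumes the metadata stores the object area for each frame as a mapping from frame number to pixel count; the function name and the tie-breaking rule are hypothetical.

```python
def select_representative_base(areas_by_frame):
    """Return the frame number at which the object area is largest.

    `areas_by_frame` maps frame number -> object area in pixels.
    Ties go to the earliest frame, an arbitrary choice for this sketch.
    """
    if not areas_by_frame:
        raise ValueError("no per-frame areas recorded for this object")
    # Sort key: larger area first, then smaller frame number.
    return min(areas_by_frame, key=lambda f: (-areas_by_frame[f], f))
```

Other criteria, such as the frame closest to a given screen position, could be expressed by swapping the sort key.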

Other possible criteria for selecting the representative image base include the frame in which the object's area is at its average rather than its maximum; the frame in which the object's position, rather than its area, is closest to a predetermined location such as the center of the screen; the frame in which the object moves least; the frame in which the object performs a specific action or behavior; the frame in which the object faces the front; and, for a person, the frame in which the face is clearly visible or the whole body is shown.
When the features of the object are to be conveyed through motion, such as its movement trajectory, a plurality of frames may be selected as the representative image base.

Next, as a further function of the representative image processing unit 50, the object enhancement unit 52 processes the selected representative image base so that the features of the captured object, that is, the object associated with the video period unit, are emphasized.
For example, if the metadata holds the position coordinates of a closed curve (such as a rectangle or an ellipse) circumscribing the object on the screen, the circumscribing closed curve is superimposed on the previously selected representative image base so that it surrounds the object, and the result is used as the representative image.
As other examples, the object may be surrounded by a closed curve (such as a rectangle or an ellipse) that contains a certain proportion of the object region, or only the object region itself, that is, the person-shaped region when the object is a person, may be outlined, filled with a conspicuous color, or changed in brightness or saturation.
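Superimposing a circumscribing rectangle on the representative image base might look like the following Python sketch. The image is assumed to be a list of rows of pixel values and the rectangle is assumed to lie inside the image; `draw_frame` is a hypothetical name.

```python
def draw_frame(image, bbox, color):
    """Return a copy of `image` with a one-pixel rectangle drawn on `bbox`.

    `image` is a list of rows of pixel values and `bbox` is
    (left, top, right, bottom) in inclusive pixel coordinates,
    assumed to lie within the image.
    """
    left, top, right, bottom = bbox
    out = [row[:] for row in image]  # leave the stored frame untouched
    for x in range(left, right + 1):
        out[top][x] = color          # top edge
        out[bottom][x] = color       # bottom edge
    for y in range(top, bottom + 1):
        out[y][left] = color         # left edge
        out[y][right] = color        # right edge
    return out
```

Working on a copy keeps the stored frame intact, so the same representative image base can be reused with a different emphasis method.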

Emphasis is also possible by enlarging the object region or the region inside the object's circumscribing closed curve (such as a rectangle or an ellipse). In this case, the enlarged object region may be superimposed back onto the original representative image base and the result used as the representative image; alternatively, only the region inside the circumscribing closed curve, enlarged as far as possible, may be taken out and designated anew as the representative image. The original representative image and the enlarged object image may also be displayed side by side as the representative image.
Further, if the video data is compressed with a scheme that allows the resolution to be varied locally, only the object region may be displayed in high definition.

In all of the above examples, the representative image base is a single frame captured at one point in time, and it is processed so that the object associated with the video period unit is emphasized, producing a static representative image. The following are examples of object enhancement for cases where motion information, such as the object's movement trajectory, is to be shown.
A representative image showing the change of the object over time can be generated, for example, by selecting a plurality of frames at predetermined time intervals as the representative image base, using one of them as the background image, and then applying one of the following methods: repeatedly superimposing and removing, in order, images of just the object region cut from each frame, so that only the object region appears to move across the background like a moving image; superimposing the images in order without erasing them, so that a movement trajectory remains as in a stroboscopic photograph; superimposing object region images only at representative points such as the first and last, together with the movement trajectory drawn as a polyline or curve; or simply displaying the selected frames one after another, frame by frame, as a quasi-moving image. These methods may of course be combined with the methods described earlier, such as emphasizing each object region by surrounding it with a frame.
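The stroboscopic variant, in which object regions are superimposed in order without being erased, can be sketched in Python as below. The sketch assumes each selected frame comes with a 0/1 bitmap of the object region of the same size as the background; `strobe_composite` is a hypothetical name.

```python
def strobe_composite(background, frames, masks):
    """Overlay the object pixels of each frame onto one background image.

    `background` and each entry of `frames` are lists of rows of pixel
    values; each entry of `masks` is a same-sized list of rows of 0/1
    values marking the object region in the corresponding frame.
    Later frames overwrite earlier ones where object regions overlap.
    """
    out = [row[:] for row in background]
    for frame, mask in zip(frames, masks):
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    out[y][x] = frame[y][x]
    return out
```

Resetting `out` to the background before each overlay, and emitting one image per frame, would give the moving-image variant instead.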

All of the examples above are graphical forms of emphasis, but the features of an object may also be emphasized by superimposing the feature information held as metadata on the representative image as text, by displaying it additionally around the representative image, or by presenting it as audio.

Several of the display methods described above may also be used at the same time. One way of doing so is, for example, to display a representative image created by a method B next to a representative image created by a method A. Alternatively, if method A generates a representative image from a single frame as described above and method B shows motion information as described above, the partial regions of the object may be superimposed by method B on top of the representative image generated by method A.

The condition input unit 60 serves in this embodiment both as a search condition input unit, to which search conditions are input, and as a display condition input unit, to which conditions on the display method are input. It accepts conditions on the display method from the user of the video display system and, in a search application, the search conditions; it passes display conditions to the representative image processing unit 50 and search conditions to the metadata evaluation unit 70. Input is made, for example, through a user interface of the kind found on an ordinary personal computer, using a keyboard, mouse, touch panel, or the like, or through input devices such as dedicated buttons and dials.

When display conditions are received from the condition input unit 60, the representative image base selection unit 51 makes its selection in accordance with those conditions, and the object enhancement unit 52 processes the representative image in accordance with them.
The display conditions referred to here are the variations in display method described in the explanation of the representative image processing unit 50. For example, they may specify the criterion for selecting the representative image base at the still-image level, such as choosing the frame in which the object's area is largest, or choosing the frame in which the object's on-screen position is closest to the center regardless of its size. They may specify the way the object is emphasized in the representative image, such as surrounding the object region with a frame, coloring the inside of the object region, or using the object region enlarged as far as possible as the representative image. Or, where the motion of the object is to be expressed, they may specify a display method such as sequential display in the manner of a stroboscopic photograph, sequential quasi-moving-image display with the object region enlarged as far as possible, or superimposing the trajectory as a curve.

When search conditions are received from the condition input unit 60, they are passed to the metadata evaluation unit 70, which checks the content of the metadata against the input conditions and passes a list of the objects whose metadata meets the conditions to the representative image processing unit 50. The representative image processing unit 50 then generates representative images for the objects that meet the conditions.

As an example of a conditional search, consider the case where the metadata contains the average movement vector of each object. Suppose, simply, that the metadata holds the value obtained by projecting the object's average movement vector, taken from when it appears on the screen until it leaves, onto the horizontal axis of the screen. The value is then, for example, positive if the object moved to the right, negative if it moved to the left, and zero if there was no horizontal movement on average.
If the user gives the condition "a person moving to the right" among the people appearing in the video, the average movement vector value of each object is examined in the metadata, and only objects whose value is positive are extracted as matching the search condition. Rather than testing strictly for positive versus non-positive, a decision threshold close to zero may be used.
Even with such a simple average movement vector, the direction need not be horizontal; it may be vertical, diagonal, and so on. Moreover, by supplying environmental information about what is shown on the screen, the feature of moving to the right can be reinterpreted as, for example, the action of entering through the entrance, and the feature of moving to the left as the action of leaving through it.
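The rightward-movement search can be sketched in Python as follows. The sketch assumes the object position is recorded as one centroid per frame and that x increases to the right; the function names and the default `threshold` of 0.5 pixels per frame are hypothetical values chosen for illustration.

```python
def average_horizontal_motion(centroids):
    """Mean per-frame displacement along the x axis, in pixels per frame.

    `centroids` is the list of (x, y) object positions, one per frame.
    The mean of the frame-to-frame displacements reduces to
    (last x - first x) / (number of frame intervals).
    """
    if len(centroids) < 2:
        return 0.0
    return (centroids[-1][0] - centroids[0][0]) / (len(centroids) - 1)


def find_moving_right(avg_dx_by_object, threshold=0.5):
    """Return the ids of objects whose average x motion exceeds `threshold`.

    A small positive threshold keeps nearly stationary objects out of
    the result instead of testing strictly for values above zero.
    """
    return [obj_id for obj_id, avg_dx in avg_dx_by_object.items()
            if avg_dx > threshold]
```

A leftward search would use the mirrored test `avg_dx < -threshold`, and a mapping such as "right means entering" could be layered on top once the camera layout is known.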

Finally, the display unit 80 displays the representative image of each object generated by the representative image processing unit 50 on a display device such as a monitor. When there are several objects to display, the representative images may be arranged in time order based on, for example, the start time of each video period or the time of the representative image base; or, rather than letting time determine only the order, each representative image may be placed at a position spaced according to its relative position on the time axis.
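Placing representative images at intervals that reflect their relative position on the time axis can be sketched in Python as below. The function name `timeline_positions` and the pixel-based layout are assumptions made for illustration only.

```python
def timeline_positions(start_times, width):
    """Map start times to x offsets proportional to elapsed time.

    `start_times` holds one time value per representative image and
    `width` is the length of the timeline in pixels. The earliest image
    is placed at 0 and the latest at `width`.
    """
    first, last = min(start_times), max(start_times)
    span = last - first
    if span == 0:
        return [0 for _ in start_times]  # all images share one instant
    return [round((t - first) / span * width) for t in start_times]
```

For start times of 0, 10, and 40 seconds on a 400-pixel timeline, the images land at offsets 0, 100, and 400, so the gap between the second and third image is visibly longer than the gap between the first and second.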

In addition, when the metadata evaluation unit 70 passes the list of objects that meet the conditions to the representative image processing unit 50, it may at the same time pass to the display unit 80 a score expressing the degree of matching obtained in the condition check or, when multiple search conditions are given, a list of the matching objects for each condition.
This makes it possible, for example, to display objects in descending order of matching score or, when there are multiple conditions, to display the objects separately for each condition they satisfy. As an example of a matching score, when searching for people moving to the right as described above, a larger object average movement vector value means a larger rightward movement, so when there are multiple objects to display they can be arranged in order of this value rather than simply in time order.
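
The two orderings discussed here, by time and by matching score, can be sketched as follows. The dictionary keys and the sample values are hypothetical; the score is assumed to be the object average movement vector value, as in the rightward-movement example.

```python
# Hypothetical search results: one entry per matching object.
matches = [
    {"object_id": 1, "start_time": 12.0, "score": 1.8},
    {"object_id": 2, "start_time": 3.0, "score": 4.5},
    {"object_id": 3, "start_time": 7.5, "score": 0.6},
]


def order_by_time(results):
    """Earliest video period unit first."""
    return sorted(results, key=lambda m: m["start_time"])


def order_by_score(results):
    """Highest matching score first."""
    return sorted(results, key=lambda m: m["score"], reverse=True)


print([m["object_id"] for m in order_by_time(matches)])
print([m["object_id"] for m in order_by_score(matches)])
```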

FIG. 3 is a flowchart that briefly rearranges, in chronological order, the functions and processes of the video display system according to the present embodiment described so far. The operation of the video display system according to the present embodiment is described once more with reference to this figure.
First, the video data input by the video input unit 10 is stored in the video storage unit 20 (step ST1). Next, the object extraction unit 31 extracts an object from the video data (step ST2), and the metadata extraction unit 32 extracts the feature amounts of the object as metadata, which is stored in the metadata storage unit 40 (step ST3).

The search condition and the display condition, on the other hand, are input by the user, so their position in the time sequence is generally not fixed, and in some cases they may not be input at all. For example, all conditions may be set when the system is first put into operation, or the system may be used to display all objects with no narrowing conditions. In the example shown in FIG. 3, a search condition is first input through the condition input unit 60 (step ST4), and the metadata evaluation unit 70 determines the objects to be displayed (step ST5). Next, a display condition is input through the condition input unit 60 (step ST6). Based on this display condition, the representative image base selection unit 51 selects the base of the representative image (step ST7), and the object enhancement unit 52 processes it to generate the representative image (step ST8). Finally, the display unit 80 displays the representative image (step ST9).

Next, the relationship among an object, the video period unit corresponding to that object, and the representative image is described in more detail with reference to FIG. 4.
First, object extraction is described. In FIG. 4, two persons appear in five video frames numbered 1 to 5. Specifically, one person appears in the first frame (frame 1), two persons appear in the second to fourth frames (frames 2 to 4), and one person appears in the fifth frame (frame 5). This is detected by image difference extraction processing or the like.
Furthermore, template search processing or the like recognizes that the same person appears continuously in the first to fourth frames (frames 1 to 4) and that another person appears continuously in the second to fifth frames (frames 2 to 5). Here the first person is called object 1 and the second person object 2. The corresponding video period units are frame 1 to frame 4 for object 1 and frame 2 to frame 5 for object 2.
As in this example, two video period units may overlap in time. In other words, although the video data is said to be divided, the same portion of video may be included in more than one video period unit.
The metadata is then what is obtained by extracting, for object 1 and object 2, the features within their respective video periods (frame 1 to frame 4 and frame 2 to frame 5).
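
The FIG. 4 example can be reproduced with a small sketch that derives the video period unit of each object from per-frame detections. It assumes that identity tracking (the template search step) has already assigned a stable object ID to each detection and that each object appears without gaps; `video_period_units` and the `detections` layout are hypothetical names for this illustration.

```python
def video_period_units(detections):
    """Return {object_id: (first_frame, last_frame)}.

    detections maps a frame number to the set of object IDs seen in that frame.
    """
    periods = {}
    for frame in sorted(detections):
        for obj in detections[frame]:
            first, _ = periods.get(obj, (frame, frame))
            periods[obj] = (first, frame)
    return periods


# FIG. 4: object 1 appears in frames 1 to 4, object 2 in frames 2 to 5.
detections = {1: {1}, 2: {1, 2}, 3: {1, 2}, 4: {1, 2}, 5: {2}}
print(video_period_units(detections))
```

The two resulting periods, frames 1 to 4 and frames 2 to 5, overlap in frames 2 to 4, which is the overlap the text describes.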

Next, generation of the representative image is described. In the example of FIG. 4, for object 1, frame 1 is first selected from frames 1 to 4 as the representative image base of object 1. The object is then emphasized by superimposing, as a thick line, a quadrangle circumscribing the region of frame 1 in which object 1 appears (this corresponds to a border frame and is a rectangle in FIG. 4), and the result is used as the representative image.
Similarly, for object 2, frame 4 is first selected from frames 2 to 5 as the representative image base. The object region is then emphasized by superimposing, as a thick line, a quadrangle circumscribing the region in which object 2 appears (again corresponding to a border frame, a rectangle in FIG. 4), and the result is used as the representative image.
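
A minimal sketch of the emphasis step: compute the rectangle circumscribing the object region and draw it onto a copy of the representative image base. Images are modeled as plain nested lists so that the example has no dependencies; the object mask, the pixel value used for the border, and both function names are assumptions for illustration.

```python
def bounding_box(mask):
    """Return (x0, y0, x1, y1) of the rectangle circumscribing nonzero mask pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the object does not appear in this frame
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


def draw_frame(image, box, value=9):
    """Return a copy of image with the border of box set to value."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]
    for x in range(x0, x1 + 1):
        out[y0][x] = value
        out[y1][x] = value
    for y in range(y0, y1 + 1):
        out[y][x0] = value
        out[y][x1] = value
    return out


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
base = [[0] * 5 for _ in range(5)]  # stand-in for the representative image base
box = bounding_box(mask)
framed = draw_frame(base, box)
print(box)
```

In this sketch the border is drawn on the outermost pixels of the box itself; drawing it one pixel outside the box, or with a thicker line, would be an equally valid reading of "thick line".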

FIG. 5 is an example of a screen display in the present embodiment. It shows a state in which, for certain video data, the user has specified a search time range and the condition "person moving to the right", and the first 5 of the 20 objects that meet the condition are displayed. The representative images are shown as thumbnails arranged from the top in ascending order of first-frame time, one of the metadata items held by each of the five objects. As described below, the representative image display places several display methods side by side at the same time.

The first is a single still image in which the frame where the corresponding object appears with the largest area within its video period unit is used as the representative image base, and a circumscribed quadrangle (border frame) surrounding the object region is superimposed as the processing that emphasizes the object. Some images show several people, but only the target object is highlighted, and the other people are treated as part of the background with respect to that object. This is referred to as representative image 1, which shows a still state.
The second, displayed next to it (on the right in FIG. 5), shows only the region of the target object enlarged as far as possible. This display is not simply a still image: the object region is cut out from multiple frames spanning the time range of the video period unit, and the cutouts are displayed one after another. When the video period is long, frames are thinned out at a predetermined interval rather than using every frame in the period, giving a quasi-moving-picture effect. This is referred to as representative image 2, which shows the change of the object over time.
Further to the right in FIG. 5, other metadata about each object, such as the start time and end time of the object and the time of the frame used as the base of representative image 1 (the still image), is shown as text.
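
The two selections described for FIG. 5 can be sketched briefly: choosing the frame with the largest object area as the base of representative image 1, and thinning frames at a fixed interval for the sequential display of representative image 2. The function names, the per-frame area table, and the step value are hypothetical.

```python
def select_base_frame(areas):
    """Return the frame number in which the object area (in pixels) is largest.

    areas maps a frame number to the object's area in that frame.
    """
    return max(areas, key=areas.get)


def thin_frames(frames, step):
    """Keep every step-th frame, starting with the first one."""
    return frames[::step]


areas = {2: 120, 3: 300, 4: 450, 5: 200}  # hypothetical areas for one object
print(select_base_frame(areas))
print(thin_frames(list(range(1, 11)), 3))
```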

In FIG. 5, representative image 1 showing a still state is obtained by applying object-emphasizing processing to the single image frame extracted as the representative image base. However, the single image frame extracted as the representative image base may also be used as is, without such processing, as representative image 1 showing a still state.

As an application example of how this video display system may be used, an application could be built so that when the user selects one of the representative images of the objects matching the search condition and presses the "video display execution" button at the lower left of FIG. 5, playback starts for the period (video period unit) in which the corresponding object appears.

FIG. 6 shows another example of generating a representative image. In this example, attention is paid to the movement of the object and the display resembles a stroboscopic photograph. Specifically, multiple frames are selected at a predetermined time interval and one of them is used as the background image. The object region is cut out of each selected frame, and the cutouts are superimposed one on top of another in order of capture time, each at its corresponding position and size, so that the final image shows the movement trajectory as in the stroboscopic photograph in the figure.
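
The strobe-style composition can be sketched as pasting object cutouts onto a background in order of capture time. Images are again plain nested lists, each cutout is pasted as a whole rectangle (no per-pixel mask), and `strobe_composite` with its tuple layout is a hypothetical helper for this illustration.

```python
def strobe_composite(background, cutouts):
    """Paste cutouts onto a copy of background in order of capture time.

    cutouts is a list of (time, x, y, patch); patch is a 2D list pasted with
    its top-left corner at column x, row y. Later cutouts overwrite earlier ones.
    """
    canvas = [row[:] for row in background]
    for _, x, y, patch in sorted(cutouts, key=lambda c: c[0]):
        for dy, patch_row in enumerate(patch):
            for dx, value in enumerate(patch_row):
                canvas[y + dy][x + dx] = value
    return canvas


background = [[0] * 6 for _ in range(2)]
cutouts = [
    (2, 4, 0, [[3], [3]]),  # latest position of the object
    (0, 0, 0, [[1], [1]]),  # earliest position
    (1, 2, 0, [[2], [2]]),
]
for row in strobe_composite(background, cutouts):
    print(row)
```

Every pasted cutout remains on the canvas, so the final image retains the whole movement trajectory.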

In the example of FIG. 6, two people appear in addition to the target object, but they appear only as part of the background image and are not emphasized. The person who is the target object is distinguished by a highlighting method in which images with the object region surrounded by a border frame are additionally superimposed one after another.

Alternatively, instead of a stroboscopic photograph in which every superimposed object region image remains, the object region images may be superimposed on the background image at their corresponding positions and sizes in the same way, but with superimposition and deletion repeated in sequence as in the second display method of FIG. 5 (representative image 2), so that the target object region looks like a moving picture. This is equivalent to taking representative image 2 of FIG. 5, produced by the second, sequential display method, and, rather than displaying it beside representative image 1 showing the still state, superimposing it in position on representative image 1 without enlarging the object region images. In other words, applying multiple display methods at the same time need not mean placing multiple (here, two) representative images side by side. They may also be combined and displayed as a single representative image.

In this way, a period in which the same object appears continuously in the video data is treated as one video period unit, and the series of appearances of that same object within the video period unit is associated with the video period unit as one object. A representative image in which the object is emphasized (for example, with the position of the object emphasized as in representative image 1 of FIG. 4 or FIG. 5) is then generated and displayed for each video period unit. As a result, the video can be handled in units of meaningful periods during which an individual object appears continuously, and the content of the video within each video period unit can be grasped easily.
In addition, when the user wants to find the part of the video data in which a desired subject appears, the video content can be grasped for each video period unit in which an object appears. The desired video portion can therefore be narrowed down easily even from among video period units that are simply displayed in, for example, capture order.

In particular, using a representative image that expresses the movement of the object, as in the sequential display portions of FIG. 5 and FIG. 6, makes it possible to grasp the behavior of the object within a video period unit more clearly.

Further, the present embodiment includes the search condition input unit 60, to which a search condition is input, and the metadata evaluation unit 70, which extracts objects whose metadata matches the input search condition, and the display unit 80 displays the representative images for the video period units associated with the objects extracted by the metadata evaluation unit 70. The user can therefore easily search the video data for the part in which a desired subject appears. In this case too, the video content can be grasped for each video period unit in which an object appears, so the desired video portion can easily be narrowed down further from among the listed candidate video period units.

Furthermore, combining multiple display methods that emphasize different viewpoints, as shown in FIG. 5 and FIG. 6, allows the user, in spatial terms, to view the object portion enlarged in detail while surveying the entire camera angle of view shown in the video, and, in temporal terms, to view the video at a particular time in detail through a single still image while surveying the movement over the whole video period through the sequential display of object region images. In other words, the whole and the details can be compared for each video period unit, which makes the video content easier to grasp.

In addition, the display condition input unit 60, to which conditions on the display method are input, is provided, and the representative image processing unit 50 generates the representative image according to the input conditions. By specifying conditions on the display method, the user can change which image is used as the base and which emphasis method is used to generate the representative image. The user can thus choose which features of the object the display focuses on, and as a result can more easily grasp the video content relevant to his or her purpose.

Embodiment 2.
Embodiment 1 above assumed an application that includes the condition input unit 60 and in which the user specifies a search condition. However, the present invention is not limited to forms in which the user specifies a condition and performs a search. An example is shown below.

FIG. 7 shows the functional block configuration of a video display system according to Embodiment 2 of the present invention and serves to explain the components and operation procedure of the video display system. The configuration and operation of the video display system according to the present embodiment are described below with reference to this figure, focusing mainly on the differences from Embodiment 1.
The video display system according to the present embodiment does not include the condition input unit 60 described in Embodiment 1.
Also, rather than extracting only the objects that meet a given condition as described in Embodiment 1, the metadata evaluation unit 70 groups the objects by applying predetermined conditions to their metadata. That is, the objects (video period units) are grouped according to the content of a certain item in the metadata.
Further, the display unit 80 displays, group by group, the representative images for the video period units associated with the objects. That is, the representative images are listed for each group.

In this case, for example, the metadata evaluation unit 70 classifies the objects into three groups, "objects that moved to the right", "objects that moved to the left", and "objects with no left-right movement", and passes the result to the display unit 80. As in the example screen display of FIG. 8, the display unit 80 shows the objects classified into the three groups "rightward movement", "leftward movement", and "little left-right movement". The representative images in this case are created in the same way as in FIG. 5, for example.
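
The three-way grouping can be sketched as a classification on the horizontal component of the average movement vector. The tolerance around zero, the group labels used as dictionary keys, and the function name are assumptions for illustration; the description only states that predetermined conditions are applied to the metadata.

```python
def group_by_direction(avg_dx_by_object, tolerance=0.5):
    """Classify object IDs into three groups by horizontal average movement."""
    groups = {
        "rightward movement": [],
        "leftward movement": [],
        "little left-right movement": [],
    }
    for object_id, dx in avg_dx_by_object.items():
        if dx > tolerance:
            groups["rightward movement"].append(object_id)
        elif dx < -tolerance:
            groups["leftward movement"].append(object_id)
        else:
            groups["little left-right movement"].append(object_id)
    return groups


avg_dx_by_object = {1: 3.2, 2: -1.5, 3: 0.1, 4: -0.2}  # hypothetical metadata values
print(group_by_direction(avg_dx_by_object))
```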

A video display system of this form also provides the same basic effects as described in Embodiment 1: the video can be handled in units of meaningful periods during which an individual object appears continuously, and the content of the video can be grasped easily.
In addition, in this form the video display system automatically analyzes the video content and presents it group by group according to predetermined conditions, without the user having to input a search condition, so the desired video portion can also be narrowed down easily.

Embodiment 3.
As yet another form, the system need not list and display representative images for multiple objects (video period units). It may be a simple application that only displays one representative image corresponding to one object (video period unit).

FIG. 9 shows the functional block configuration of a video display system according to Embodiment 3 of the present invention and serves to explain the components and operation procedure of the video display system. The configuration and operation of the video display system according to the present embodiment are described below with reference to this figure, focusing mainly on the differences from Embodiment 1.
The video display system according to the present embodiment includes neither the condition input unit 60 nor the metadata evaluation unit 70 described in Embodiment 1.

In a system configured in this way, a simple operation is conceivable in which, for example, at the point when the object processing unit 30 recognizes one object in the video and extracts its features as metadata, the representative image processing unit 50 generates one representative image and the display unit 80 displays it. FIG. 10 shows an example of the screen display. The representative image in this case is created in the same way as in FIG. 5, for example.

A video display system of this form also provides the same basic effects as described in Embodiment 1: the video can be handled in units of meaningful periods during which an individual object appears continuously, and the content of the video can be grasped easily.
In addition, in this form the video display system automatically analyzes and presents the video content without the user having to input a search condition.

Embodiment 4.
In each of the embodiments described above, the only thing emphasized in the representative image was the target object corresponding to the video period unit, that is, the object appearing continuously from the beginning to the end of that video period unit. However, except when only one object is detected within the period, the object processing unit 30 also treats the other physical objects as separate objects, and their feature amounts are extracted as metadata.
In the present embodiment, therefore, while the video is still handled in video period units corresponding to each object, highlighting is also applied to physical objects other than the target object (objects that appear during part of the video period unit but are associated with a different video period unit).

FIG. 11 shows a display example of a representative image in the video display system according to the present embodiment. For comparison, FIG. 5 of Embodiment 1 showed an example in which, as representative image 1 showing a still state, only the circumscribed quadrangular border frame surrounding the region of the target object was superimposed on a single representative image base to emphasize the target object. In FIG. 11, the other detected objects besides the target object (objects associated with a video period unit different from the one in question) are also emphasized by superimposing circumscribed quadrangular border frames around their regions.

FIG. 12 shows another display example of a representative image in the video display system according to the present embodiment. For comparison, in FIG. 6 of Embodiment 1, a single representative image base was used as the background image, and the corresponding object regions were cut out of multiple frames and superimposed in sequence for the target object only. In FIG. 12, the regions of objects other than the target object (objects associated with a video period unit different from the one in question) that appear within the range of the video period unit are also cut out and superimposed in sequence.

The emphasis applied to objects other than the object corresponding to each video period unit may also be handled on a sliding scale, ranging from emphasizing all objects under the same conditions to not emphasizing them at all. In FIG. 11, for example, the person on the left, who is the target object, is surrounded by a thick border line, while the person on the right is surrounded by a thin border line, so the degree of emphasis differs. In an example such as FIG. 12, a difference can be made by, for instance, lengthening the time interval between superimposed images for objects other than the target object.

Alternatively, the multiple objects appearing within one video period unit need not be divided only into the target object and everything else. They may be divided into graded groups according to how strongly they are related to the target object, for example whether they are near or far from the target object, or whether or not they cross paths with the target object on the screen, and the manner of emphasis may be differentiated accordingly. For example, the target object may be displayed with the strongest emphasis, objects in a highly related group may be given relatively weak emphasis, and objects in a weakly related group may be given no emphasis at all. In some cases, objects in a weakly related group may even be actively erased from the background image that serves as the representative image base.
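
One way to picture the graded emphasis is a small rule that maps each object's relation to the target object onto an emphasis level. The distance threshold `near`, the three level names, and the rule that crossing paths counts as highly related are all assumptions made for this sketch; the description leaves the exact criteria open.

```python
def emphasis_level(is_target, distance, crosses_target, near=50.0):
    """Return "strong", "weak", or "none" for one object in a video period unit.

    distance is the on-screen distance to the target object (hypothetical
    units); crosses_target says whether the object crosses paths with the
    target object on the screen.
    """
    if is_target:
        return "strong"
    if crosses_target or distance <= near:
        return "weak"  # highly related group: relatively weak emphasis
    return "none"      # weakly related group: no emphasis


print(emphasis_level(True, 0.0, False))
print(emphasis_level(False, 30.0, False))
print(emphasis_level(False, 200.0, False))
```

A fourth level, erasing weakly related objects from the background image, could be added in the same manner.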

As in the present embodiment, highlighting objects other than the target object makes the video content easier to grasp when handling video in which multiple objects appear.
In addition, when multiple objects other than the target object appear, making the degree of emphasis on these objects variable allows the display to make the behavior of the object of interest stand out from the surrounding objects, according to the user's purpose.

Embodiment 5.
In Embodiment 1, the fact that the user gives a search condition suggests that the user is interested in matters in the video data that relate to that search condition. In the present embodiment, therefore, information related to the search condition is highlighted without the user separately inputting a condition on the display method.

For example, suppose that the metadata held for an object includes, in addition to the movement trajectory information of the object, the person's estimated age, gender, personal identification number, and so on.
With this metadata available, suppose the user inputs a search condition such as "objects whose average movement direction is rightward". In this case the user is considered to be interested in information about movement, namely the movement direction of the object. As the search result, therefore, only the objects whose average movement direction is rightward are selected and their representative images are listed, and each representative image is displayed so that information about the movement trajectory is made explicit. Examples of such processing by the representative image processing unit 50 include superimposing a polyline or curve of the movement trajectory on the representative image, or superimposing information such as the total movement amount or average movement speed near the object region as icons or text. Information such as the total movement amount and average movement speed may of course also be displayed as icons or text around the representative image.

On the other hand, with the same metadata, suppose the user inputs a search condition such as "people in their twenties". In this case the user is presumably more interested in personal information, that is, what kind of person the object is, than in information about the movement of the object. The system therefore lists representative images only for the objects in the matching age group, and each representative image is displayed so that the personal information of the target object is made explicit. For example, the representative image processing unit 50 may superimpose information such as the estimated age, sex, and personal identification number near the object region using text or icons. Such information may of course also be additionally displayed around the representative image using text or icons.

As in the present embodiment, the content to be emphasized in the representative image is determined automatically from the search condition input by the user, and the representative image is generated accordingly. This saves the user the trouble of inputting display-method conditions in addition to the search condition. Moreover, because information related to what the user is interested in is displayed together with the image, a user searching for a desired video can grasp the content in a more meaningful way.
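One possible way to decide automatically what to emphasize is to assign each metadata field to a category and to overlay every field that shares a category with the searched field. The following is only a sketch under that assumption. The field names, the two categories, and the function `fields_to_emphasize` are hypothetical and are not taken from the specification.

```python
# Hypothetical mapping from metadata field to category (an assumption of this sketch).
CATEGORY_OF = {
    "average_direction": "motion",
    "total_movement": "motion",
    "average_speed": "motion",
    "estimated_age": "personal",
    "sex": "personal",
    "personal_id": "personal",
}


def fields_to_emphasize(search_key, metadata):
    """Return the metadata fields in the same category as the searched field."""
    category = CATEGORY_OF[search_key]
    return {k: v for k, v in metadata.items() if CATEGORY_OF.get(k) == category}
```

With this mapping, a search on the movement direction selects the motion fields for overlay, while a search on the estimated age selects the personal fields, without any separate display-method input.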

Embodiment 6.
Embodiment 5 showed an example in which the feature information of an object related to the search condition given by the user is explicitly displayed. In the present embodiment, conversely, metadata unrelated to the input search condition, that is, metadata other than the input search condition, is displayed. Every object in the search result should already satisfy the search condition input by the user. Therefore, when the metadata holds many features of an object, additionally displaying information about the search condition is not really necessary, and it may be more convenient to additionally display the other feature information.

For example, suppose that each object holds, as metadata, the estimated age, sex, and personal identification number of the person in addition to the movement trajectory information of the object.
Given this metadata, when a condition on the movement direction of the object is input as the search condition, every target object in the displayed representative images should satisfy a certain criterion regarding movement direction. Therefore, as the other features of the object, the representative image processing unit 50 may, for example, superimpose personal information such as the estimated age, sex, and personal identification number near the object region using text or icons. Such personal information may of course also be additionally displayed around the representative image using text or icons.

On the other hand, with the same metadata, suppose the user inputs a search condition such as "people in their twenties". In this case every target object in the displayed representative images should satisfy the estimated-age condition. Therefore, as the other information about the object, the representative image processing unit 50 may instead superimpose information about the movement trajectory or, when displaying personal information, limit it to items other than the estimated age, such as the estimated sex and personal identification number, and superimpose these near the object region using text or icons. The estimated sex, personal identification number, and so on may of course also be additionally displayed around the representative image using text or icons.

When an object has many features as metadata, displaying a large amount of information can make the screen harder to read. By displaying only the feature information other than that related to the search condition input by the user, as in the present embodiment, the content already known from the search result can be omitted, so that more information can be shown with a simpler display.
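The selection described in this embodiment amounts to taking the complement of the searched fields. A minimal sketch follows. The function names and the field names are hypothetical, and the text overlay is reduced to a list of strings for illustration.

```python
def fields_outside_search(search_keys, metadata):
    """Return every metadata field that was not part of the search condition."""
    searched = set(search_keys)
    return {k: v for k, v in metadata.items() if k not in searched}


def overlay_text(fields):
    """Format the selected fields as text lines to place near the object region."""
    return [f"{name}: {value}" for name, value in fields.items()]
```

For a search on the estimated age, the estimated age itself is dropped from the overlay and only the remaining items, such as the sex and the personal identification number, are shown.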

Embodiment 7.
In all of the embodiments described above, when there are multiple objects to display, the representative images are displayed in a list format such as a thumbnail display. In the present embodiment, by contrast, for video whose background basically does not change, such as video from a fixed camera that does not pan, multiple object region images are superimposed on a single representative image base.

FIG. 13 shows the functional block configuration of a video display system according to Embodiment 7 of the present invention and illustrates its components and operation procedure. The configuration and operation of the video display system according to the present embodiment are described below with reference to this figure, focusing mainly on the differences from Embodiment 1.
In configuration, the video display system according to the present embodiment differs in that the display unit 80 consists of a background image generation unit 81 and an object composition unit 82. In operation, it differs in the display unit 80 and, in part, in the processing of the representative image processing unit 50.

First, when the representative image processing unit 50 generates representative images for multiple objects, it needs not only an annotation process, such as superimposing an outer frame around the object region on the representative image base, but also an extraction process that cuts out part of the image data within the object region. In addition, the still image data of the representative image base, before the object-region highlighting is added, is separately passed to the display unit 80.

In the display unit 80, the background image generation unit 81 selects one of the representative images generated by the representative image processing unit 50 and designates the still image data that is the representative image base of that representative image as the background image.
Next, the object composition unit 82 in the display unit 80 sequentially adds the object-region highlighting for each object to the background image extracted by the background image generation unit 81, and displays the result on a display device such as a display.
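The sequential composition performed by the object composition unit 82 can be sketched as pasting cut-out object regions onto a copy of the background image one after another. The sketch below is illustrative only. It represents images as nested lists of pixel values, and the names `paste_region` and `compose` are hypothetical.

```python
def paste_region(background, patch, top, left):
    """Return a copy of background with patch pasted at (top, left); out-of-range pixels are skipped."""
    out = [row[:] for row in background]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            y, x = top + r, left + c
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value
    return out


def compose(background, regions):
    """Sequentially add each (patch, top, left) object region to the background image."""
    canvas = background
    for patch, top, left in regions:
        canvas = paste_region(canvas, patch, top, left)
    return canvas
```

Because each paste works on a copy, the original background image stays unchanged and can be reused for another search result.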

FIG. 14 shows an example of the screen display in the present embodiment. For certain video data, the user has specified a search time range and the condition that the average movement direction of the object is rightward, and the five objects that satisfy the condition are displayed.
Because the video showing each of these objects has the same background, one of the still images serving as the representative image base of the objects is reused as the background image. On this background image, an emphasized representation is superimposed for each object that makes the movement trajectory explicit, so that the movement of each object can be seen at a glance.
Specifically, a Polaroid-photo-like representation such as that of FIG. 6 in Embodiment 1 could be used, but it may become hard to see when many objects are superimposed on the same screen. Here, therefore, only one representative image of each object region is superimposed, and the rest of the trajectory is represented by superimposing a curved arrow that connects the centroid positions through which the object region passed over multiple frames.
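The trajectory arrow can be sketched as a polyline through the per-frame centroids. The sketch below assumes that each object region is given as a bounding box `(x, y, width, height)` and approximates the centroid by the box center. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def box_center(box):
    """Approximate the centroid of an object region by the center of its bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def trajectory_polyline(boxes):
    """Return the centroid positions, one per frame, in time order."""
    return [box_center(b) for b in boxes]


def arrow_segments(points):
    """Split the polyline into consecutive segments; the last one carries the arrowhead."""
    return list(zip(points, points[1:]))
```

A drawing routine would then render each segment on the background image and place the arrowhead at the end point of the last segment.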

The example above assumes a fixed camera that does not pan. Even when the camera direction moves left and right, however, a similar display is possible if the system has a function that associates the change in background position with the change in shooting direction and composes an image like a wide panoramic photograph, which is then used as the common background image for the objects. FIG. 15 shows an example of the representative image in this case.
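For the panning-camera case, object positions must be expressed in the coordinates of the panoramic background. The following sketch assumes that the offset of each frame within the panorama is already known as a pure translation `(ox, oy)`. How the offsets are estimated is not described in the specification, and the function names are hypothetical.

```python
def to_panorama(point, frame_offset):
    """Translate a point from frame coordinates into panorama coordinates."""
    x, y = point
    ox, oy = frame_offset
    return (x + ox, y + oy)


def trajectory_in_panorama(points, offsets):
    """Map one point per frame into the panorama using the offset of that frame."""
    if len(points) != len(offsets):
        raise ValueError("one offset is required per trajectory point")
    return [to_panorama(p, o) for p, o in zip(points, offsets)]
```

An object that stays near the same position in the frame while the camera pans to the right thus appears as a rightward trajectory on the panoramic background.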

When representative images of multiple objects are displayed as in the present embodiment, they are superimposed on a single background image rather than arranged as thumbnails. That is, the representative images of multiple video period units having the same background are displayed so that the objects associated with the respective video period units are superimposed on a common background image. This removes the inconvenience of having to scroll the screen to redisplay the rest of the list when the representative images of the objects to be listed do not fit within the screen area.
As a further effect, superimposing multiple objects on one background image makes the spatial relationships among the objects easier to grasp: at which positions many objects concentrate, whether any object has a position or movement trajectory markedly different from the majority, and which objects appeared close to one another.

FIG. 1 is a diagram for explaining the components and operation procedure of the video display system according to Embodiment 1.
FIG. 2 is a diagram showing an example of metadata according to Embodiment 1.
FIG. 3 is a flowchart for explaining the operation of the video display system according to Embodiment 1.
FIG. 4 is a diagram for explaining the video display system according to Embodiment 1 in more detail.
FIG. 5 is a diagram showing an example of a screen display according to Embodiment 1.
FIG. 6 is a diagram showing another display example of a representative image according to Embodiment 1.
FIG. 7 is a diagram for explaining the components and operation procedure of the video display system according to Embodiment 2.
FIG. 8 is a diagram showing an example of a screen display according to Embodiment 2.
FIG. 9 is a diagram for explaining the components and operation procedure of the video display system according to Embodiment 3.
FIG. 10 is a diagram showing an example of a screen display according to Embodiment 3.
FIG. 11 is a diagram showing a display example of a representative image according to Embodiment 4.
FIG. 12 is a diagram showing another display example of a representative image according to Embodiment 4.
FIG. 13 is a diagram for explaining the components and operation procedure of the video display system according to Embodiment 7.
FIG. 14 is a diagram showing an example of a screen display according to Embodiment 7.
FIG. 15 is a diagram showing another display example of a representative image according to Embodiment 7.

Explanation of symbols

10 video input unit, 20 video storage unit, 30 object processing unit, 31 object extraction unit, 32 metadata extraction unit, 40 metadata storage unit, 50 representative image processing unit, 51 representative image base selection unit, 52 object enhancement unit, 60 condition input unit, 70 metadata evaluation unit, 80 display unit, 81 background image generation unit, 82 object composition unit.

Claims

A video display system comprising:
a video input unit that inputs video data;
a video storage unit that stores the input video data;
an object processing unit that detects individual objects appearing in the video data, recognizes a period in which the same object appears over a plurality of image frames and regards that period as one video period unit, associates the series of appearances of the same object, as one object, with the video period unit, and extracts features of the object as metadata;
a representative image processing unit that, for each video period unit, extracts from the video data stored in the video storage unit, based on the metadata, at least one image frame satisfying a predetermined criterion as a representative image base for the video period unit, processes the representative image base so that the object associated with the video period unit is emphasized, and generates a representative image for the video period unit; and
a display unit that displays the representative image.

The video display system according to claim 1, further comprising
a metadata evaluation unit that groups the objects based on a predetermined condition applied to the metadata of each object,
wherein the display unit displays, group by group, the representative images of the video period units associated with the objects.

The video display system according to claim 1, further comprising:
a search condition input unit that inputs a search condition; and
a metadata evaluation unit that extracts objects having metadata matching the input search condition,
wherein the display unit displays the representative images of the video period units associated with the extracted objects.

The video display system according to any one of claims 1 to 3,
wherein the representative image processing unit generates the representative image so that an object that appears during part of the video period unit but is associated with a different video period unit is emphasized.

The video display system according to any one of claims 1 to 4,
wherein the representative image processing unit extracts a plurality of image frames as the representative image base for a video period unit and uses these image frames to generate a representative image showing the temporal change of the object associated with the video period unit.

The video display system according to claim 5,
wherein the representative image showing the temporal change of the object is generated by cutting out, from each image frame, a portion including at least part of the region of the object associated with the video period unit, and processing the cut-out object portions so that they are displayed in sequence along the time axis.

The video display system according to claim 5,
wherein the representative image processing unit generates both a representative image showing a stationary state, obtained by extracting a single image frame satisfying a predetermined criterion as the representative image base for the video period unit and processing it so that the object associated with the video period unit is emphasized, and the representative image showing the temporal change of the object according to claim 5, and
the display unit displays both the representative image showing the stationary state and the representative image showing the temporal change of the object.

The video display system according to any one of claims 1 to 7, further comprising
a display condition input unit that inputs a condition relating to the display method,
wherein the representative image processing unit generates the representative image in accordance with the input condition relating to the display method.

The video display system according to any one of claims 3 to 8,
wherein the representative image processing unit superimposes metadata related to the input search condition when processing the representative image base so that the object is emphasized.

The video display system according to any one of claims 3 to 8,
wherein the representative image processing unit superimposes metadata other than the input search condition when processing the representative image base so that the object is emphasized.

The video display system according to any one of claims 1 to 10,
wherein the display unit displays the representative images of a plurality of video period units having the same background so that the objects associated with the respective video period units are superimposed on a single background image.