JP5437928B2

JP5437928B2 - METADATA ADDING DEVICE, VIDEO SEARCH DEVICE, METHOD, AND PROGRAM

Info

Publication number: JP5437928B2
Application number: JP2010142433A
Authority: JP
Inventors: 敬之須山; 泰恵岸野; 卓也前川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-06-23
Filing date: 2010-06-23
Publication date: 2014-03-12
Anticipated expiration: 2030-06-23
Also published as: JP2012008683A

Description

The present invention relates to a metadata adding device that attaches verbalized metadata to video using sensors and sensor network technology, and to a video search device that takes as input a search key expressing the event to be searched for as a keyword or as sensor data and retrieves video containing that event.

Various schemes have been devised for searching video. One typical scheme uses a video segment itself as the search key (target signal) and searches another set of videos (accumulated signal) for identical or similar video (see Non-Patent Document 1). To speed up processing, such schemes usually do not compare the videos directly; instead, features are extracted from the target signal and the accumulated signal, and the search is performed by comparing feature values. Well-chosen features also give the search a degree of robustness against distortion and noise.

Another conceivable scheme searches video by interpreting its content with image processing or similar techniques. A simple example determines how exciting a scene is from the audio. A further scheme has a person check the content, attach metadata to each scene, and search on that metadata.

Non-Patent Document 1: Takayuki Kurozumi, Hidehisa Nagano, Kunio Kashino (黒住隆行, 永野秀尚, 柏野邦夫), "Matching video search using video fragments recorded in real environments as keys," IEICE Transactions D, Vol. J90-D, No. 8, pp. 2223-2231, 2007.

The scheme disclosed in Non-Patent Document 1, which uses the video itself as the search key (target signal), compares videos only superficially and does not interpret their content. It therefore cannot support searches that depend on content, such as what is happening in the video.
The audio-based scheme can discriminate video only in specific situations (for example, sports broadcasts). Attaching metadata by hand requires too much labor to be practical.

The present invention has been made to solve the above problems. One object is to provide a metadata adding device, method, and program that can attach metadata to video so that video search based on the motion of a subject becomes possible.
Another object is to provide a video search device, method, and program that can realize video search based on the motion of a subject.

The metadata adding device of the present invention comprises: video acquisition means for acquiring video of a subject captured by a camera; sensor data acquisition means for acquiring sensor data measured, at the time of capture, by a sensor attached to the subject; an event descriptor dictionary that stores in advance the correspondence between event descriptors, each representing an event concerning a subject, and physical quantity expressions, each expressing that event as formulas over sensor data; and metadata adding means that, when an event whose physical quantity expression matches the sensor data acquired by the sensor data acquisition means is registered in the event descriptor dictionary, takes the event descriptor representing that event as metadata and attaches the metadata to the video acquired by the video acquisition means.

The video search device of the present invention comprises: storage means that stores in advance video to which metadata representing events that happened to a subject has been attached; key type determination means that receives a search key expressing the event to be searched for as a keyword or as sensor data and determines the type of the search key; an event descriptor dictionary that stores in advance the correspondence between event descriptors, each representing an event concerning a subject, and physical quantity expressions, each expressing that event as formulas over sensor data; metadata extraction means that, when the search key expresses the event as sensor data and an event whose physical quantity expression matches that sensor data is registered in the event descriptor dictionary, extracts the event descriptor representing that event as metadata; and search means that determines input metadata and searches the video stored in the storage means for video to which metadata matching the input metadata has been attached. The input metadata is determined as follows. When the search key expresses the event as a keyword that is not an event descriptor registered in the event descriptor dictionary, the event descriptor corresponding to the search key is the input metadata. When the search key is itself an event descriptor registered in the event descriptor dictionary, that event descriptor is the input metadata. When the search key expresses the event as sensor data, the metadata extracted by the metadata extraction means is the input metadata.
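The three-way determination of the input metadata can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the toy dictionary `DESCRIPTOR_KEYWORDS`, the function names, and the representation of a sensor-data key as any non-string value are all hypothetical, and the sensor-data matcher is passed in as a callable.

```python
# Hypothetical toy dictionary: event descriptor -> keywords (predicates) grouped under it.
DESCRIPTOR_KEYWORDS = {
    "fall-vertically": {"drop"},
    "change-location": {"move", "travel", "locomote"},
}


def resolve_input_metadata(search_key, match_sensor_data):
    """Return the event descriptor to search for, or None if nothing matches.

    search_key: a str (keyword key) or any other object (sensor-data key).
    match_sensor_data: callable mapping sensor data to a descriptor or None;
    it stands in for the metadata extraction means.
    """
    if isinstance(search_key, str):
        # Keyword key that is already a registered event descriptor.
        if search_key in DESCRIPTOR_KEYWORDS:
            return search_key
        # Keyword key that is not a descriptor: map it to its descriptor.
        for descriptor, keywords in DESCRIPTOR_KEYWORDS.items():
            if search_key in keywords:
                return descriptor
        return None
    # Sensor-data key: use the descriptor extracted from the data.
    return match_sensor_data(search_key)


def search(videos, input_metadata):
    """videos: list of (video_id, set_of_metadata). Return matching ids."""
    return [vid for vid, tags in videos if input_metadata in tags]
```

For example, the keyword "drop" resolves to "fall-vertically", after which `search` returns every stored section carrying that metadata.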

The present invention also provides a metadata adding method for attaching metadata to video on a computer having a CPU and a storage device. The method causes the CPU, in accordance with a program stored in the storage device, to execute: a video acquisition step of acquiring video of a subject captured by a camera; a sensor data acquisition step of acquiring sensor data measured, at the time of capture, by a sensor attached to the subject; and a metadata adding step of, when an event whose physical quantity expression matches the sensor data acquired in the sensor data acquisition step is registered in an event descriptor dictionary that stores in advance the correspondence between event descriptors representing events concerning a subject and physical quantity expressions expressing those events as formulas over sensor data, taking the event descriptor representing that event as metadata and attaching the metadata to the video acquired in the video acquisition step.

The present invention also provides a video search method for searching video on a computer having a CPU and a storage device. The method causes the CPU, in accordance with a program stored in the storage device, to execute: a key type determination step of receiving a search key expressing the event to be searched for as a keyword or as sensor data and determining the type of the search key; a metadata extraction step of, when the search key expresses the event as sensor data and an event whose physical quantity expression matches that sensor data is registered in an event descriptor dictionary that stores in advance the correspondence between event descriptors representing events concerning a subject and physical quantity expressions expressing those events as formulas over sensor data, extracting the event descriptor representing that event as metadata; and a search step of determining input metadata and searching the metadata-attached video stored in storage means for video to which metadata matching the input metadata has been attached. In the search step, when the search key expresses the event as a keyword that is not an event descriptor registered in the event descriptor dictionary, the event descriptor corresponding to the search key is the input metadata; when the search key is itself an event descriptor registered in the event descriptor dictionary, that event descriptor is the input metadata; and when the search key expresses the event as sensor data, the metadata extracted in the metadata extraction step is the input metadata.
The program of the present invention causes a computer to execute each step of the metadata adding method or the video search method.

According to the present invention, sensor data measured by a sensor attached to the subject is acquired at the same time as video of the subject captured by a camera. When an event whose physical quantity expression matches the acquired sensor data is registered in the event descriptor dictionary, the event descriptor representing that event is taken as metadata and attached to the acquired video. Metadata is thus attached automatically, without manual work. As a result, the present invention makes it possible to search video based on the motion of the subject shown in it.

Further, in the present invention, when the search key expresses the event as a keyword that is not an event descriptor registered in the event descriptor dictionary, the event descriptor corresponding to the search key is taken as the input metadata; when the search key is itself an event descriptor registered in the event descriptor dictionary, that event descriptor is taken as the input metadata; and when the search key expresses the event as sensor data and an event whose physical quantity expression matches that sensor data is registered in the event descriptor dictionary, the event descriptor representing that event is extracted as the input metadata. Video to which metadata matching the input metadata has been attached is then retrieved from the video stored in the storage means. This realizes video search based on the motion of the subject, which conventional methods could not achieve.

FIG. 1 is a diagram showing the relationship between sensor data and video.
FIG. 2 is a block diagram showing the configuration of the metadata adding device according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the metadata adding device according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram showing a specific example of the event descriptor dictionary of the metadata adding device according to the embodiment of the present invention.
FIG. 5 is an explanatory diagram showing a specific example of the event descriptor dictionary of the metadata adding device according to the embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of the video search device according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of the video search device according to the embodiment of the present invention.
FIG. 8 is a diagram explaining the video search processing according to the embodiment of the present invention.

The present invention provides a metadata adding device that attaches verbalized metadata to content using sensor and sensor network technology, and a video search device that, using content to which metadata generated by the metadata adding device has been attached, takes linguistic information representing a desired event, or video containing the desired event, as the search key and searches for the sections of the accumulated signal that contain the event. Hereinafter, a fixed unit of time is called a section, and both metadata attachment and video search are performed in units of sections.

When video is shot, sensors are attached to the objects and people appearing in it so that sensor data is acquired together with the video. The video and the sensor data on the movement of those objects and people are associated with each other and stored in a storage unit.

The present invention does not attach the raw time-series waveform obtained as sensor data to the video as metadata. Instead, it interprets the sensor data, verbalizes the event that the data indicates, attaches that verbalization to the video as metadata, and then searches video using the verbalized metadata.

For example, if a state is detected in which the gravitational acceleration that is always present under normal conditions is absent, that state can be judged to be one in which the object is falling. Moreover, even a motion that is hard to put into words can be searched for by expressing it as a motion.
FIG. 1 shows the relationship between sensor data and video. Reference numerals 100 to 103 in FIG. 1 denote a series of images capturing a person dropping an object to which a sensor node (described later) is attached. Reference numeral 104 in FIG. 1 denotes the sensor data output by the sensor node during this series of movements. Analyzing this sensor data determines the event that occurred in the section in which the video was shot, and a word representing the event (for example, "fall-vertically") is obtained as metadata.

[Metadata adding device]
Next, the metadata adding device, which extracts metadata by verbalizing sensor data (waveforms) and attaches the metadata to video, is described in detail. FIG. 2 is a block diagram showing the configuration of the metadata adding device according to the embodiment of the present invention. The metadata adding device 1 has a video acquisition unit 10, an accumulated signal storage unit 11, a sensor data acquisition unit 12, an accumulated sensor data storage unit 13, an event section detection unit 14, a data correction unit 15, a metadata adding unit 16, an event descriptor dictionary 17, and a metadata-attached accumulated signal storage unit 18.

FIG. 3 is a flowchart showing the operation of the metadata adding device 1. First, the video acquisition unit 10 acquires the video to be searched, captured by the camera 2, and stores it in the accumulated signal storage unit 11 as an accumulated signal (step S100 in FIG. 3). At this time, a sensor node 3 is attached to the object or person that is the subject. The sensor node 3 includes sensors such as an acceleration sensor, an illuminance sensor, and an orientation sensor, as well as a CPU, a memory, and a communication unit that performs wireless data communication. The CPU of the sensor node 3 transmits the sensor data measured by the sensors to the sensor network 4 through the communication unit.

At the same time as the video acquisition unit 10 acquires the video, the sensor data acquisition unit 12 acquires sensor data indicating the state of the subject from the sensor node 3 through the sensor network 4 and stores it in the accumulated sensor data storage unit 13 (step S101 in FIG. 3).

Consider, for example, an acceleration sensor attached to the subject. The sensor responds only when the subject moves, so it is idle for much of the time. Determining which event is occurring over every section of the captured video would therefore be wasteful. In the metadata adding device 1 of this embodiment, the event section detection unit 14 accordingly analyzes the sensor data stored in the accumulated sensor data storage unit 13 and detects each section in which an event is occurring as an event section (step S102 in FIG. 3).

The event section detection unit 14 may, for example, judge that an event is occurring when the value of the physical quantity indicated by the sensor data is at or above a predetermined threshold. For an acceleration sensor, it may judge that an event is occurring when the acceleration is at or above the threshold. Alternatively, it may judge that an event is occurring when the change in the physical quantity per unit time is at or above a predetermined threshold.
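The threshold rule can be sketched as follows. This is a minimal illustration, assuming sensor data arrives as time-ordered `(time, value)` pairs; the function names and the use of the absolute value are assumptions for the example, not details taken from the patent.

```python
def detect_event_sections(samples, threshold):
    """Return (start_time, end_time) pairs where |value| >= threshold.

    samples: time-ordered list of (time, value) pairs.
    Consecutive above-threshold samples are merged into one event section.
    """
    sections = []
    start = None
    last = None
    for t, value in samples:
        if abs(value) >= threshold:
            if start is None:
                start = t  # an event section opens here
            last = t
        elif start is not None:
            sections.append((start, last))  # the section closed at the previous sample
            start = None
    if start is not None:
        sections.append((start, last))  # section still open at end of data
    return sections


def rate_of_change(samples):
    """Change per unit time between consecutive samples (for the alternative rule)."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]
```

To apply the alternative rule, pass the output of `rate_of_change` to `detect_event_sections` with a threshold on the rate instead of the value.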

An acceleration sensor provides information on three axes (X, Y, and Z), but the orientation in which the sensor node 3 is attached to the subject is unknown. The direction the acceleration sensor is actually facing must therefore be corrected using gravitational acceleration. In addition, sensor data output by a sensor often contains noise, which should be removed as far as possible. Specifically, the data correction unit 15 removes the noise contained in the three-axis acceleration data, for example by filtering. The data correction unit 15 then detects gravitational acceleration from the noise-removed three-axis acceleration data to find the direction in which the acceleration sensor is tilted, and from the detected orientation and the noise-removed three-axis acceleration data it calculates the acceleration in the vertical direction and the acceleration in the horizontal plane perpendicular to it (step S103 in FIG. 3). The data is corrected in this way. For sensor data from sensors other than the acceleration sensor, noise removal alone is sufficient.
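One way to picture this correction is sketched below. The patent does not specify the filter or how gravity is detected, so the trailing moving average and the use of the mean acceleration as the gravity estimate are assumptions made only for illustration, as are all function names.

```python
import math


def moving_average(samples, window=3):
    """Smooth a list of (x, y, z) samples with a trailing moving average."""
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        smoothed.append(tuple(sum(c[k] for c in chunk) / len(chunk) for k in range(3)))
    return smoothed


def estimate_gravity(samples):
    """Assumption: the mean acceleration over the samples approximates gravity."""
    n = len(samples)
    return tuple(sum(s[k] for s in samples) / n for k in range(3))


def split_vertical_horizontal(sample, gravity):
    """Split one sample into a vertical scalar and a horizontal vector.

    The vertical part is the projection onto the unit gravity direction;
    the horizontal part is what remains in the plane perpendicular to it.
    """
    norm = math.sqrt(sum(g * g for g in gravity))
    unit = tuple(g / norm for g in gravity)
    vertical = sum(s * u for s, u in zip(sample, unit))
    horizontal = tuple(s - vertical * u for s, u in zip(sample, unit))
    return vertical, horizontal
```

Because the split depends only on the estimated gravity direction, it works regardless of how the sensor node happens to be mounted on the subject.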

Next, the metadata adding unit 16 determines which event actually occurred in each event section detected by the event section detection unit 14 (step S104 in FIG. 3). A method for determining events is disclosed, for example, in Japanese Patent Application Laid-Open No. 2007-304692. The method for determining events is described below based on the technique disclosed in that publication.

The event descriptor dictionary 17 is a database that stores in advance, in association with one another, predicates consisting of verbs that denote events concerning people or objects, event descriptors consisting of labels that indicate the group to which each predicate belongs, and physical quantity expressions that express each event as formulas over sensor data.
The event descriptor dictionary 17 is described in detail with reference to FIG. 4, which is an explanatory diagram showing a specific example of the event descriptor dictionary 17.

The event descriptor dictionary 17 is generated in advance as follows. First, using a thesaurus, basic verbs related to "motion" and basic verbs related to "scalar physical quantities" are taken as seeds, and the thesaurus links are followed to collect as many predicates as possible that can be associated with physical phenomena.
Here, the English electronic thesaurus WordNet is used (see, for example, C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, 1998). Taking the verbs "move, reach, pass, exit, touch, enter" related to "motion" and the verbs "rise, drop, increase, keep, remain" related to "scalar physical quantities" as seeds, the thesaurus links are followed one after another to collect every predicate that can be associated with a physical phenomenon.
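Following thesaurus links from seed verbs amounts to a graph traversal. The sketch below uses a breadth-first search over a hypothetical toy link table (`TOY_THESAURUS`) rather than the real WordNet data; the `is_physical` filter stands in for the judgment of whether a predicate can be tied to a physical phenomenon, which the patent does not describe in algorithmic terms.

```python
from collections import deque

# Hypothetical toy thesaurus: word -> directly linked words.
TOY_THESAURUS = {
    "move": ["travel", "go"],
    "travel": ["locomote"],
    "go": ["move"],
    "drop": ["fall"],
}


def collect_predicates(seeds, thesaurus, is_physical=lambda word: True):
    """Collect every word reachable from the seeds by following thesaurus links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for linked in thesaurus.get(word, []):
            if linked not in seen:  # visit each word once, even with cyclic links
                seen.add(linked)
                queue.append(linked)
    # Keep only predicates judged to correspond to a physical phenomenon.
    return sorted(word for word in seen if is_physical(word))
```

The `seen` set matters because thesaurus links are often cyclic (here "go" links back to "move").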

In doing so, predicates with strictly the same meaning are gathered into one group, and the group is labeled using the phrase that WordNet gives to express that meaning. For example, the verbs "shift, dislodge, reposition" form one group with the common meaning "change direction", and that group is given the label "change-direction". Labeled predicate groups are created in this way.

Next, each predicate group is classified as transitive or intransitive, and auxiliary symbols are introduced into the label. For example, for the label "change-location", the group of transitive verbs "move, travel, locomote, go" is given two arguments, as in "change-location(a,o)", where the argument a represents the subject of the transitive verb and the argument o represents its object. For intransitive verbs, the label is written "change-location(o)" so that only the subject is explicit, or the argument is omitted and the label is written simply "change-location".

Some words also carry information such as time or place as part of their meaning. For these word groups, a notation is introduced that adds an argument to the label to make that information explicit. For instance, the intransitive word group "reach, arrive at, gain", labeled "reach-destination", implies that there is a "place" or "object" to be reached, so it is written "reach-destination(b:object)" or, more precisely, "(reach-destination(o))(b:object)". A label with such arguments is called an event descriptor, and the event descriptor dictionary 17 is created by arranging the event descriptors in alphabetical order in electronic form. The word group for each label is associated with its event descriptor.
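The label notation described so far is regular enough to be generated mechanically. The helper below is only an illustrative sketch: the function name `make_descriptor` and its parameters are hypothetical, and it simply reproduces the string forms quoted in the text.

```python
def make_descriptor(label, args=(), implied=()):
    """Build an event descriptor string from a group label.

    label:   group label such as "change-location"
    args:    verb arguments, e.g. ("a", "o") for subject and object
    implied: (name, type) pairs for information the words imply,
             e.g. (("b", "object"),) for a destination
    """
    core = f"{label}({','.join(args)})" if args else label
    if not implied:
        return core
    extra = ",".join(f"{name}:{kind}" for name, kind in implied)
    return f"({core})({extra})"


# The dictionary lists descriptors in alphabetical order.
descriptors = sorted([
    make_descriptor("reach-destination", ("o",), (("b", "object"),)),
    make_descriptor("change-location", ("a", "o")),
])
```

Sorting plain strings is enough here because the alphabetical order in the text applies to the descriptors themselves.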

In the event descriptor dictionary 17, as shown in FIG. 4, an event descriptor 17B and a physical quantity expression 17C are associated with each predicate 17A that denotes an event concerning a target person or object. The physical quantity expression 17C consists of pairs of a time constraint part 17E and a physical formula part 17D. The time constraint part 17E holds time constraint formulas indicating the sections into which the event is divided in time. The physical formula part 17D holds physical formulas (P1, P2, P3) that express the event in each section in terms of arbitrary physical quantities such as three-dimensional coordinates, velocity, temperature, humidity, and acceleration.

For example, in FIG. 4(A), each of the predicates 17A "move, travel, locomote" is associated with the event descriptor 17B "change-location(a,o)" and the physical quantity expression 17C
P1: do not care (t0 ≤ t < t1),
P2: |v| > 0 (t1 ≤ t ≤ t2),
P3: do not care (t2 < t ≤ t3).
In the physical quantity expression 17C, t0, t1, t2, and t3 in the time constraint part 17E are real-valued parameters representing times; t1 corresponds to the time at which the event starts and t2 to the time at which it ends.

Accordingly, in the physical formula part 17D, formula P1 describes the state of the object in the section immediately before the event, formula P2 describes its state in the section during the event, and formula P3 describes its state in the section immediately after the event.
In FIG. 4(A), the physical formula part 17D specifies, as the condition for the section during the event (from time t1 to time t2), that the magnitude |v| of the velocity vector v of the object o is greater than 0, that is, that the object o is moving.
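A dictionary entry of this shape can be modeled as a small data structure. The sketch below is a hypothetical representation, not the patent's storage format: each of the three sections carries either a condition function or `None` for "do not care", and samples are assumed to be dicts with a precomputed `"speed"` key standing for |v|.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# None stands for "do not care"; otherwise a test applied to one sample.
Condition = Optional[Callable[[dict], bool]]


@dataclass(frozen=True)
class DictionaryEntry:
    predicates: Tuple[str, ...]  # predicate 17A
    descriptor: str              # event descriptor 17B
    before: Condition            # P1, t0 <= t < t1
    during: Condition            # P2, t1 <= t <= t2
    after: Condition             # P3, t2 < t <= t3

    def holds(self, before, during, after):
        """Check the samples of each section against that section's formula."""
        phases = ((self.before, before), (self.during, during), (self.after, after))
        return all(
            cond is None or all(cond(sample) for sample in samples)
            for cond, samples in phases
        )


# The FIG. 4(A) entry: the object must be moving during the event.
CHANGE_LOCATION = DictionaryEntry(
    predicates=("move", "travel", "locomote"),
    descriptor="change-location(a,o)",
    before=None,
    during=lambda s: s["speed"] > 0,
    after=None,
)
```

Keeping "do not care" as `None` means sections without a constraint never need sample data to be evaluated.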

In FIG. 4(B), the predicate 17A "move" is associated with the event descriptor 17B "(go-from-region-to-region(o))(r1:region,r2:region)" and the physical quantity expression 17C
D(x,r1) = 0 (t0 ≤ t < t1),
|v| > 0 & D(x,r2) > 0 (t1 ≤ t ≤ t2),
D(x,r2) = 0 (t2 < t ≤ t3).
In the physical quantity expression 17C, x is the three-dimensional coordinate of the object o, D(r1,r2) denotes the Euclidean distance between r1 and r2, v denotes the velocity vector of the object o, and & denotes logical conjunction. This expresses that the object o is in region r1 in the section before time t1, starts moving at time t1, reaches region r2 at time t2, and is in region r2 in the section after time t2.
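The three conditions of this entry can be checked directly once a concrete form is chosen for regions. In the sketch below, regions are assumed to be spheres given as `(center, radius)` and D(x, r) is taken to be 0 anywhere inside the sphere; both choices, and the function names, are assumptions for illustration because the patent leaves the shape of a region open.

```python
import math


def distance_to_region(x, region):
    """D(x, r) for a hypothetical spherical region (center, radius); 0 inside."""
    center, radius = region
    return max(0.0, math.dist(x, center) - radius)


def went_from_region_to_region(before, during, after, r1, r2):
    """Check the FIG. 4(B) expression on three lists of (position, velocity) samples."""
    # Before t1: the object is inside r1.
    in_r1 = all(distance_to_region(x, r1) == 0 for x, _ in before)
    # From t1 to t2: the object is moving and has not yet reached r2.
    moving_outside_r2 = all(
        math.hypot(*v) > 0 and distance_to_region(x, r2) > 0 for x, v in during
    )
    # After t2: the object is inside r2.
    in_r2 = all(distance_to_region(x, r2) == 0 for x, _ in after)
    return in_r1 and moving_outside_r2 and in_r2
```

With real, noisy position data the exact comparisons with 0 would normally be replaced by small tolerances.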

In the example of FIG. 4(C), the predicate 17A "drop" is associated with the event descriptor 17B "fall-vertically(o)" and with the physical quantity expression 17C
do not care (t0 ≤ t < t1),
(v/|v|)·(g/|g|) = 1 & a = g (t1 ≤ t ≤ t2),
do not care (t2 < t ≤ t3).
In the physical quantity expression 17C, g is the gravity constant vector, "·" denotes the inner product, and a is the acceleration of the object o. The expression states that the object o is moving vertically downward with acceleration g during the section from time t1 to time t2. In this way, a physical quantity expression 17C is formulated for each event descriptor 17B that represents a predicate 17A, and is stored in the event descriptor dictionary 17. FIG. 5 is an explanatory diagram showing a specific example of an event descriptor and its physical quantity expression.

The metadata adding unit 16 refers to the event descriptor dictionary 17 described above. If the dictionary contains an event whose physical quantity expression matches the sensor data within the event section detected by the event section detection unit 14, the metadata adding unit 16 determines that this event occurred within the event section (step S104 in FIG. 3).

For example, suppose that, as shown in FIG. 5, the predicate "drop" is registered in the event descriptor dictionary 17 in association with the event descriptor "fall-vertically(o)" and the physical quantity expression
do not care (t0 ≤ t < t1); 0 < t1-t0,
(v/|v|)·(g/|g|) = 1 & a = g (t1 ≤ t ≤ t2); 200 msec ≤ t2-t1 < 800 msec,
do not care (t2 < t ≤ t3); 0 < t3-t2.
If the sensor data within the event section, as corrected by the data correction unit 15, includes sensor data indicating that the object moved vertically downward with acceleration g for an elapsed time of at least 200 msec and less than 800 msec, that data matches the physical quantity expression shown in FIG. 5. The metadata adding unit 16 therefore determines that the event expressed by the event descriptor "fall-vertically(o)", that is, the event expressed by the word "fall", occurred within the event section.
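The check against the FIG. 5 expression might be sketched as follows. This is not the patent's implementation: the gravity vector `G` (z axis pointing up, 9.8 m/s^2), the relative tolerance `tol`, and the function name `matches_fall_vertically` are all assumptions, introduced because exact equalities such as (v/|v|)·(g/|g|) = 1 cannot be expected to hold for measured data.

```python
import math

# Assumed gravity vector in m/s^2, with the z axis pointing up.
G = (0.0, 0.0, -9.8)


def norm(u):
    return math.sqrt(sum(c * c for c in u))


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


def matches_fall_vertically(samples, t1, t2, tol=0.05):
    """samples: list of (t_sec, v, a) with v and a given as 3-tuples.

    Checks the duration constraint 200 msec <= t2 - t1 < 800 msec and,
    for every sample with t1 <= t <= t2, that v points along g and that
    a equals g, both within the relative tolerance tol.
    """
    if not (0.2 <= t2 - t1 < 0.8):
        return False
    inside = [(v, a) for t, v, a in samples if t1 <= t <= t2]
    if not inside:
        return False
    for v, a in inside:
        # The direction of v is undefined at zero speed (for example at
        # the very start of the fall), so only test it when |v| > 0.
        if norm(v) > 0.0:
            cos_angle = dot(v, G) / (norm(v) * norm(G))
            if cos_angle < 1.0 - tol:
                return False
        # a = g, tested as |a - g| <= tol * |g|.
        diff = tuple(ai - gi for ai, gi in zip(a, G))
        if norm(diff) > tol * norm(G):
            return False
    return True
```

The "do not care" intervals before t1 and after t2 impose no condition, so samples outside [t1, t2] are simply ignored.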

To determine whether the condition "(v/|v|)·(g/|g|) = 1 & a = g" holds, the velocity vector v is needed. The velocity vector v at each time can be obtained by setting the initial velocity to 0 (zero) and numerically integrating the acceleration.
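A minimal sketch of this integration is shown below. The text only says that the acceleration is integrated numerically from an initial velocity of zero; the trapezoidal rule and the function name `integrate_velocity` are assumptions of this example.

```python
def integrate_velocity(times, accels):
    """Integrate acceleration samples into velocity vectors.

    times:  sample times in seconds, in increasing order
    accels: one (ax, ay, az) tuple per sample time
    The initial velocity is taken to be zero, as in the text; the
    trapezoidal rule used here is an assumption of this sketch.
    """
    if not times:
        return []
    v = (0.0, 0.0, 0.0)
    velocities = [v]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Average the accelerations at both ends of the step.
        v = tuple(vc + 0.5 * (a0 + a1) * dt
                  for vc, a0, a1 in zip(v, accels[i - 1], accels[i]))
        velocities.append(v)
    return velocities
```

Because integration accumulates sensor error over time, a sketch like this is only reasonable over short event sections such as the sub-second fall in FIG. 5.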

Next, when the metadata adding unit 16 has been able to determine the event that occurred within the event section, it takes the word representing this event (for example, the event descriptor "fall-vertically") as metadata, adds this metadata to the accumulated signal in the time section corresponding to the event section, and stores the accumulated signal with the added metadata in the metadata-attached accumulated signal storage unit 18 (step S105 in FIG. 3).
This completes the processing of the metadata adding device 1.

[Video search device]
Next, a video search device that searches video using the metadata-attached accumulated signal generated by the metadata adding device 1 will be described. FIG. 6 is a block diagram showing the configuration of the video search device according to the embodiment of the present invention. The video search device 5 includes a metadata-attached accumulated signal storage unit 50, a video acquisition unit 51, a sensor data acquisition unit 52, a search key input unit 53, a key type determination unit 54, a metadata extraction unit 55, an event descriptor dictionary 56, an accumulated signal search unit 57, and an output unit 58.

FIG. 7 is a flowchart showing the operation of the video search device 5. The metadata-attached accumulated signal storage unit 50 stores the metadata-attached accumulated signal generated by the metadata adding device 1.
The user of the video search device 5 inputs a search key to the search key input unit 53 (step S200 in FIG. 7). Two kinds of search key are possible: (1) the motion of a person or object to be searched for, expressed in words, and (2) the motion of a person or object to be searched for, expressed as a pair of video and sensor data.

When a keyword that expresses the motion in words is used as the search key, searching with a keyword such as "drop" or "fall-vertically" retrieves the sections of the accumulated signal in which a person or object performs a "falling" motion.

When the search key expresses the motion to be searched for as a pair of video and sensor data, the user of the video search device 5, for example, actually performs the motion in front of the camera while wearing a sensor node, and the acquired video and sensor data serve as the search key. In this case, the video acquisition unit 51 acquires the video captured by the camera 2 while, at the same time, the sensor data acquisition unit 52 acquires the sensor data from the sensor node 3 through the sensor network 4. This makes it possible to search for sections of the accumulated signal that contain a motion similar to the one the user performed. One section of the accumulated signal stored in the metadata-attached accumulated signal storage unit 50, together with its sensor data, may also be used as the search key. In both cases (1) and (2), a feature of this embodiment is that, rather than retrieving exactly the same video, it can find video containing a motion similar to the one specified by the search key.

Next, the key type determination unit 54 determines whether the search key input from the search key input unit 53 is of type (1), a motion of a person or object expressed in words, or of type (2), a motion expressed as a pair of video and sensor data (step S201 in FIG. 7). If the input search key is a keyword that expresses the motion in words (for example, "drop"), the key type determination unit 54 converts it into the corresponding event descriptor (for example, "fall-vertically") and sends the result as metadata to the accumulated signal search unit 57 (step S202 in FIG. 7). If the search key is already an event descriptor, no conversion is needed and it may be sent to the accumulated signal search unit 57 as metadata as it is. If the input search key expresses the motion as a pair of video and sensor data, the key type determination unit 54 sends the search key to the metadata extraction unit 55 (step S203 in FIG. 7).
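The branching in steps S201 to S204 can be pictured roughly as below. This is a sketch under assumptions: the class `SensorKey`, the table `PREDICATE_TO_DESCRIPTOR`, and the function `to_input_metadata` are hypothetical names, only the "drop" to "fall-vertically" pair comes from the text, and the behavior for an unknown keyword (returning `None`) is a choice made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SensorKey:
    """Search key of type (2): a video paired with its sensor data."""
    video: object
    sensor_data: Sequence


# Hypothetical keyword table; only the "drop" entry is taken from the text.
PREDICATE_TO_DESCRIPTOR = {"drop": "fall-vertically"}
KNOWN_DESCRIPTORS = set(PREDICATE_TO_DESCRIPTOR.values())


def to_input_metadata(key, extract_metadata):
    """Turn a search key into the metadata handed to the search step.

    extract_metadata stands in for the metadata extraction unit: a
    callable that maps sensor data to an event descriptor.
    """
    if isinstance(key, SensorKey):
        # Type (2): derive the descriptor from the sensor data (S203, S204).
        return extract_metadata(key.sensor_data)
    if key in KNOWN_DESCRIPTORS:
        # Already an event descriptor: pass it through unchanged.
        return key
    # Type (1) keyword: convert to its event descriptor (S202).
    return PREDICATE_TO_DESCRIPTOR.get(key)
```

Passing the extractor in as a callable keeps the dispatch separate from the sensor-data analysis, which mirrors the split between the key type determination unit and the metadata extraction unit.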

The metadata extraction unit 55 treats the section of sensor data received as the search key from the key type determination unit 54 as an event section, performs the same processing as the data correction unit 15 and the metadata adding unit 16 of the metadata adding device 1 to determine the event corresponding to that sensor data, and sends the word representing this event as metadata to the accumulated signal search unit 57 (step S204 in FIG. 7). Specifically, the metadata extraction unit 55 applies the same noise removal and correction processing as the data correction unit 15 to the sensor data and then refers to the event descriptor dictionary 56. If an event whose physical quantity expression matches the corrected sensor data is registered in the event descriptor dictionary 56, the metadata extraction unit 55 determines that this event occurred within the event section and uses the word representing the event as metadata.

The accumulated signal search unit 57 uses the metadata received from the key type determination unit 54 or the metadata extraction unit 55 to search the accumulated signal stored in the metadata-attached accumulated signal storage unit 50 (step S205 in FIG. 7). Specifically, the accumulated signal search unit 57 retrieves from the metadata-attached accumulated signal storage unit 50 the accumulated signal of each time section whose metadata matches the input metadata, and outputs it to the output unit 58 as the search result video. The output unit 58 displays this search result video.
This completes the processing of the video search device 5.
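The matching in step S205 amounts to an exact comparison of metadata strings over the stored time sections. The sketch below assumes a simple in-memory list as the store; the record type `TaggedSection`, its fields, and the function `search_sections` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedSection:
    """One time section of stored video with the metadata added to it."""
    video_id: str
    start_sec: float
    end_sec: float
    metadata: str  # event descriptor, e.g. "fall-vertically"


def search_sections(store, input_metadata):
    """Return every stored section whose metadata equals the input metadata."""
    return [s for s in store if s.metadata == input_metadata]


store = [
    TaggedSection("cam2-0001", 12.0, 12.5, "fall-vertically"),
    TaggedSection("cam2-0001", 30.0, 34.0, "go-from-region-to-region"),
    TaggedSection("cam2-0002", 5.2, 5.8, "fall-vertically"),
]
hits = search_sections(store, "fall-vertically")
```

Here `hits` holds the two "fall-vertically" sections; the video identifiers and times are made-up sample values.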

FIG. 8 illustrates the search processing described above. Reference numeral 800 in FIG. 8 denotes video of a person dropping an object to which a sensor node is attached. Reference numeral 801 denotes the sensor data output from the sensor node 3 for this motion. The metadata adding device 1 analyzes the sensor data, determines the event that occurred in the section in which the video was shot, obtains the metadata "fall-vertically" representing this event, and adds this metadata to the video. The video with the added metadata is stored, via the metadata adding device 1, in the metadata-attached accumulated signal storage unit 50 of the video search device 5.

Next, reference numeral 802 in FIG. 8 denotes video of a user, wearing a sensor node on the hand, lowering a raised hand in order to input a search key. Reference numeral 803 denotes the sensor data output from the sensor node 3 for this motion. The video search device 5 analyzes the sensor data, determines the event that occurred in the section in which the video was shot, and obtains the metadata "fall-vertically" representing this event. The accumulated signal search unit 57 uses this metadata to search the accumulated signal stored in the metadata-attached accumulated signal storage unit 50.

As described above, this embodiment can add metadata to video automatically, and adding the metadata makes possible video search based on the motion of the subject, which conventional methods could not achieve.

Each of the metadata adding device 1 and the video search device 5 of this embodiment can be realized by a computer having a CPU, a storage device, and an interface to the outside, together with a program that controls these hardware resources. The program for realizing the present invention on such a computer is provided recorded on a recording medium such as a flexible disk, CD-ROM, DVD-ROM, or memory card. The CPU of each device writes the program read from the recording medium to the storage device and executes the processing described in this embodiment in accordance with the program.
In this embodiment the metadata adding device 1 and the video search device 5 are separate devices, but they may also be combined into a single device.

The present invention can be applied to techniques that use sensors and sensor network technology to add verbalized metadata to video, and to techniques that take as input a search key expressing the event to be searched for as a keyword or as sensor data and retrieve video containing that event.

1: metadata adding device, 2: camera, 3: sensor node, 4: sensor network, 5: video search device, 10: video acquisition unit, 11: accumulated signal storage unit, 12: sensor data acquisition unit, 13: accumulated sensor data storage unit, 14: event section detection unit, 15: data correction unit, 16: metadata adding unit, 17: event descriptor dictionary, 18: metadata-attached accumulated signal storage unit, 50: metadata-attached accumulated signal storage unit, 51: video acquisition unit, 52: sensor data acquisition unit, 53: search key input unit, 54: key type determination unit, 55: metadata extraction unit, 56: event descriptor dictionary, 57: accumulated signal search unit, 58: output unit.

Claims

A metadata adding device comprising:
video acquisition means for acquiring video of a subject photographed by a camera;
sensor data acquisition means for acquiring sensor data measured, at the time of shooting, by a sensor attached to the subject;
an event descriptor dictionary that stores in advance a correspondence between an event descriptor representing an event related to a subject and a physical quantity expression of the event, expressed as a formula using sensor data; and
metadata adding means for, when an event having a physical quantity expression corresponding to the sensor data acquired by the sensor data acquisition means is registered in the event descriptor dictionary, using the event descriptor representing this event as metadata and adding the metadata to the video acquired by the video acquisition means.

A video search device comprising:
storage means for storing in advance video to which metadata representing an event that occurred to a subject has been added;
key type determination means for receiving, as input, a search key that expresses an event to be searched for as a keyword or as sensor data, and determining the type of the search key;
an event descriptor dictionary that stores in advance a correspondence between an event descriptor representing an event related to a subject and a physical quantity expression of the event, expressed as a formula using sensor data;
metadata extraction means for, when the search key expresses the event as sensor data and an event having a physical quantity expression corresponding to that sensor data is registered in the event descriptor dictionary, extracting the event descriptor representing this event as metadata; and
search means for searching the video stored in the storage means for video to which metadata matching input metadata has been added, wherein the input metadata is the event descriptor corresponding to the search key when the search key expresses the event as a keyword and is a keyword other than an event descriptor registered in the event descriptor dictionary, is the search key itself when the search key is an event descriptor registered in the event descriptor dictionary, and is the metadata extracted by the metadata extraction means when the search key expresses the event as sensor data.

A metadata adding method for adding metadata to video on a computer including a CPU and a storage device, wherein the CPU executes, in accordance with a program stored in the storage device:
a video acquisition step of acquiring video of a subject photographed by a camera;
a sensor data acquisition step of acquiring sensor data measured, at the time of shooting, by a sensor attached to the subject; and
a metadata adding step of, when an event having a physical quantity expression corresponding to the sensor data acquired in the sensor data acquisition step is registered in an event descriptor dictionary that stores in advance a correspondence between an event descriptor representing an event related to a subject and a physical quantity expression of the event, expressed as a formula using sensor data, using the event descriptor representing this event as metadata and adding the metadata to the video acquired in the video acquisition step.

A video search method for searching video on a computer including a CPU and a storage device, wherein the CPU executes, in accordance with a program stored in the storage device:
a key type determination step of receiving, as input, a search key that expresses an event to be searched for as a keyword or as sensor data, and determining the type of the search key;
a metadata extraction step of, when the search key expresses the event as sensor data and an event having a physical quantity expression corresponding to the sensor data of the search key is registered in an event descriptor dictionary that stores in advance a correspondence between an event descriptor representing an event related to a subject and a physical quantity expression of the event, expressed as a formula using sensor data, extracting the event descriptor representing this event as metadata; and
a search step of searching metadata-attached video stored in advance in storage means for video to which metadata matching input metadata has been added, wherein the input metadata is the event descriptor corresponding to the search key when the search key expresses the event as a keyword and is a keyword other than an event descriptor registered in the event descriptor dictionary, is the search key itself when the search key is an event descriptor registered in the event descriptor dictionary, and is the metadata extracted in the metadata extraction step when the search key expresses the event as sensor data.

A program for causing a computer to execute each step according to claim 3 or 4.