JP2010198111A

JP2010198111A - Metadata extraction server, metadata extraction method and program

Info

Publication number: JP2010198111A
Application number: JP2009039529A
Authority: JP
Inventors: Satoru Kondo; 悟近藤; Takeshi Ogawa; 猛志小川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-02-23
Filing date: 2009-02-23
Publication date: 2010-09-09
Anticipated expiration: 2029-02-23
Also published as: JP5072880B2

Abstract

PROBLEM TO BE SOLVED: To automatically generate and update an appropriate recognizer from information on the Web.
SOLUTION: A teacher information collection unit 234 collects content from a Web server 240. A feature amount calculation unit 204 calculates a feature amount matrix representing the contents of the collected content. When storing the feature amount matrices at predetermined time intervals, a feature amount DB 201 clusters them into groups and determines a subspace for each group. A metadata organizing unit 203 performs morphological analysis on the collected content and generates a metadata matrix representing the occurrence frequency of the morphologically analyzed words. A correlation coefficient calculation unit 214 calculates, in each subspace, a correlation coefficient between the group of feature amount matrices and a group of metadata matrices. A correlation coefficient DB 205 stores the correlation coefficient calculated for each subspace.
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a metadata extraction server, a metadata extraction method, and a program that mainly recognize the contents of media content composed of video, music, and the like, and extract, from media data containing diverse information, metadata for associating pieces of media data with each other.

As a technique for attaching information to media data on the Web, the following is known: for the various annotations that multiple users have attached to videos via a video browsing server on the Web, approval or disapproval judgments are collected statistically from those users, and appropriate annotations are selected and displayed by also taking syntactic consistency and video feature amounts into account (see, for example, Non-Patent Document 1).

As a technique for extracting information from media data on a network, a technique for processing multiple streams is also known: special bridges on the network are operated and the desired processes are combined into a graph, so that any stream can be rerouted through processing servers scattered over the network and various processes can be applied to it (see, for example, Non-Patent Document 2).

Non-Patent Document 1: Daisuke Yamamoto and Katashi Nagao, "Annotation of Online Video Content by Viewers and Its Applications", Transactions of the Japanese Society for Artificial Intelligence, 2005.
Non-Patent Document 2: Satoshi Kondoh, Takaaki Moriya, Hiroyuki Ohnishi, and Miki Hirano, "A method of bridging and processing media stream on network", IEEE Globecom 2008, CS01M1-5.

However, the statistical annotation method of Non-Patent Document 1 targets only the browsing system itself, and its main purpose is to select highly reliable annotations; its video recognition part is not established. As it stands, therefore, it cannot provide a recognition function to any other system.

In the technique of Non-Patent Document 2, the actual media recognition servers are each specialized for particular uses. When media streams of unknown purpose are simply forwarded, recognizers must be assigned according to the contents of the media data, but because the contents (context) cannot be grasped in advance, an appropriate recognizer cannot be selected.

The present invention has been made in view of these circumstances. Its object is to provide a metadata extraction server, a metadata extraction method, and a program that can automatically generate and update an appropriate recognizer from information existing on the Web, and that can select an appropriate recognizer from the media stream itself.

To solve the above problem, the present invention is a metadata extraction server that extracts, based on media data, metadata for recognizing the contents (context) of the media data, the server comprising: a teacher information collection unit that collects content from sites on a network; a feature amount calculation unit that calculates a feature amount matrix representing the contents of the content collected by the teacher information collection unit; a feature amount storage unit that, when storing the feature amount matrices calculated by the feature amount calculation unit at fixed time intervals, clusters them into sets of feature amount matrices and determines a subspace for each set; a metadata organizing unit that performs morphological analysis on the content collected by the teacher information collection unit and generates a metadata matrix representing the occurrence frequency of the morphologically analyzed words; a correlation coefficient calculation unit that calculates, for each subspace determined by the feature amount storage unit, a correlation coefficient between the set of feature amount matrices and the set of metadata matrices generated by the metadata organizing unit; and a correlation coefficient storage unit that stores, for each subspace, the correlation coefficient calculated by the correlation coefficient calculation unit.
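The text does not say which clustering method or which form of "correlation coefficient" is used. The sketch below is one possible reading, offered only for illustration: it assumes plain k-means as the clustering step (each cluster standing in for a subspace ID) and, inside each cluster, a matrix `W[d][w]` of Pearson correlations between feature dimension `d` and the frequency of word `w`. The function names `kmeans`, `pearson`, and `train` are hypothetical, and vectors are used in place of the claimed matrices.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain k-means (an assumed stand-in): returns centroids and labels."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def train(features: Sequence[Vector], metadata: Sequence[Vector], k: int):
    """Cluster feature vectors into k subspaces and build, per subspace ID,
    W[d][w] = corr(feature dimension d, frequency of word w)."""
    centroids, labels = kmeans(features, k)
    n_dims, n_words = len(features[0]), len(metadata[0])
    W: Dict[int, List[Vector]] = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len(idx) < 2:  # correlation is undefined for fewer than two samples
            W[c] = [[0.0] * n_words for _ in range(n_dims)]
            continue
        W[c] = [[pearson([features[i][d] for i in idx],
                         [metadata[i][w] for i in idx])
                 for w in range(n_words)]
                for d in range(n_dims)]
    return centroids, labels, W
```

Each row of `metadata` is assumed to be the word-frequency vector for the same time slot as the matching row of `features`; keeping one `W` per cluster is what lets different kinds of content map to different vocabularies.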

The present invention is also the above metadata extraction server further comprising a media stream acquisition unit that acquires a media stream transmitted and received over the network. The feature amount calculation unit calculates a feature amount matrix representing the contents of the media stream acquired by the media stream acquisition unit. When the feature amount storage unit stores at fixed time intervals these media stream feature amount matrices in addition to the feature amount matrices of the content collected by the teacher information collection unit, it clusters the media stream feature amount matrices into sets and determines a subspace for each set.

The present invention is also the above metadata extraction server further comprising: a correlation applying unit that, based on the subspaces determined by the feature amount storage unit, identifies the subspace containing the feature amount matrix representing the contents of the media stream, applies the correlation coefficient calculated by the correlation coefficient calculation unit for the identified subspace to calculate a metadata vector for the feature amount vector, and selects from the metadata vector a set of words judged appropriate as one piece of metadata; a transmission setting unit that determines whether the metadata selected by the correlation applying unit is to be transmitted by itself or embedded in the media stream; a metadata sending unit that, when the transmission setting unit determines that the metadata itself is to be transmitted, sends the metadata to a metadata receiving terminal at a designated destination; a metadata embedding unit that, when the transmission setting unit determines that the metadata is to be embedded in the media stream, embeds the metadata in the media stream; and a media stream sending unit that sends the media stream in which the metadata embedding unit has embedded the metadata to a media stream receiving terminal at a designated destination.

The present invention is also the above metadata extraction server, wherein the transmission setting unit selectively determines, as the designated destination, either the source of the media stream or a destination other than that source.

The present invention is also a metadata extraction method in a metadata extraction server that extracts, based on media data, metadata for recognizing the contents (context) of the media data, the method comprising: a teacher information collection step in which a teacher information collection unit collects content from sites on a network; a feature amount calculation step in which a feature amount calculation unit calculates a feature amount matrix representing the contents of the content collected in the teacher information collection step; a feature amount storage step in which a feature amount storage unit, when storing the feature amount matrices calculated in the feature amount calculation step at fixed time intervals, clusters them into sets and determines a subspace for each set; a metadata organizing step in which a metadata organizing unit performs morphological analysis on the content collected in the teacher information collection step and generates a metadata matrix representing the occurrence frequency of the morphologically analyzed words; a correlation coefficient calculation step in which a correlation coefficient calculation unit calculates, for each subspace determined in the feature amount storage step, a correlation coefficient between the set of feature amount matrices and the set of metadata matrices generated in the metadata organizing step; and a correlation coefficient storage step in which a correlation coefficient storage unit stores, for each subspace, the correlation coefficient calculated in the correlation coefficient calculation step.

The present invention is also a program that causes a computer operating as a metadata extraction server, which extracts, based on media data, metadata for recognizing the contents (context) of the media data, to operate as: teacher information collection means for collecting content from sites on a network; feature amount calculation means for calculating a feature amount matrix representing the contents of the collected content; feature amount storage means for clustering the calculated feature amount matrices into sets when storing them at fixed time intervals and determining a subspace for each set; metadata organizing means for performing morphological analysis on the collected content and generating a metadata matrix representing the occurrence frequency of the morphologically analyzed words; correlation coefficient calculation means for calculating, for each subspace, a correlation coefficient between the set of feature amount matrices and the set of metadata matrices; and correlation coefficient storage means for storing, for each subspace, the calculated correlation coefficient.

According to the present invention, the metadata extraction server collects content from sites on the network, calculates feature amount matrices for that content, clusters the matrices into sets (one subspace per set) when storing them at fixed time intervals, generates metadata matrices of morphologically analyzed word frequencies from the same content, and calculates and stores, for each subspace, the correlation coefficient between the set of feature amount matrices and the set of metadata matrices. With this configuration, an appropriate recognizer can be automatically generated and updated from information existing on the Web.

According to the present invention, the server further comprises the media stream acquisition unit that acquires media streams transmitted and received over the network. The feature amount calculation unit calculates feature amount matrices for the acquired media stream, and the feature amount storage unit, when storing these at fixed time intervals alongside the matrices of the collected content, also clusters them into sets and determines a subspace for each set. With this configuration, an appropriate recognizer can be selected from the media stream itself.

FIG. 1 is a block diagram showing the configuration of the entire Web-linked metadata extraction system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the metadata extraction server in this embodiment.
FIG. 3 is a sequence diagram showing the procedure by which the metadata extraction server of this embodiment extracts metadata from a media stream.
FIG. 4 is a sequence diagram showing the procedure for obtaining the metadata converter and the correlation coefficient in this embodiment.
FIG. 5 is a conceptual diagram showing the configuration and operation when the metadata extraction server of the present invention is applied to a search service.
FIG. 6 is a conceptual diagram showing the configuration and operation when the metadata extraction server of the present invention is combined with a bridge to form a metadata extraction bridge.
FIG. 7 is a conceptual diagram showing the configuration and operation when the metadata extraction server of the present invention is used for a network-based stream backup service.

(First Embodiment)
An embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the entire Web-linked metadata extraction system according to this embodiment. In the figure, a Web-linked metadata extraction server 101 (hereinafter, the metadata extraction server) is connected to user terminals 102-1 and 102-2 via a communication network 100. The metadata extraction server 101 receives a media stream 103 transmitted from the user terminal 102-1 used by a user, extracts information describing its contents, generates a separate data stream 104, and sends it to the user terminal 102-2, a destination designated separately in advance.

Next, the configuration of the metadata extraction server 101 will be described with reference to FIG. 2, a block diagram of the metadata extraction server 101 in this embodiment. The metadata extraction server 101 comprises a media stream acquisition unit 208, a feature amount calculation unit 204, a feature amount DB 201, a metadata DB 202, a metadata organizing unit 203, a correlation coefficient DB 205, a correlation coefficient calculation unit 214, a correlation coefficient application unit 207, a transmission setting unit 230, a metadata sending unit 231, a metadata embedding unit 232, a media stream sending unit 233, and a teacher information collection unit 234.

The media stream acquisition unit 208 acquires a media stream transmitted from a media stream transmission terminal 210 on the network 100 and inputs it to the feature amount calculation unit 204 and the metadata embedding unit 232. The feature amount calculation unit 204 calculates a feature amount vector from the input media stream and inputs it to the correlation coefficient application unit 207. The feature amount DB 201 accumulates the calculated feature amount vectors at fixed time intervals.

The teacher information collection unit 234 accesses Web sites 240 (a video posting site 240-1 and a blog site 240-2), acquires content, and inputs it as media data to the feature amount calculation unit 204 and to the metadata organizing unit 203. The metadata organizing unit 203 performs morphological analysis on the input media data. The correlation coefficient calculation unit 214 creates a metadata vector representing the occurrence frequency of the words analyzed by the metadata organizing unit 203 within the time width Δt that is also used for averaging the feature amounts. The metadata DB 202 stores the metadata vectors created by the correlation coefficient calculation unit 214.

The correlation coefficient calculation unit 214 also averages the feature amount vectors stored in the feature amount DB 201 over the time width Δt to obtain averaged feature amount vectors, detects scene breaks from changes in the feature amounts, calculates feature amount vectors for the frames of the media data, and inputs the resulting feature amount vectors to the correlation coefficient DB 205. The correlation coefficient DB 205 stores, for each subspace ID, the corresponding metadata converter R and correlation coefficient W.

The correlation coefficient application unit 207 reads, from the correlation coefficient DB 205, the metadata converter R and the correlation coefficient W corresponding to each subspace ID, and identifies which subspace the feature amount vector belongs to. It then applies the correlation coefficient W of that subspace to the feature amount vector to calculate a metadata vector. Finally, it selects from the metadata vector a set of words judged appropriate and inputs them, as one piece of metadata, to the transmission setting unit 230.

When the metadata is to be embedded in the media stream for output, the transmission setting unit 230 inputs the metadata to the metadata embedding unit 232. The metadata embedding unit 232 embeds the metadata received from the transmission setting unit 230 into the media stream received from the media stream acquisition unit 208 and inputs the resulting stream to the media stream sending unit 233, which sends it to the media stream receiving terminal 250 at the designated destination.

When the metadata is to be output alone, the transmission setting unit 230 inputs the metadata to the metadata sending unit 231, which adds a packet header to it and sends it to the metadata receiving terminal 251 at the designated destination.

In this way, the transmission setting unit 230 can set whether the metadata is embedded in the media stream or output alone, and can also designate the destination: for example, whether the metadata-embedded media stream or the metadata is sent back to the user terminal that sent the media stream, or to a different user terminal.

The metadata is, for example, data in XML (eXtensible Markup Language) format as defined in MPEG-7 or the like.

The media stream receiving terminal 250 is a terminal capable of receiving a media stream with embedded metadata. The metadata receiving terminal 251 is a terminal capable of receiving metadata.

The Web site 240 is, for example, a video posting site 240-1 or a blog site 240-2. The video posting site 240-1 provides video content, and the blog site 240-2 provides text content.

Next, the operation of the metadata extraction server of this embodiment will be described with reference to FIG. 3, a sequence diagram showing the procedure by which the metadata extraction server 101 extracts metadata from a media stream.

The media stream acquisition unit 208 first configures the transmission settings in the transmission setting unit 230 (step S1), then receives the media stream transmitted from the media stream transmission terminal 210 and inputs it to the feature amount calculation unit 204 (step S2). The feature amount calculation unit 204 passes the media stream on to the metadata embedding unit 232 (step S3), calculates a feature amount vector from it, and inputs that vector to the correlation coefficient application unit 207 (step S4).

Here, a media stream refers to data communicated over a media protocol such as RTP (Real-time Transport Protocol), RTSP (Real Time Streaming Protocol), or RTMP (Real Time Messaging Protocol). The type of feature amount vector to be calculated is assumed to be determined in advance within the metadata extraction server 101. Meanwhile, the feature amount DB 201 accumulates the calculated feature amount vectors at fixed time intervals (step S5).

The correlation coefficient application unit 207 acquires the metadata converter R and the correlation coefficient W corresponding to each subspace ID from the correlation coefficient DB 205 (steps S6 and S7), and identifies which subspace the feature amount vector calculated by the feature amount calculation unit 204 belongs to. The metadata converter R and the correlation coefficient W stored in the correlation coefficient DB 205 are described later.

After identifying the subspace, the correlation coefficient application unit 207 applies the correlation coefficient W of that subspace to the feature amount vector to calculate a metadata vector. It then selects from the metadata vector a set of words judged appropriate, generates one piece of metadata, and inputs it to the transmission setting unit 230 (step S8). As noted above, this metadata is XML data as defined in MPEG-7 or the like.

The transmission setting unit 230 creates a packet header and determines whether to transmit the metadata itself or embed it in the original media stream. As described above, it selects both the output form and the destination form from two options each. For the output form, it selects either embedding the metadata in the media stream or outputting the metadata alone. For the destination form, it selects either sending the metadata-embedded media stream or the metadata back to the user terminal that sent the media stream, or sending it to a user terminal other than that one.

When the transmission setting unit 230 selects the embedded output form, it passes the metadata to the media stream embedding unit 232 (step S9). The media stream embedding unit 232 takes the media stream received from the media stream acquisition unit 208 in step S3 and embeds in it the metadata received from the transmission setting unit 230 in step S9. It then passes the result to the media stream sending unit 233 (step S10).

The media stream sending unit 233 then sends the metadata-embedded media stream to the destination selected by the transmission setting unit 230, namely the media stream receiving terminal 250 (step S11). To embed the metadata, the media stream embedding unit 232 assigns an RTP dynamic payload type number that the media stream is not already using, and inserts the metadata into payloads carrying that payload type.
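The following is a minimal sketch of the payload-type step. It assumes the unit picks the lowest free number in the RTP dynamic range 96 to 127 (RFC 3551) and builds the fixed 12-byte RTP header of RFC 3550. The text does not say which sequence number, timestamp, or SSRC the metadata packets use, so these are plain parameters here. The helper names are hypothetical.

```python
import struct

DYNAMIC_PT_RANGE = range(96, 128)  # RTP dynamic payload types (RFC 3551)

def pick_metadata_payload_type(used):
    """Return the lowest dynamic payload type not already used by the stream."""
    for pt in DYNAMIC_PT_RANGE:
        if pt not in used:
            return pt
    raise ValueError("no free dynamic payload type")

def rtp_packet(pt, seq, timestamp, ssrc, payload: bytes, marker=False):
    """Prefix payload with a fixed 12-byte RTP header: V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6                                # version 2, no padding/extension/CSRC
    byte1 = (int(marker) << 7) | (pt & 0x7F)      # marker bit + 7-bit payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload
```

For example, if a stream already uses 96 and 97 for video and audio, the metadata would go out under payload type 98.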

When the transmission setting unit 230 instead selects the standalone output form, it passes the metadata to the metadata sending unit 231 (step S12). The metadata sending unit 231 attaches a packet header to the metadata and sends it to the destination set by the transmission setting unit 230, namely the metadata receiving terminal 251 (step S13).
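The two-by-two choice made by the transmission setting unit 230 can be sketched as a small routing function. The enum names, the unit labels, and the string addresses are hypothetical. The sketch only shows which sending unit handles the output and where it goes. It does not cover how the choice itself is made, which the text does not specify.

```python
from dataclasses import dataclass
from enum import Enum

class OutputForm(Enum):
    EMBEDDED = "embedded"      # metadata embedded in the media stream (steps S9-S11)
    STANDALONE = "standalone"  # metadata sent on its own (steps S12-S13)

class Destination(Enum):
    SENDER = "sender"  # the terminal that sent the media stream
    OTHER = "other"    # a different terminal

@dataclass
class Route:
    unit: str     # which sending unit handles the output
    address: str  # where the output is sent

def route(form, dest, sender_addr, other_addr):
    """Map the selected output form and destination to a sending unit and address."""
    address = sender_addr if dest is Destination.SENDER else other_addr
    unit = "media_stream_sender_233" if form is OutputForm.EMBEDDED else "metadata_sender_231"
    return Route(unit, address)
```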

Next, the method of obtaining the metadata converter R and the correlation coefficient W stored in the correlation coefficient DB 205 will be described with reference to FIG. 4. FIG. 4 is a sequence diagram illustrating the procedure for obtaining the metadata converter R and the correlation coefficient W.

The teacher information collection unit 234 accesses the Web site 240 and acquires arbitrary content (step S21). It passes the acquired content to the feature amount calculation unit 204 as media data (step S22). The feature amount calculation unit 204 calculates feature vectors from the input media data and passes them to the feature amount DB 201 (step S23). Each feature vector is calculated using Equation (1) by averaging the features of the input media data over a fixed time width Δt. Here, k is the quantization number and m is the number of feature dimensions.
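Equation (1) is not reproduced in this section. The following is a minimal sketch that assumes it is a plain arithmetic mean of the per-frame m-dimensional features falling into each window of width Δt, with k indexing the windows. The frame rate parameter and the function name are hypothetical.

```python
def average_features(frames, fps, dt):
    """Average per-frame m-dimensional features over consecutive windows of dt seconds.

    frames: list of per-frame feature lists, all of length m.
    Returns one averaged vector per window k = 0, 1, ... (the last may be partial).
    """
    per_window = max(1, int(round(fps * dt)))  # frames per Δt window
    out = []
    for start in range(0, len(frames), per_window):
        chunk = frames[start:start + per_window]
        m = len(chunk[0])
        out.append([sum(f[j] for f in chunk) / len(chunk) for j in range(m)])
    return out
```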

The feature amount calculation unit 204 also detects scene breaks in the media data from changes in its features. It calculates a feature vector for each frame of the media data and passes the calculated feature vectors to the feature amount DB 201.
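The text does not say how a "change in the feature amount" is measured. The following is a minimal sketch that assumes a scene break wherever the Euclidean distance between consecutive frame feature vectors exceeds a fixed threshold. The threshold and the function name are hypothetical.

```python
import math

def scene_breaks(frames, threshold):
    """Return indices i where a new scene is assumed to start, i.e. where the
    Euclidean jump from frame i-1 to frame i exceeds threshold."""
    breaks = []
    for i in range(1, len(frames)):
        if math.dist(frames[i - 1], frames[i]) > threshold:
            breaks.append(i)
    return breaks
```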

If an input feature vector has features that can be classified into one of the existing clusters, the feature amount DB 201 stores it. Otherwise, the feature amount DB 201 discards it. The feature amount DB 201 then notifies the teacher information collection unit 234 whether the input feature vector was stored or discarded (step S24).
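"Classifiable into an existing cluster" is not defined further. The following is a minimal sketch that assumes a vector is admitted when it lies within a fixed radius of some existing cluster centroid. It also assumes, for bootstrapping, that everything is admitted while no clusters exist yet. The radius and the function name are hypothetical. The boolean result stands in for the stored/discarded notification of step S24.

```python
import math

def admit(v, centroids, radius):
    """True (store) if v is within radius of an existing cluster centroid,
    False (discard) otherwise; with no clusters yet, admit everything (assumption)."""
    if not centroids:
        return True
    return min(math.dist(v, c) for c in centroids) <= radius
```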

At regular time intervals, the feature amount DB 201 clusters the feature vectors it has stored. The clustering uses an unsupervised classification method such as X-means. The feature amount DB 201 organizes the resulting subspaces in a tree structure such as a k-d tree and assigns a subspace ID to each subspace. For each subspace, it takes the set of feature vectors $[v_0, \ldots, v_{n-1}]$ and reduces its dimensionality by principal component analysis or a similar method, producing $[v'_0, \ldots, v'_{n-1}]$.
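X-means chooses the number of clusters automatically, which takes more code than fits here. The following sketch substitutes plain k-means with a fixed k and treats each cluster index as a subspace ID. It does not build a k-d tree. Initialization from the first k points is an arbitrary choice made for determinism. The function names are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means (a stand-in for X-means). Returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def group_by_subspace(points, labels):
    """Collect the vectors of each subspace ID into one list per subspace."""
    groups = {}
    for p, sid in zip(points, labels):
        groups.setdefault(sid, []).append(p)
    return groups
```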

Based on this notification, the teacher information collection unit 234 passes the text data of the media data to the metadata organizing unit 203, but only when the feature amount DB 201 has stored the feature vector (step S25). The metadata organizing unit 203 performs morphological analysis on the text data. From the resulting words, it calculates a metadata vector $y_i$ according to Equation (2). This vector represents how often each word occurs within the time width Δt used for averaging the feature vectors. The metadata organizing unit 203 passes the calculated metadata vector to the metadata DB 202 (step S26).

Here, i is the number of word types. Each occurrence of a word counts as 1 at the time it occurs, and this value is spread over the surrounding time as a Gaussian distribution. The spreading stops at the scene breaks described above and does not extend past them.
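Equation (2) is not reproduced in this section. The following is a minimal sketch of the described weighting under stated assumptions. Each occurrence contributes 1 at its own time, decaying as a Gaussian with standard deviation `sigma`. Each Δt window is evaluated at its center. Contributions reach only windows in the same scene as the occurrence. `sigma`, the window-center choice, and all names are hypothetical.

```python
import bisect
import math

def metadata_vectors(occurrences, vocab, n_windows, dt, sigma, scene_starts):
    """Build one word-frequency vector per window of width dt.

    occurrences: list of (word, time_sec) pairs from morphological analysis.
    scene_starts: sorted scene-break times in seconds; spreading stops at these.
    """
    index = {w: i for i, w in enumerate(vocab)}

    def scene_of(t):
        # Number of scene breaks at or before t identifies the scene.
        return bisect.bisect_right(scene_starts, t)

    Y = [[0.0] * len(vocab) for _ in range(n_windows)]
    for word, t_occ in occurrences:
        if word not in index:
            continue
        for k in range(n_windows):
            t_k = (k + 0.5) * dt  # window center
            if scene_of(t_k) != scene_of(t_occ):
                continue  # no diffusion across a scene break
            Y[k][index[word]] += math.exp(-((t_k - t_occ) ** 2) / (2 * sigma ** 2))
    return Y
```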

The metadata DB 202 likewise takes the set of metadata vectors $[y_0, \ldots, y_{n-1}]$ and reduces its dimensionality by principal component analysis, producing $[y'_0, \ldots, y'_{n-1}]$.
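The following is a small pure-Python sketch of the PCA reduction applied to the metadata vectors. The text applies the same reduction to the feature vectors of each subspace. It finds the top-d principal components by power iteration with deflation on the sample covariance matrix. This is a stand-in for a library PCA, not the method prescribed by the text. The number of kept dimensions d and the function name are hypothetical.

```python
import math

def pca_reduce(vectors, d, iters=200):
    """Project vectors onto their top-d principal components.

    Returns (projected vectors, mean, components).
    """
    n, m = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(m)]
    X = [[v[j] - mean[j] for j in range(m)] for v in vectors]
    # Sample covariance C = X^T X / (n - 1)
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(m)] for a in range(m)]
    components = []
    for _ in range(d):
        w = [1.0 + 0.1 * j for j in range(m)]  # arbitrary non-symmetric start vector
        norm0 = math.sqrt(sum(c * c for c in w))
        w = [c / norm0 for c in w]
        for _ in range(iters):
            cw = [sum(C[a][b] * w[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(c * c for c in cw))
            if norm == 0.0:
                break  # no variance left in this direction
            w = [c / norm for c in cw]
        # Rayleigh quotient as eigenvalue estimate, then deflate: C <- C - lam * w w^T
        lam = sum(w[a] * C[a][b] * w[b] for a in range(m) for b in range(m))
        components.append(w)
        C = [[C[a][b] - lam * w[a] * w[b] for b in range(m)] for a in range(m)]
    projected = [[sum(x[j] * c[j] for j in range(m)) for c in components] for x in X]
    return projected, mean, components
```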

The correlation coefficient calculation unit 214 reads the subspace IDs, the sets of feature vectors, and the sets of metadata vectors from the feature amount DB 201 and the metadata DB 202 (steps S27 to S30). From the set of feature vectors and the set of metadata vectors, it calculates a transformation matrix R (the metadata converter R). It then performs supervised learning, such as logistic regression analysis using Equation (3). The model takes the averaged feature vector $v'$ as input and the metadata vector $y'$ as output. For logistic regression, the function is $f(x) = a / (1 + \exp(-x))$, where a is the maximum value among the elements of the metadata vectors.
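Equation (3) is not reproduced in this section, and the text names logistic regression only as one example. The following sketch fits $y_j \approx a/(1+\exp(-w_j \cdot [v, 1]))$ separately for each output j. It uses batch gradient descent on the cross-entropy of the rescaled targets y/a. It assumes non-negative targets, such as the Gaussian-weighted frequencies before PCA, because the scaled logistic only outputs values in (0, a). The bias term, the learning rate, and all names are hypothetical.

```python
import math

def fit_scaled_logistic(V, Y, lr=0.5, epochs=2000):
    """Fit one weight row per output so that y_j ~ a / (1 + exp(-(w_j . [v, 1]))).

    V: input vectors v'; Y: non-negative target vectors y'.
    a = maximum element of the targets, as stated in the text.
    Returns (W, a); W plays the role of the correlation coefficient matrix.
    """
    a = max(max(y) for y in Y)
    X = [list(v) + [1.0] for v in V]  # append a bias term (assumption)
    n, d, outputs = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * d for _ in range(outputs)]
    for _ in range(epochs):
        for j in range(outputs):
            grad = [0.0] * d
            for x, y in zip(X, Y):
                p = 1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(W[j], x))))
                err = p - y[j] / a  # cross-entropy gradient w.r.t. the logit
                for t in range(d):
                    grad[t] += err * x[t] / n
            W[j] = [w - lr * g for w, g in zip(W[j], grad)]
    return W, a

def predict(W, a, v):
    """Apply the fitted W to one input vector to get a metadata vector."""
    x = list(v) + [1.0]
    return [a / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x)))) for row in W]
```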

The correlation coefficient calculation unit 214 takes the resulting matrix of regression coefficients as the correlation coefficient W. It then passes the metadata converter R, the correlation coefficient W, and the subspace ID to the correlation coefficient DB 205.

The correlation coefficient DB 205 stores the input metadata converter R, correlation coefficient W, and subspace ID in association with one another.

(Second Embodiment)
Next, a second embodiment of the present invention will be described. The second embodiment applies the metadata extraction server 101 described in the first embodiment to a search service. FIG. 5 is a conceptual diagram showing the configuration and operation of the metadata extraction server 101 when applied to a search service.

The user shoots video with the user terminal 102-3 at various locations and sends the video stream to the metadata extraction server 101 (step S41). The metadata extraction server 101 extracts metadata from the received video stream in real time. It embeds the extracted metadata in that video stream and sends the result back to the user terminal 102-3 as a data stream (step S42). The metadata extraction server 101 also sends the extracted metadata to the search DB 260. The search DB 260 searches for information based on that metadata and returns the search results to the metadata extraction server 101. The metadata extraction server 101 forwards the received search results to the user terminal 102-3 (step S43).

As a result, users can simply film their surroundings with a camera-equipped user terminal 102-3, such as a mobile phone, to obtain Web keywords related to the captured video and search with them. This reduces the amount of search input required. It also makes it possible to recommend search keywords to users who cannot think of a specific one.

(Third Embodiment)
Next, a third embodiment of the present invention will be described. FIG. 6 is a conceptual diagram showing the configuration and operation when the metadata extraction server of the present invention works together with a bridge to form a metadata extraction bridge. The metadata extraction server 101 in the third embodiment has the same configuration as the metadata extraction server 101 described in the first embodiment.

Here, the reply destination of the metadata extraction server 101 is set to the source address of the stream, and the reply stream is set to the metadata-embedded media stream. A control bridge (Controlbridge; see Non-Patent Document 1) is used as the bridge, and one or more control bridges 270 are deployed on the network 100. The illustrated example shows a single control bridge.

Video from a live camera 280 installed on a street or similar location, and video from the videophone function of the user terminal 102-4, are assumed to be sent over the network 100 to the end user terminal 102-5.

The control bridge 270 uses its forwarding function to capture a specific media stream. It rewrites the headers of the packets that make up the stream and forwards the stream to the metadata extraction server 101 (step S51). Here, the specific media stream is assumed to be video from the street-installed live camera 280 or from the videophone function of the user terminal 102-4. In other words, the end user terminal 102-5 is viewing video from the live camera 280 or is in a videophone call with the user terminal 102-4.

On receiving the media stream, the metadata extraction server 101 embeds metadata in it (step S52) and returns the metadata-embedded media stream to the control bridge 270 (step S53). Finally, the control bridge 270 restores the original packet headers of the returned stream and sends it back onto the network 100 (step S54). Metadata is thus added to the media stream automatically while the stream is in transit on the network 100. The end user terminal 102-5 needs no complex processing, and keywords matching the media stream can be provided even for streams sent from low-performance devices such as mobile terminals.
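The following is a minimal sketch of the divert-and-restore behavior in steps S51 to S54. It models packets as simple records and keys the saved headers by a hypothetical `flow_id`. It does not describe how the Controlbridge actually identifies flows or rewrites headers. The class and field names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    flow_id: int  # hypothetical key identifying the media stream
    payload: bytes

class ControlBridgeSketch:
    """Hypothetical model of steps S51-S54: divert a watched flow to the metadata
    extraction server and restore its original header on the way back."""

    def __init__(self, server_addr, watched_flows):
        self.server_addr = server_addr
        self.watched = set(watched_flows)
        self.original = {}  # flow_id -> (src, dst) before rewriting

    def forward(self, pkt):
        if pkt.flow_id not in self.watched:
            return pkt  # ordinary traffic passes unchanged
        self.original[pkt.flow_id] = (pkt.src, pkt.dst)
        return replace(pkt, dst=self.server_addr)  # S51: rewrite header toward server

    def restore(self, pkt):
        src, dst = self.original[pkt.flow_id]
        return replace(pkt, src=src, dst=dst)  # S54: original header, new payload
```

Under this model, the end terminal sees a packet whose addresses match the original stream but whose payload now carries the embedded metadata.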

The metadata extraction server 101 can also notify a supervisor's user terminal 102-6 of the metadata payload type in advance (step S55). The user terminal 102-6 then sends monitoring keywords to the proxy/PBX 290 (step S56) and obtains the matching detection information from the metadata-embedded media stream (step S57). This makes it possible to monitor streams on the network 100.
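The text does not describe how the proxy/PBX 290 matches keywords. The following is a minimal sketch that assumes it filters raw RTP packets by the notified payload type and does a substring search on the UTF-8 metadata payload. It skips the fixed header and any CSRC entries and ignores RTP header extensions. The function name is hypothetical.

```python
def detect(packets, metadata_pt, keywords):
    """Return (packet index, keyword) hits for RTP packets whose payload type equals
    the notified metadata payload type and whose payload contains a keyword."""
    hits = []
    for i, pkt in enumerate(packets):
        if len(pkt) < 12 or (pkt[1] & 0x7F) != metadata_pt:
            continue  # too short for an RTP header, or not a metadata packet
        cc = pkt[0] & 0x0F  # CSRC count; each CSRC entry is 4 bytes
        text = pkt[12 + 4 * cc:].decode("utf-8", errors="ignore")
        hits.extend((i, kw) for kw in keywords if kw in text)
    return hits
```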

(Fourth Embodiment)
Next, a fourth embodiment of the present invention will be described. FIG. 7 is a conceptual diagram showing the configuration and operation when the metadata extraction server 101 of the present invention is used for a network-based stream backup service. The metadata extraction server 101 in the fourth embodiment has the same configuration as the metadata extraction server described in the first embodiment.

First, the control bridge 270 described above is deployed on the network 100. Using its mirroring function, the control bridge 270 captures media streams from the live camera 280 or from the videophone function of the user terminal 102-4. It duplicates them and sends the copies to the metadata extraction server 101 and the media DB 300 (steps S61 and S62). The metadata extraction server 101 then sends the extracted metadata to the media DB 300 (step S63). The media DB 300 stores the metadata paired with the corresponding media data.

The media DB 300 discards the media data and metadata after a fixed period. It supports keyword searches over the metadata, and each search hit identifies the time range of the corresponding media. The user terminal 102-7 sends a search keyword to the media DB 300 (step S64). It can then retrieve records of communication between terminals that have little storage capacity, or past media streams the user forgot to record on the terminal (step S65). This improves convenience.
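The following is a minimal sketch of the retention and search behavior described for the media DB 300. Each entry stores a stream ID, the media time range its metadata covers, a keyword set, and the time it was stored. Entries older than a fixed retention period are dropped before each search. The retention period, the field names, and the class names are hypothetical. The sketch does not store the media data itself.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    stream_id: str
    start: float          # media time range covered by this metadata (seconds)
    end: float
    keywords: frozenset
    stored_at: float      # wall-clock time when the entry was stored

@dataclass
class MediaDBSketch:
    ttl: float  # retention period after which entries are discarded
    entries: list = field(default_factory=list)

    def store(self, entry):
        self.entries.append(entry)

    def expire(self, now):
        """Discard entries older than the retention period."""
        self.entries = [e for e in self.entries if now - e.stored_at < self.ttl]

    def search(self, keyword, now):
        """Return (stream_id, start, end) for unexpired entries tagged with keyword."""
        self.expire(now)
        return [(e.stream_id, e.start, e.end) for e in self.entries if keyword in e.keywords]
```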

According to the embodiments described above, creating a recognizer for each subspace and recognizing with it improves accuracy. Dividing the feature space into subspaces also lowers the intrinsic dimensionality of the features. It naturally restricts the types of comments collected, so the types of output metadata are restricted as well. This reduces the dimensionality of the correlation coefficients and the amount of computation needed to extract metadata from a media stream. Furthermore, no additional information needs to accompany the media stream, so metadata can be extracted even when network-side devices have no other accompanying information to recognize.

The operation of the metadata extraction server 101 described above can be provided as a program executed by a computer, or as a computer-readable recording medium storing that program. The processing above is performed when a computer system reads and executes the program. The "computer system" here includes an OS and hardware such as a CPU, various memories, and peripheral devices.
Further, the "computer system" also includes a homepage providing environment (or display environment) when a WWW system is used.
The "computer-readable recording medium" refers to a storage device. Examples include a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, and a hard disk built into a computer system.

Further, the "computer-readable recording medium" also includes media that hold a program for a certain period of time. One example is volatile memory, such as DRAM (Dynamic Random Access Memory), inside a computer system acting as a server or client when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line.
The program may be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that carries the program is a medium that can transmit information, such as a network (communication network) like the Internet or a communication line like a telephone line.
The program may implement only part of the functions described above.
Furthermore, the program may be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

DESCRIPTION OF SYMBOLS: 100 ... network, 101 ... metadata extraction server, 102-1 to 102-7 ... user terminals, 201 ... feature amount DB, 202 ... metadata DB, 203 ... metadata organizing unit, 204 ... feature amount calculation unit, 205 ... correlation coefficient DB, 207 ... correlation coefficient application unit, 208 ... media stream acquisition unit, 210 ... media stream sending terminal, 214 ... correlation coefficient calculation unit, 230 ... transmission setting unit, 231 ... metadata sending unit, 232 ... metadata embedding unit, 233 ... media stream sending unit, 234 ... teacher information collection unit, 240 ... Web server, 240-1 ... video posting site, 240-2 ... blog site, 250 ... media stream receiving terminal, 251 ... metadata receiving terminal, 260 ... search DB, 270 ... control bridge, 280 ... live camera, 290 ... proxy/PBX, 300 ... media DB

Claims

A metadata extraction server that extracts, from media data, metadata for recognizing the content of the media data, the metadata extraction server comprising:
a teacher information collection unit that collects content from sites on a network;
a feature amount calculation unit that calculates a feature amount matrix representing the content collected by the teacher information collection unit;
a feature amount storage unit that, when storing the feature amount matrices calculated by the feature amount calculation unit, clusters the set of those feature amount matrices at regular time intervals and determines a subspace for each resulting set of feature amount matrices;
a metadata organizing unit that performs morphological analysis on the content collected by the teacher information collection unit and generates a metadata matrix representing the occurrence frequency of the morphologically analyzed words;
a correlation coefficient calculation unit that calculates, for each subspace determined by the feature amount storage unit, a correlation coefficient between the set of feature amount matrices and the set of metadata matrices generated by the metadata organizing unit; and
a correlation coefficient storage unit that stores the correlation coefficient calculated by the correlation coefficient calculation unit for each subspace.

The metadata extraction server according to claim 1, further comprising a media stream acquisition unit that acquires media streams transmitted and received on the network, wherein:
the feature amount calculation unit calculates a feature amount matrix representing the content of the media stream acquired by the media stream acquisition unit; and
the feature amount storage unit, when storing at predetermined time intervals the feature amount matrices representing the content of the media stream in addition to those representing the content collected by the teacher information collection unit, clusters the feature amount matrices representing the media stream together with those representing the collected content, and determines a subspace for each set of feature amount matrices.

The metadata extraction server according to claim 2, further comprising:
a correlation application unit that, based on the subspaces determined by the feature amount storage unit, identifies the subspace containing the feature amount matrix representing the content of the media stream, calculates a metadata vector by applying the correlation coefficient calculated by the correlation coefficient calculation unit for the identified subspace, and selects from the metadata vector the set of words judged appropriate as a single piece of metadata;
a transmission setting unit that determines whether the metadata selected by the correlation application unit is to be transmitted on its own or embedded in the media stream;
a metadata sending unit that, when the transmission setting unit determines that the metadata is to be transmitted on its own, sends the metadata to a metadata receiving terminal that is a designated destination;
a metadata embedding unit that, when the transmission setting unit determines that the metadata is to be embedded in the media stream, embeds the metadata in the media stream; and
a media stream sending unit that sends the media stream in which the metadata has been embedded by the metadata embedding unit to a media stream receiving terminal that is a designated destination.

The metadata extraction server according to claim 3, wherein the transmission setting unit selectively determines, as the designated destination, either the source of the media stream or a destination different from the source.

A metadata extraction method in a metadata extraction server that extracts, from media data, metadata for recognizing the content of the media data, the method comprising:
a teacher information collection step in which a teacher information collection unit collects content from sites on a network;
a feature amount calculation step in which a feature amount calculation unit calculates a feature amount matrix representing the content collected in the teacher information collection step;
a feature amount storage step in which a feature amount storage unit, when storing the feature amount matrices calculated in the feature amount calculation step, clusters the set of those feature amount matrices at regular time intervals and determines a subspace for each resulting set of feature amount matrices;
a metadata organizing step of performing morphological analysis on the content collected in the teacher information collection step and generating a metadata matrix representing the occurrence frequency of the morphologically analyzed words;
a correlation coefficient calculation step in which a correlation coefficient calculation unit calculates, for each subspace determined in the feature amount storage step, a correlation coefficient between the set of feature amount matrices and the set of metadata matrices generated in the metadata organizing step; and
a correlation coefficient storage step in which a correlation coefficient storage unit stores the correlation coefficient calculated in the correlation coefficient calculation step for each subspace.

A program for causing a computer to operate as a metadata extraction server that extracts, from media data, metadata for recognizing the content of the media data, the program causing the computer to function as:
teacher information collection means for collecting content from sites on a network;
feature amount calculation means for calculating a feature amount matrix representing the content collected by the teacher information collection means;
feature amount storage means for, when storing the feature amount matrices calculated by the feature amount calculation means, clustering the set of those feature amount matrices at regular time intervals and determining a subspace for each resulting set of feature amount matrices;
metadata organizing means for performing morphological analysis on the content collected by the teacher information collection means and generating a metadata matrix representing the occurrence frequency of the morphologically analyzed words;
correlation coefficient calculation means for calculating, for each subspace determined by the feature amount storage means, a correlation coefficient between the set of feature amount matrices and the set of metadata matrices generated by the metadata organizing means; and
correlation coefficient storage means for storing the correlation coefficient calculated by the correlation coefficient calculation means for each subspace.