JP3090233B2

JP3090233B2 - A method for identifying associations between complex information

Info

Publication number: JP3090233B2
Application number: JP04031069A
Authority: JP
Inventors: 英昭小澤; 透中川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1992-02-18
Filing date: 1992-02-18
Publication date: 2000-09-18
Anticipated expiration: 2015-09-18
Also published as: JPH05233719A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【産業上の利用分野】本発明は、映像として提供される
情報、活字として提供される情報、その他の情報伝達媒
体を介して提供される複合的な情報を利用するデータベ
ースシステムにおいて、各情報間の関連度の識別に用い
られる複合的な情報間の関連性識別方法に関する。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a method for identifying associations between complex information, used to determine the degree of association between items of information in a database system that uses complex information provided as video, as printed text, or through other information transmission media.

【０００２】[0002]

【従来の技術】従来のマルチメディア情報を扱うデータ
ベースシステムでは、扱うメディアの種類や対象によっ
てデータベース中で利用するデータの構造やデータ間の
関係が異なるために、それらをデータベースの作成時に
定義していた。したがって、個々のデータベースで扱え
る情報は、地図の情報やドキュメントの情報などのよう
に当初に計画されていたアプリケーションに依存し、デ
ータベース固有に定義されたデータ間の関連性のもとで
しか利用できなかった。2. Description of the Related Art In conventional database systems that handle multimedia information, the structure of the data used in the database and the relationships among the data differ depending on the type and subject of the media handled, so they were defined when the database was created. Consequently, the information that each database could handle depended on the originally planned application, such as map information or document information, and could be used only under the data relationships defined specifically for that database.

【０００３】さらに、関連性の表現には主にキーワード
が用いられているが、必要なデータを検索するために適
当なキーワードや属性をデータベースに与える作業は、
ユーザにとって大きな負担になっていた。[0003] Furthermore, associations are expressed mainly with keywords, and assigning appropriate keywords and attributes to the database so that the necessary data can be retrieved has been a heavy burden on the user.

【０００４】[0004]

【発明が解決しようとする課題】ところで、地図のデー
タベースやドキュメントのデータベースといった個々に
独立に作成されたデータベース間において、例えばドキ
ュメントのデータから地図のデータを参照したいという
ように、個々のメディアやデータベースを越えたマルチ
メディアのデータを利用したいという要求がある。Between independently created databases, such as a map database and a document database, there is a demand to use multimedia data across individual media and databases, for example to refer to map data from document data.

【０００５】しかし、従来技術では、例えば２つのデー
タベース間で新たに必要となった関連性を表現するため
には、新しい関係を検索することが可能なキーワードを
付加した上で、新しい参照関係を示すデータベースを作
成する必要があった。In the prior art, however, expressing a newly required association between, for example, two databases required adding keywords through which the new relationship could be searched and then creating a new database representing the new reference relationship.

【０００６】本発明は、任意に作成されたマルチメディ
アのデータベースに対して関連性を識別する機構によ
り、新しい参照関係が必要になっても新しい参照関係デ
ータベースを作成することなく、任意のマルチメディア
情報間に多様な関連付けを行うことができる複合的な情
報間の関連性識別方法を提供することを目的とする。SUMMARY OF THE INVENTION It is an object of the present invention to provide a method for identifying associations between complex information that, by means of a mechanism that identifies associations across arbitrarily created multimedia databases, can make diverse associations between arbitrary items of multimedia information without creating a new reference-relationship database even when a new reference relationship becomes necessary.

【０００７】[0007]

【課題を解決するための手段】請求項１に記載の発明
は、画像情報、音声情報、テキスト情報、その他各メデ
ィア情報を含む複合的な情報間の関連性識別方法におい
て、前記各情報に、個々の特徴を自然言語で表現した特
徴表現文と、その情報の発生時刻データとを付加し、前
記各情報の特徴表現文を自然言語解析して各特徴量を抽
出し、前記各情報の発生時刻データと、各情報の特徴表
現文を自然言語解析して抽出した時間に関する情報とを
用いて、その情報が実際に発生した絶対時間に変換し、
前記各情報の特徴量および絶対時間の比較により前記各
情報間の関連度を算出することを特徴とする。According to a first aspect of the present invention, there is provided a method for identifying associations between complex information including image information, audio information, text information, and other media information, characterized by: attaching to each item of information a feature expression sentence that describes its individual features in natural language, together with occurrence time data for that information; extracting feature amounts by natural language analysis of each feature expression sentence; converting to the absolute time at which the information actually occurred, using the occurrence time data and the time-related information extracted by natural language analysis of the feature expression sentence; and calculating the degree of association between the items of information by comparing their feature amounts and absolute times.

【０００８】請求項２に記載の発明は、請求項１に記載
の複合的な情報間の関連性識別方法において、各情報の
特徴表現文を自立語と修飾語の品詞ごとに分類し、品詞
ごとの単語が一致する量により品詞ごとの関連度を算出
し、各情報の特徴表現文中の副詞と日時を表す語を用い
て、各情報の発生時刻データを絶対時間に変換し、前記
各品詞の個数をｎとしたときに、各品詞ごとの単語の一
致量と各情報の絶対時間をｎ＋１次元の空間上の各軸に
それぞれ割り当て、それらの大きさから前記各情報間の
関連度をｎ＋１次元の空間上の擬似的な距離として算出
することを特徴とする。According to a second aspect of the present invention, in the method according to the first aspect, the words of each feature expression sentence are classified by part of speech, covering independent words and modifiers; a degree of association is calculated for each part of speech from the amount of matching words of that part of speech; the occurrence time data of each item of information is converted into absolute time using the adverbs and the date-and-time words in its feature expression sentence; and, where n is the number of parts of speech, the word match amount for each part of speech and the absolute time of each item are assigned to the respective axes of an (n+1)-dimensional space, and the degree of association between items is calculated from their magnitudes as a pseudo-distance in that space.

【０００９】請求項３に記載の発明は、請求項２に記載
の複合的な情報間の関連性識別方法において、各品詞ご
との単語の一致量と各情報の絶対時間をｎ＋１次元の空
間上の各軸にそれぞれ割り当てるときに、所定の軸上で
扱える距離の範囲を指定してｎ＋１次元上の距離を制御
することを特徴とする。According to a third aspect of the present invention, in the method according to the second aspect, when the word match amount for each part of speech and the absolute time of each item of information are assigned to the axes of the (n+1)-dimensional space, the range of distance that can be handled on a given axis is specified so as to control the distance in the (n+1)-dimensional space.

【００１０】請求項４に記載の発明は、請求項２に記載
の複合的な情報間の関連性識別方法において、各品詞ご
との単語が一致する量を計算する際に、特徴表現文の係
受けを解析して情報の大意を表す部分とそれを修飾する
部分とに分離し、各情報の修飾部分で一致する量に対し
て、大意を表す部分で一致する量を加重計算することを
特徴とする。According to a fourth aspect of the present invention, in the method according to the second aspect, when the amount of matching words for each part of speech is calculated, the dependency structure of the feature expression sentence is analyzed to separate the part that expresses the gist of the information from the parts that modify it, and the amount matching in the gist part is weighted relative to the amount matching in the modifying parts.

【００１１】[0011]

【作用】本発明は、各メディアに対応する情報に個々の
特徴を自然言語で表現した特徴表現文を付加し、それぞ
れの特徴表現文を自然言語解析して抽出された各特徴量
を比較することにより、各情報間の関連度を算出する。
なお、自然言語解析の処理方法について、地域的な属性
の抽出、人物の抽出などを変化させることにより、各情
報間で複数の関連性を得ることができる。The present invention attaches to the information for each medium a feature expression sentence that describes its individual features in natural language, and calculates the degree of association between items of information by comparing the feature amounts extracted from each feature expression sentence by natural language analysis. By varying how the natural language analysis is performed, for example whether regional attributes or persons are extracted, several kinds of association can be obtained between items of information.

【００１２】また、各情報にその情報の発生時刻データ
を付加し、各情報の特徴表現文の解析と発生時刻データ
からその情報が実際に発生した絶対時間を割り出し、各
情報を時間軸上に整列させる。[0012] In addition, occurrence time data is attached to each item of information, the absolute time at which the information actually occurred is determined from the analysis of its feature expression sentence and from the occurrence time data, and the items are aligned on a time axis.

【００１３】ここで、各情報の特徴量および絶対時間を
比較することにより、各情報間の関連度を正確に得るこ
とができる。すなわち、複数のデータベース間で構造や
キーワードに捕らわれずに各情報の関連性を識別するこ
とができる。Here, by comparing the feature amounts and absolute times of the items of information, the degree of association between them can be obtained accurately. That is, the association of each item of information can be identified across multiple databases without being constrained by their structures or keywords.

【００１４】[0014]

【実施例】図１は、本発明の複合的な情報間の関連性の
識別方法を実現するマルチメディアデータベースシステ
ムの実施例構成を示すブロック図である。なお、本実施
例では、データベース化された新聞とテレビのニュース
を例として、ある記事Ａに類似した記事ＢやニュースＣ
を検索する過程について説明する。FIG. 1 is a block diagram showing the configuration of an embodiment of a multimedia database system that implements the method for identifying associations between complex information according to the present invention. In this embodiment, taking newspaper and television news held in databases as an example, the process of searching for an article B or a news item C similar to a given article A is described.

【００１５】図において、本実施例のマルチメディアデ
ータベースシステムは、それぞれテレビや新聞などの個
々のメディアに対応したテレビニュースデータベース１
０₁や新聞記事データベース１０₂と、牽引リンクモジ
ュール２０と、構造リンクモジュール３０と、検索条件
出力モジュール４０と、検索結果獲得モジュール５０
と、内容リンクモジュール６０と、時間リンクモジュー
ル７０と、制御モジュール８０とにより構成される。In the figure, the multimedia database system of this embodiment comprises a television news database 10₁ and a newspaper article database 10₂, each corresponding to an individual medium such as television or newspapers, a traction link module 20, a structure link module 30, a search condition output module 40, a search result acquisition module 50, a content link module 60, a time link module 70, and a control module 80.

【００１６】牽引リンクモジュール２０は、各メディア
対応の個々のデータに付加されたキーワードによる関連
付けを行う。構造リンクモジュール３０は、各メディア
対応の個々のデータの構造を操作する。検索条件出力モ
ジュール４０は、牽引リンクモジュール２０および構造
リンクモジュール３０によって作成される検索条件で各
データベース１０₁，１０₂をアクセスする。検索結果
獲得モジュール５０は、検索条件出力モジュール４０に
よって検索された情報を受け、内容リンクモジュール６
０および時間リンクモジュール７０に送出する。内容リ
ンクモジュール６０は、各情報の特徴表現文を用いて関
連付けを行う。時間リンクモジュール７０は、各情報の
発生時刻を用いて関連付けを行う。制御モジュール８０
は、ユーザとのインタフェースをとりながら各モジュー
ルを制御し、内容リンクモジュール６０および時間リン
クモジュール７０で関連付けられた情報から最終的な検
索結果を作成する。The traction link module 20 associates items using the keywords attached to the individual data for each medium. The structure link module 30 manipulates the structure of the individual data for each medium. The search condition output module 40 accesses the databases 10₁ and 10₂ with the search conditions created by the traction link module 20 and the structure link module 30. The search result acquisition module 50 receives the information retrieved by the search condition output module 40 and sends it to the content link module 60 and the time link module 70. The content link module 60 associates items using the feature expression sentence of each item of information. The time link module 70 associates items using the occurrence time of each item of information. The control module 80 controls each module while interfacing with the user, and creates the final search result from the information associated by the content link module 60 and the time link module 70.

【００１７】ここで、内容リンクモジュール６０および
時間リンクモジュール７０の構成を図２に示す。図にお
いて、内容リンクモジュール６０は、検索文解析部６１
と、結果文解析部６２と、形態素解析部６３と、比較モ
ジュール６４とにより構成される。検索文解析部６１
は、制御モジュール８０から与えられる検索対象の記事
Ａの特徴表現文を比較要素に基づいて解析し、その結果
を比較モジュール６４に与える。結果文解析部６２は、
検索結果獲得モジュール１５を介して各データベースの
検索結果を取り込み、同様に比較要素に基づいて記事
Ｂ，ニュースＣの特徴表現文を解析し、その結果を比較
モジュール６４に与える。形態素解析部６３は、検索文
解析部６１および結果文解析部６２における特徴表現文
の解析を助ける。比較モジュール６４は、検索文解析部
６１および結果文解析部６２での解析結果と、制御モジ
ュール８０から与えられる閾値との比較を行い、記事
Ｂ，ニュースＣの各特徴量を付与する。FIG. 2 shows the configurations of the content link module 60 and the time link module 70. In the figure, the content link module 60 comprises a search sentence analysis unit 61, a result sentence analysis unit 62, a morphological analysis unit 63, and a comparison module 64. The search sentence analysis unit 61 analyzes the feature expression sentence of the search target article A, supplied by the control module 80, according to the comparison elements and passes the result to the comparison module 64. The result sentence analysis unit 62 takes in the search results from each database via the search result acquisition module 50, likewise analyzes the feature expression sentences of article B and news item C according to the comparison elements, and passes the results to the comparison module 64. The morphological analysis unit 63 supports the analysis of feature expression sentences in the search sentence analysis unit 61 and the result sentence analysis unit 62. The comparison module 64 compares the analysis results from the search sentence analysis unit 61 and the result sentence analysis unit 62 against the threshold supplied by the control module 80 and assigns feature amounts to article B and news item C.

【００１８】時間リンクモジュール７０は、検索時間決
定部７１と、結果文解析部７２と、絶対時間演算部７３
と、比較モジュール７４とにより構成される。検索時間
決定部７１は、制御モジュール８０から与えられる検索
対象の記事Ａの特徴表現文と発生時間から検索対象の基
準となる時間を決め、その結果を比較モジュール７４に
与える。結果文解析部７２は、検索結果獲得モジュール
５０を介して各データベースの検索結果を取り込み、記
事Ｂ，ニュースＣの特徴表現文と発生時刻から各記事の
時間を決め、その結果を比較モジュール７４に与える。
絶対時間演算部７３は、検索時間決定部７１および結果
文解析部７２の各特徴表現文中から「昨日」、「今日」
等といった時間情報を抽出し、その時間を年月日などに
よる絶対時間に変換する。比較モジュール７４は、検索
された記事Ｂ，ニュースＣのデータが検索時間決定部７
１で決められた時間範囲内にあるか否かを判定し、記事
Ｂ，ニュースＣの各特徴量を付与する。The time link module 70 comprises a search time determination unit 71, a result sentence analysis unit 72, an absolute time calculation unit 73, and a comparison module 74. The search time determination unit 71 determines the reference time for the search from the feature expression sentence and occurrence time of the search target article A, supplied by the control module 80, and passes the result to the comparison module 74. The result sentence analysis unit 72 takes in the search results from each database via the search result acquisition module 50, determines the time of each item from the feature expression sentences and occurrence times of article B and news item C, and passes the results to the comparison module 74. The absolute time calculation unit 73 extracts time information such as "yesterday" and "today" from the feature expression sentences handled by the search time determination unit 71 and the result sentence analysis unit 72, and converts it into an absolute time expressed as a year, month, and day. The comparison module 74 determines whether the data of the retrieved article B and news item C fall within the time range determined by the search time determination unit 71 and assigns feature amounts to article B and news item C.

【００１９】次に、制御モジュール８０の構成を図３に
示す。図において、制御モジュール８０は、検索対象文
格納部８１と、内容リンク閾値格納部８２と、時間範囲
格納部８３と、解析方法格納部８４と、比較要素決定部
８５と、関連度計算部８６と、検索結果格納部８７とに
より構成される。検索対象文格納部８１は、検索対象の
記事を格納して内容リンクモジュール６０へ与える。内
容リンク閾値格納部８２は、内容リンクモジュール６０
における特徴量計算に用いる閾値を格納して内容リンク
モジュール６０へ与える。時間範囲格納部８３は、時間
リンクモジュール７０における特徴量計算に用いる時間
範囲を格納して時間リンクモジュール７０へ与える。解
析方法格納部８４は、関連度の計算アルゴリズムを格納
して比較要素決定部８５および関連度計算部８６へ与え
る。比較要素決定部８５は、関連度の計算に必要な名
詞，固有名詞などの比較要素を決定して内容リンクモジ
ュール６０へ与える。関連度計算部８６は、内容リンク
モジュール６０および時間リンクモジュール７０によっ
て決定された特徴量を取り込み、関連度の計算アルゴリ
ズムを用いて関連度を算出して検索結果格納部８７に与
える。検索結果格納部８７は、最終的に関連度ありと判
断されたデータを格納する。FIG. 3 shows the configuration of the control module 80. In the figure, the control module 80 comprises a search target sentence storage unit 81, a content link threshold storage unit 82, a time range storage unit 83, an analysis method storage unit 84, a comparison element determination unit 85, an association degree calculation unit 86, and a search result storage unit 87. The search target sentence storage unit 81 stores the article to be searched for and supplies it to the content link module 60. The content link threshold storage unit 82 stores the threshold used for feature amount calculation in the content link module 60 and supplies it to the content link module 60. The time range storage unit 83 stores the time range used for feature amount calculation in the time link module 70 and supplies it to the time link module 70. The analysis method storage unit 84 stores the algorithm for calculating the degree of association and supplies it to the comparison element determination unit 85 and the association degree calculation unit 86. The comparison element determination unit 85 determines the comparison elements needed to calculate the degree of association, such as nouns and proper nouns, and supplies them to the content link module 60. The association degree calculation unit 86 takes in the feature amounts determined by the content link module 60 and the time link module 70, calculates the degree of association using the calculation algorithm, and passes it to the search result storage unit 87. The search result storage unit 87 stores the data finally judged to be associated.

【００２０】以下、記事Ａに類似した記事Ｂやニュース
Ｃを検索する過程の動作手順について、以上示した各部
の構成と、図４に示すフローチャートを参照して説明す
る。まず、ユーザは制御モジュール８０に、検索対象の
記事Ａと、関連度計算の解析方法のアルゴリズムと、時
間範囲と、閾値とを設定することにより、制御モジュー
ル８０は動作を開始する。The operating procedure for searching for an article B or a news item C similar to article A is described below with reference to the configuration of each unit shown above and the flowchart in FIG. 4. First, the user sets in the control module 80 the search target article A, the analysis algorithm for calculating the degree of association, the time range, and the threshold, whereupon the control module 80 starts operating.

【００２１】制御モジュール８０は、比較要素決定部８
５で関連度の計算に必要な比較要素を決定し、記事Ａの
特徴表現文、閾値とともに内容リンクモジュール６０に
設定する（ステップ１）。The control module 80 uses the comparison element determination unit 85 to determine the comparison elements needed to calculate the degree of association, and sets them in the content link module 60 together with the feature expression sentence of article A and the threshold (step 1).

【００２２】次に、制御モジュール８０は、記事Ａの特
徴表現文、発生時間および時間範囲を時間リンクモジュ
ール７０に設定する（ステップ２）。次に、制御モジュ
ール８０は、各データベースをキーワード検索するため
のキーワードを牽引リンクモジュール２０に設定する
（ステップ３）。さらに、例えば新聞記事から写真の部
分のみを得るといった構造を操作するための条件、すな
わち抽出したい構造表現を構造リンクモジュール３０に
設定する（ステップ４）。Next, the control module 80 sets the feature expression sentence, occurrence time, and time range of article A in the time link module 70 (step 2). The control module 80 then sets the keywords for keyword search of each database in the traction link module 20 (step 3). It further sets in the structure link module 30 the conditions for manipulating structure, that is, the structural expression to be extracted, such as obtaining only the photograph portion of a newspaper article (step 4).

【００２３】検索条件出力モジュール４０では、牽引リ
ンクモジュール２０および構造リンクモジュール３０に
設定された条件を用いて、テレビニュースデータベース
１０ ₁および新聞記事データベース１０₂を検索する
（ステップ５）。検索された結果は、検索結果獲得モジ
ュール５０を介して内容リンクモジュール６０および時
間リンクモジュール７０に送られる。なお、検索結果獲
得モジュール５０は、ここではバッファの役割を果たし
ている。The search condition output module 40 searches the television news database 10₁ and the newspaper article database 10₂ using the conditions set in the traction link module 20 and the structure link module 30 (step 5). The search results are sent to the content link module 60 and the time link module 70 via the search result acquisition module 50, which here serves as a buffer.

【００２４】内容リンクモジュール６０では、検索文解
析部６１で記事Ａの特徴表現文について形態素解析した
後に、比較要素に従って解析に必要なデータを生成し、
比較モジュール６４に格納する。また、結果文解析部６
２では、検索された記事Ｂ，ニュースＣの特徴表現文に
ついて形態素解析した後に、比較要素に従って比較対象
のデータを生成し、比較モジュール６４に格納する。比
較モジュール６４では、記事Ａと記事Ｂ，ニュースＣの
各解析データを比較し、閾値よりも多くの一致があるデ
ータには0.00以上1.00未満の特徴量を付加し、閾値より
も一致が少ないデータには1.00の特徴量を付加し、各記
事のデータとともに制御モジュール８０に送出する（ス
テップ６）。In the content link module 60, the search sentence analysis unit 61 performs morphological analysis on the feature expression sentence of article A, then generates the data needed for the analysis according to the comparison elements and stores it in the comparison module 64. The result sentence analysis unit 62 performs morphological analysis on the feature expression sentences of the retrieved article B and news item C, then generates the comparison data according to the comparison elements and stores it in the comparison module 64. The comparison module 64 compares the analysis data of article A with that of article B and news item C, assigns a feature amount of 0.00 or more and less than 1.00 to data with more matches than the threshold, assigns a feature amount of 1.00 to data with fewer matches than the threshold, and sends these to the control module 80 together with the data of each article (step 6).

【００２５】時間リンクモジュール７０では、検索時間
決定部７１で記事Ａの特徴表現文および発生時間を用い
て、例えば「昨日」という時間情報を「1991年９月10
日」のような絶対時間に変換し、時間範囲によって「19
91年８月から1991年10月」のような時間の区間に変換
し、比較モジュール７４に格納する。また、結果文解析
部７２では、検索された記事Ｂ，ニュースＣの特徴表現
文から、例えば事件のあった日付、報道された日付のデ
ータを抽出して比較モジュール７４に格納する。比較モ
ジュール７４では、記事Ａと記事Ｂ，ニュースＣの各時
間データを比較し、時間範囲内に存在するデータには0.
00以上1.00未満の特徴量を付加し、時間範囲外のデータ
には1.00の特徴量を付加し、制御モジュール８０に送出
する（ステップ７）。In the time link module 70, the search time determination unit 71 uses the feature expression sentence and occurrence time of article A to convert time information such as "yesterday" into an absolute time such as "September 10, 1991", converts it according to the time range into a time interval such as "August 1991 to October 1991", and stores the result in the comparison module 74. The result sentence analysis unit 72 extracts, for example, the date of the incident and the date of the report from the feature expression sentences of the retrieved article B and news item C and stores them in the comparison module 74. The comparison module 74 compares the time data of article A with that of article B and news item C, assigns a feature amount of 0.00 or more and less than 1.00 to data within the time range, assigns a feature amount of 1.00 to data outside the time range, and sends these to the control module 80 (step 7).
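The conversion of a relative time word into an absolute date, followed by the range check, can be sketched roughly as follows. This is only an illustration: the lookup table `RELATIVE_OFFSETS`, the function names, and the assumed occurrence date of September 11, 1991 are hypothetical, and a real system would obtain the time words from morphological analysis rather than from a fixed table.

```python
from datetime import date, timedelta

# Hypothetical table: relative time word -> offset in days from the occurrence date.
RELATIVE_OFFSETS = {
    "今日": 0,    # "today"
    "昨日": -1,   # "yesterday"
}


def to_absolute_date(occurrence: date, time_word: str) -> date:
    """Resolve a relative time word against the item's occurrence date."""
    return occurrence + timedelta(days=RELATIVE_OFFSETS[time_word])


def in_time_range(day: date, start: date, end: date) -> bool:
    """Return True when `day` lies inside the inclusive search interval."""
    return start <= day <= end


# Assumed occurrence date for article A (not stated in the description).
occurrence_a = date(1991, 9, 11)
event_day = to_absolute_date(occurrence_a, "昨日")
inside = in_time_range(event_day, date(1991, 8, 1), date(1991, 10, 31))
```

With these assumed inputs, "yesterday" resolves to September 10, 1991, which falls inside the interval from August to October 1991.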

【００２６】制御モジュール８０では、関連度計算部８
６に内容リンクモジュール６０および時間リンクモジュ
ール７０によって決定された特徴量を取り込み、与えら
れた解析方法を用いて関連度を算出する（ステップ
８）。関連性判定の一例としては、内容リンクモジュー
ル６０から得られた特徴量をα_i(i＝１〜ｎ）、時間リ
ンクモジュール７０から得られた特徴量をβとし、それ
らを擬似的な距離と見なして、In the control module 80, the association degree calculation unit 86 takes in the feature amounts determined by the content link module 60 and the time link module 70 and calculates the degree of association using the given analysis method (step 8). As one example of determining association, let the feature amounts obtained from the content link module 60 be α_i (i = 1 to n) and the feature amount obtained from the time link module 70 be β; regarding these as pseudo-distances,

【００２７】[0027]

【数１】の計算式により、相対的な距離を計算する。ここで、解
析方法中に指示された所定の距離内にあるデータに関し
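てのみ関連性ありと判断する。(Equation 1) the relative distance is calculated by the formula $\sqrt{\sum_{i=1}^{n} \alpha_i^2 + \beta^2}$ … (1). The formula itself is missing from this text; the form given here is reconstructed from the worked values in paragraph [0032]. Only data lying within the predetermined distance specified in the analysis method is judged to be associated.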
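The distance cutoff described here, combined with ordering the surviving items from nearest to farthest, might look like the sketch below. The function name `rank_associated`, the dictionary input, and the choice of an inclusive cutoff are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def rank_associated(distances: Dict[str, float],
                    max_distance: float) -> List[Tuple[str, float]]:
    """Keep items within `max_distance` and order them nearest first.

    A smaller pseudo-distance means a higher degree of association.
    Treating the cutoff as inclusive is an assumption.
    """
    kept = [(name, d) for name, d in distances.items() if d <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Hypothetical pseudo-distances of three retrieved items from article A.
ranked = rank_associated({"B": 0.90, "C": 0.70, "D": 1.40}, max_distance=1.0)
```

Here item D is discarded because it lies beyond the cutoff, and C is ranked ahead of B.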

【００２８】検索結果格納部８７では、関連性ありと判
断されたデータについて、相対的な距離から関連度の高
い順に並べ替えて格納する（ステップ９）。ここで、特
徴表現文、データ構造、関連度計算のアルゴリズムにお
ける空間、時間リンクモジュール７０における特徴量の
決め方、内容リンクモジュール６０で得られる特徴量の
決め方についてその一例を説明する。The search result storage unit 87 sorts the data judged to be associated in descending order of degree of association, as determined by relative distance, and stores it (step 9). Examples are given below of the feature expression sentence, the data structure, the space used by the algorithm for calculating the degree of association, how the time link module 70 determines its feature amount, and how the feature amount obtained by the content link module 60 is determined.

【００２９】特徴表現文は、新聞や雑誌のような活字の
メディアでは本文のデータ、テレビやラジオのような映
像・音声のメディアではスクリプトを用いることにより
実現できる。The feature expression sentence can be realized by using the body text for print media such as newspapers and magazines, and the script for video and audio media such as television and radio.

【００３０】データ構造は、テレビニュースを例にとる
と、映像、音声、特徴表現文、放映日時およびキーワー
ドからなる。キーワードには、例えば日本などの地名や
経済などのジャンルといったものが付けられる。Taking television news as an example, the data structure consists of video, audio, a feature expression sentence, the broadcast date and time, and keywords. Keywords are, for example, place names such as "Japan" or genres such as "economy".
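As a rough illustration of this record layout, a television news item could be modeled as below. This is only a sketch: the class name `TvNewsItem`, the field names, and the use of raw bytes for video and audio are assumptions made for illustration and are not prescribed by the description.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TvNewsItem:
    """Hypothetical record for one television news item."""
    video: bytes              # encoded video payload (format is an assumption)
    audio: bytes              # encoded audio payload (format is an assumption)
    feature_text: str         # feature expression sentence, e.g. the script
    broadcast_at: datetime    # broadcast date and time
    keywords: List[str] = field(default_factory=list)  # place names, genres


# Example record with placeholder media payloads.
item = TvNewsItem(
    video=b"",
    audio=b"",
    feature_text="(script text of the news item)",
    broadcast_at=datetime(1991, 9, 10, 19, 0),
    keywords=["Japan", "economy"],
)
```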

【００３１】関連度計算のアルゴリズムにおける空間
は、内容リンクモジュール６０で生成される特徴量の中
で、特徴表現文の中に多数存在し情報を大まかに示して
いる名詞による特徴量と、動作の状態を示すサ変名詞に
よる特徴量と、事件のあった場所や関わった人物などを
示す固有名詞による特徴量と、「昨日」といった時間を
表す副詞や「４月」といった時間を表す数詞から生成さ
れる特徴量とを用い、４次元の空間として表現する。The space used by the algorithm for calculating the degree of association is expressed as a four-dimensional space using, among the feature amounts generated by the content link module 60: a feature amount from nouns, which occur in large numbers in the feature expression sentence and indicate the information in broad terms; a feature amount from sa-hen nouns (verbal nouns), which indicate the state of an action; a feature amount from proper nouns, which indicate the place of an incident or the persons involved; and a feature amount generated from adverbs expressing time such as "yesterday" and numerals expressing time such as "April".

【００３２】例えば、情報Ｘとの間の特徴量について、
名詞の特徴量が0.30、サ変名詞の特徴量が0.60、固有名
詞の特徴量が0.20であり、時間リンクモジュール７０で
得られた特徴量が0.00である情報Ｙに対して、動作手順
のステップ８で述べた (1)式により、擬似的な距離は0.
70となる。また、名詞の特徴量が0.30、サ変名詞の特徴
量が0.60、固有名詞の特徴量が0.60であり、時間リンク
モジュール７０で得られた特徴量が0.00である情報Ｚに
対して、擬似的な距離は0.90となる。したがって、情報
Ｙの方が情報Ｘに類似した情報と判定される。For example, consider feature amounts measured relative to information X. For information Y, with a noun feature amount of 0.30, a sa-hen noun feature amount of 0.60, a proper noun feature amount of 0.20, and a feature amount of 0.00 from the time link module 70, Equation (1) described in step 8 of the operating procedure gives a pseudo-distance of 0.70. For information Z, with a noun feature amount of 0.30, a sa-hen noun feature amount of 0.60, a proper noun feature amount of 0.60, and a feature amount of 0.00 from the time link module 70, the pseudo-distance is 0.90. Information Y is therefore judged to be more similar to information X.
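The two worked values above are consistent with a plain Euclidean distance over the three content axes and the time axis. The sketch below checks that arithmetic; the function name `pseudo_distance` is hypothetical, and the Euclidean form is inferred from the numbers rather than read from the formula itself.

```python
import math
from typing import Sequence


def pseudo_distance(content_features: Sequence[float],
                    time_feature: float) -> float:
    """Euclidean pseudo-distance over the content axes plus the time axis."""
    squared = sum(a * a for a in content_features) + time_feature ** 2
    return math.sqrt(squared)


# Feature amounts in the order: noun, sa-hen noun, proper noun.
distance_y = pseudo_distance([0.30, 0.60, 0.20], time_feature=0.00)
distance_z = pseudo_distance([0.30, 0.60, 0.60], time_feature=0.00)
```

Under this form, information Y comes out at about 0.70 and information Z at about 0.90, matching the description.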

【００３３】この他に各軸の特徴量を変化させることに
より、擬似的な距離を変化させることも可能である。た
とえば、 (1)式に各軸の影響度を示す軸強度μ_i(i＝１
〜ｎ）、μ_Bを加えて、In addition, the pseudo-distance can be changed by varying the feature amount on each axis. For example, axis strengths μ_i (i = 1 to n) and μ_B, which indicate the degree of influence of each axis, are added to Equation (1), giving

【００３４】[0034]

【数２】とする。ただし、μ_i、μ_Bは、0.00以上の実数で、標
準は1.0 である。この場合、例えば固有名詞の軸強度が
0.010 であったとすると、情報Ｘとの間の特徴量につい
て、名詞の特徴量が0.30、サ変名詞の特徴量が0.60、固
有名詞の特徴量が0.20であり、時間リンクモジュール７
０で得られた特徴量が0.00である情報Ｙの擬似的な距離
は0.67となる。また、名詞の特徴量が0.30、サ変名詞の
特徴量が0.60、固有名詞の特徴量が0.60であり、時間リ
ンクモジュール７０で得られた特徴量が0.00である情報
Ｚの擬似的な距離も0.67となる。したがって、情報Ｙお
よび情報Ｚの情報Ｘに対する類似度は同じと判定され
る。このように、軸強度を変えることで、例えば固有名
詞の軸の影響を弱めれば、サ変名詞が表現する事件が類
似しているといった観点での類似度が強調され、逆に固
有名詞の軸の影響を強めれば人物や場所などの類似が強
調され、関連性の識別に多様性を与えることができる。(Equation 2; the formula itself is not reproduced in this text.) Here, μ_i and μ_B are real numbers of 0.00 or more, with a standard value of 1.0. In this case, if for example the axis strength of the proper noun axis is 0.010, then for information Y, whose feature amounts relative to information X are 0.30 for nouns, 0.60 for sa-hen nouns, 0.20 for proper nouns, and 0.00 from the time link module 70, the pseudo-distance is 0.67. For information Z, with a noun feature amount of 0.30, a sa-hen noun feature amount of 0.60, a proper noun feature amount of 0.60, and a feature amount of 0.00 from the time link module 70, the pseudo-distance is also 0.67. Information Y and information Z are therefore judged to be equally similar to information X. In this way, changing the axis strengths adds diversity to the identification of associations: weakening the influence of the proper noun axis emphasizes similarity in the sense that the incidents expressed by the sa-hen nouns resemble each other, while strengthening it emphasizes similarity of persons and places.
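Because the formula for Equation (2) is not reproduced here, the sketch below assumes one form that agrees with the worked values: each feature amount is multiplied by its axis strength before the Euclidean distance is taken. That form, the function name `weighted_pseudo_distance`, and the parameter names are assumptions. Applying the strength to the squared term instead also rounds to 0.67 for both examples, so the numbers alone do not settle which form the original uses.

```python
import math
from typing import Sequence


def weighted_pseudo_distance(content_features: Sequence[float],
                             content_strengths: Sequence[float],
                             time_feature: float,
                             time_strength: float = 1.0) -> float:
    """Pseudo-distance with per-axis strengths (assumed form, see lead-in)."""
    squared = sum((mu * a) ** 2
                  for mu, a in zip(content_strengths, content_features))
    squared += (time_strength * time_feature) ** 2
    return math.sqrt(squared)


# Axis order: noun, sa-hen noun, proper noun. Only the proper noun axis is weakened.
strengths = [1.0, 1.0, 0.010]
weighted_y = weighted_pseudo_distance([0.30, 0.60, 0.20], strengths, 0.00)
weighted_z = weighted_pseudo_distance([0.30, 0.60, 0.60], strengths, 0.00)
```

With the proper noun axis weakened to 0.010, both items land at roughly 0.67, so the difference in their proper nouns no longer separates them.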

【００３５】時間リンクモジュール７０における特徴量
の決め方は、時間リンクモジュール７０が記事Ａと同日
のみの情報しか扱わない場合を0.00とし、無限に古い情
報までを扱う場合を1.00として、時間幅を双曲線関数的
に表現することで達成できる。たとえば、記事Ａとの時
間差Ｘを日数で表し、特徴量ＹをＹ＝1.00−１／(Ｘ＋１） …(3) と表すと、記事Ａと同日の記事について時間リンクモジ
ュール７０で抽出される特徴量は0.00となり、１年前の
情報は0.997 となる。The feature amount in the time link module 70 can be determined by expressing the time span with a hyperbolic curve, taking 0.00 for the case where the time link module 70 handles only information from the same day as article A and 1.00 for the case where it handles information infinitely far in the past. For example, if the time difference X from article A is expressed in days and the feature amount Y is expressed as Y = 1.00 − 1/(X + 1) … (3), the feature amount extracted by the time link module 70 for an article from the same day as article A is 0.00, and that for information from one year earlier is 0.997.
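Equation (3) is simple enough to check directly. In the sketch below, the function name `time_feature` and the rejection of negative differences are assumptions; the formula itself is the one given in the description.

```python
def time_feature(days_apart: int) -> float:
    """Time-axis feature amount Y = 1.00 - 1/(X + 1), with X in days."""
    if days_apart < 0:
        raise ValueError("days_apart must be non-negative")
    return 1.0 - 1.0 / (days_apart + 1)


same_day = time_feature(0)
one_year = time_feature(365)
```

The value rises quickly over the first few days and then flattens, so recent items are separated sharply while old items all sit close to 1.00.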

【００３６】特徴量Ｙの他の決め方は、メディアの種類
に応じて時間リンクモジュール７０が特徴量を決定すれ
ば、テレビと新聞とのように異なるメディアに対して
は、情報が発生してから報道されるまでの時間差を吸収
することができる。たとえば、時間リンクモジュール７
０に入力された記事Ａが午後のテレビニュースであった
場合に、新聞で翌日の朝刊の記事に対しては時間リンク
モジュール７０で得られる特徴量を0.00とし、他の情報
に対しては双曲線関数で表現する。また、時間リンクモ
ジュール７０に入力された記事Ａが午前のテレビニュー
スであれば新聞の夕刊の記事、記事Ａが新聞の朝刊であ
れば前日のテレビニュース、記事Ａが新聞の夕刊であれ
ばお昼までのテレビニュースに対しては、時間リンクモ
ジュール７０で得られる特徴量を0.00とする。この結
果、新聞とテレビニュースとの間にあるメディアによる
時間差を吸収することができ、ある記事に最も類似した
記事を検索し易くすることができる。As another way of determining the feature amount Y, if the time link module 70 determines the feature amount according to the type of medium, the time lag between when information arises and when it is reported can be absorbed for different media such as television and newspapers. For example, when article A input to the time link module 70 is afternoon television news, the feature amount obtained by the time link module 70 is set to 0.00 for articles in the next day's morning newspaper and is expressed with the hyperbolic curve for other information. Similarly, the feature amount is set to 0.00 for evening newspaper articles when article A is morning television news, for the previous day's television news when article A is a morning newspaper, and for television news up to noon when article A is an evening newspaper. As a result, the media-dependent time lag between newspapers and television news can be absorbed, making it easier to retrieve the article most similar to a given article.
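One way to picture this media-dependent rule is a small table of pairs that are treated as simultaneous, with the hyperbolic formula of Equation (3) as the fallback. Everything in the sketch is an assumption for illustration: the kind labels, the table `ZERO_OFFSETS`, the function name, and the reading that same-day editions have a day offset of zero.

```python
from datetime import date

# Hypothetical table: (query kind, candidate kind) -> day offset
# (candidate day minus query day) that is treated as "the same time".
ZERO_OFFSETS = {
    ("tv_afternoon", "paper_morning"): 1,    # next day's morning paper
    ("tv_morning", "paper_evening"): 0,      # same day's evening paper
    ("paper_morning", "tv_morning"): -1,     # previous day's TV news
    ("paper_morning", "tv_afternoon"): -1,
    ("paper_evening", "tv_morning"): 0,      # TV news up to noon, same day
}


def media_time_feature(query_kind: str, query_day: date,
                       cand_kind: str, cand_day: date) -> float:
    """Time feature amount with media-dependent zero points (assumed rule)."""
    offset = (cand_day - query_day).days
    if ZERO_OFFSETS.get((query_kind, cand_kind)) == offset:
        return 0.0
    # Fallback: hyperbolic form of Equation (3) on the absolute day difference.
    return 1.0 - 1.0 / (abs(offset) + 1)


day = date(1991, 9, 10)
next_morning = media_time_feature("tv_afternoon", day,
                                  "paper_morning", date(1991, 9, 11))
next_evening = media_time_feature("tv_afternoon", day,
                                  "paper_evening", date(1991, 9, 11))
```

In this sketch, the next morning's newspaper is treated as simultaneous with the afternoon broadcast, while the next evening's edition falls back to the ordinary one-day value.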

【００３７】内容リンクモジュール６０で得られる特徴
量の決め方は、検索対象の記事Ａの特徴表現文から得ら
れた比較対象データと、データベースからの検索文から
得られた単語とが完全に一致する場合を0.00とし、まっ
たく一致しない場合を1.00としたときに、一致した単語
数をａ、比較対象データの単語数をｂとすると、その特
徴量Ｗは、Ｗ＝1.00−ａ／ｂ …(4) として表現される。たとえば、記事Ａにおいて名詞が10
個存在したときに、検索した記事中に一致する名詞が８
個存在すれば、特徴量は0.20となる。なお、内容リンク
モジュール６０の閾値は外部から設定される。The feature amount obtained by the content link module 60 is determined as follows. Taking 0.00 for the case where the comparison data obtained from the feature expression sentence of the search target article A and the words obtained from a sentence retrieved from the database match completely, and 1.00 for the case where they do not match at all, and letting a be the number of matched words and b the number of words in the comparison data, the feature amount W is expressed as W = 1.00 − a/b … (4). For example, if article A contains 10 nouns and 8 matching nouns are found in a retrieved article, the feature amount is 0.20. The threshold of the content link module 60 is set externally.
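Equation (4) can be illustrated with sets of words for a single part of speech. In the sketch below, the function name `content_feature`, the use of sets (which ignores repeated words), and the placeholder word lists are assumptions.

```python
from typing import Iterable


def content_feature(query_words: Iterable[str],
                    candidate_words: Iterable[str]) -> float:
    """Content feature amount W = 1.00 - a/b for one part of speech.

    b is the number of distinct words in the comparison data from article A,
    and a is how many of them also appear in the retrieved item.
    """
    query = set(query_words)
    if not query:
        raise ValueError("query_words must not be empty")
    matched = len(query & set(candidate_words))
    return 1.0 - matched / len(query)


# Ten placeholder nouns for article A, eight of which recur in the candidate.
nouns_a = ["n%d" % i for i in range(10)]
nouns_candidate = nouns_a[:8] + ["other1", "other2"]
w = content_feature(nouns_a, nouns_candidate)
```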

【００３８】 In another method of determining the feature amount W, the content link module 60 performs dependency analysis on the feature expression sentence and separates it into the clauses that express the subject of the sentence and the clauses that modify them. That is, the feature expression sentences of the article A input to the content link module 60 and of the article obtained from the database are each dependency-parsed, and the words are classified into those obtained from clauses expressing the subject (gist) and those obtained from modifying clauses. For example, suppose that 10 nouns are obtained from the subject clauses of the comparison target data, that 7 of them match nouns in the subject clauses of the retrieved article, and that 3 of 6 nouns match between the modifying clauses. If the weight of the subject clauses is 0.70 and the weight of the modifying clauses is 0.30, the feature amount W is 0.70 × (1.00 − 7/10) + 0.30 × (1.00 − 3/6) = 0.70 × 0.30 + 0.30 × 0.50 = 0.36. As a result, the adverse effect of secondary information on the relevance calculation can be eliminated.
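The weighted calculation can be sketched as follows. The combination rule used here is an assumption: each clause type contributes its own mismatch fraction 1.00 − a/b, as in equation (4), and the two fractions are combined with the weights 0.70 and 0.30. It was chosen because it reproduces the value 0.36 given in the example. The function name and parameters are hypothetical.

```python
def weighted_content_feature(subject_matched, subject_total,
                             modifier_matched, modifier_total,
                             subject_weight=0.70):
    """Weighted feature amount W over subject and modifying clauses.

    Assumed rule: W = w_s * (1 - a_s / b_s) + w_m * (1 - a_m / b_m),
    with w_m = 1 - w_s, so that 0.00 still means a complete match.
    """
    if subject_total <= 0 or modifier_total <= 0:
        raise ValueError("word counts must be positive")
    modifier_weight = 1.0 - subject_weight
    subject_mismatch = 1.0 - subject_matched / subject_total
    modifier_mismatch = 1.0 - modifier_matched / modifier_total
    return subject_weight * subject_mismatch + modifier_weight * modifier_mismatch
```

For the example in the text (7 of 10 subject nouns and 3 of 6 modifier nouns matching), the result is approximately 0.36. Raising `subject_weight` makes mismatches in modifying clauses count for less.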

【００３９】[0039]

[Effects of the Invention] As described above, the method of the present invention for identifying associations between complex information makes it easy to identify the relationships that exist between information in different media, such as newspaper articles and television news. In other words, a high-performance information retrieval service can be provided for multimedia databases built by different creators.

【００４０】 Furthermore, the method of the present invention only requires adding a feature expression sentence and occurrence time data to each piece of information. There is no need to consider relationships such as keywords across multiple databases, so the burden on database creators is greatly reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of a multimedia database system that implements the method of the present invention for identifying associations between complex information.

FIG. 2 is a block diagram showing a configuration example of the content link module 60 and the time link module 70.

FIG. 3 is a block diagram showing a configuration example of the control module 80.

FIG. 4 is a flowchart illustrating the operation procedure of the embodiment of the present invention.

[Explanation of symbols]

10₁ TV news database
10₂ Newspaper article database
20 Tow link module
30 Structural link module
40 Search condition output module
50 Search result acquisition module
60 Content link module
61 Search sentence analysis unit
62 Result sentence analysis unit
63 Morphological analysis unit
64 Comparison module
70 Time link module
71 Search time determination unit
72 Result sentence analysis unit
73 Absolute time calculation unit
74 Comparison module
80 Control module
81 Search target sentence storage unit
82 Content link threshold storage unit
83 Time range storage unit
84 Analysis method storage unit
85 Comparison element determination unit
86 Relevance calculation unit
87 Search result storage unit

Continuation of the front page
(56) References: JP-A-2-287876 (JP, A); JP-A-2-287768 (JP, A); JP-A-3-176690 (JP, A); JP-A-4-54564 (JP, A); JP-A-4-84271 (JP, A); JP-A-1-243116 (JP, A); JP-A-61-248160 (JP, A); JP-A-1-278171 (JP, A); JP-A-2-98779 (JP, A); JP-A-2-232771 (JP, A); JP-A-64-26225 (JP, A); JP-A-4-24871 (JP, A)
(58) Fields investigated (Int. Cl.⁷, DB name): G06F 17/30; G06F 17/28; JICST file (JOIS)

Claims

(57) [Claims]

1. A method for identifying the association between complex information comprising image information, audio information, text information, and other media information, characterized by: adding to each piece of information a feature expression sentence that expresses its individual features in natural language and occurrence time data of the information; extracting feature amounts by natural-language analysis of the feature expression sentence of each piece of information; converting to the absolute time at which the information actually occurred, using the occurrence time data of each piece of information and the time-related information extracted by natural-language analysis of its feature expression sentence; and calculating the degree of association between the pieces of information by comparing their feature amounts and absolute times.

2. The method for identifying associations between complex information according to claim 1, wherein: the feature expression sentence of each piece of information is classified into parts of speech of independent words and modifiers, and the amount of matching words is obtained for each part of speech; the occurrence time data of each piece of information is converted into absolute time using the adverbs and words indicating date and time in the feature expression sentence of each piece of information; and, where n is the number of parts of speech, the amount of matching words for each part of speech and the absolute time of each piece of information are assigned to the respective axes of an (n + 1)-dimensional space, and the degree of association between the pieces of information is calculated from their magnitudes as a pseudo-distance in the (n + 1)-dimensional space.

3. The method for identifying associations between complex information according to claim 2, wherein, when the amount of matching words for each part of speech and the absolute time of each piece of information are assigned to the respective axes of the (n + 1)-dimensional space, the range of distance that can be handled on a predetermined axis is specified to control the distance in the (n + 1)-dimensional space.

4. The method for identifying associations between complex information according to claim 2, wherein, when the amount of matching words for each part of speech is calculated, the dependency structure of the feature expression sentence is analyzed to separate each sentence into a part that expresses its gist and a part that modifies it, and the amount matched in the gist part is weighted relative to the amount matched in the modifying part of each piece of information.