JP2012194961A

JP2012194961A - Device and method for calculating document significance

Info

Publication number: JP2012194961A
Application number: JP2011237408A
Authority: JP
Inventors: Jianqiang Li; Yu Zhao; Bo Liu
Original assignee: NEC China Co Ltd
Current assignee: NEC China Co Ltd
Priority date: 2011-03-16
Filing date: 2011-10-28
Publication date: 2012-10-11
Anticipated expiration: 2031-10-28
Also published as: CN102682040A; JP5429944B2

Abstract

PROBLEM TO BE SOLVED: To provide a device for calculating document significance. SOLUTION: The device for calculating document significance according to the present invention includes: a semantic relation forming section configured to form a semantic relation between a target document in a target document set and an external document in an external document set; and a document significance calculating section configured to calculate a significance score of the target document based on the semantic relation.

Description

The present invention relates to the field of information retrieval, and more particularly to an apparatus and method for calculating document importance.

With the continuous growth of electronic information, a large amount of diverse information resides on different distributed systems. This makes it very difficult for users to find useful information.

Information retrieval (IR) technology can be used to retrieve specific information from a document set. IR operations are further subdivided into: retrieving information contained in documents; retrieving the documents themselves; retrieving metadata that describes documents; and retrieving text, audio, images, or data from databases (relational personal databases or hypertext network databases, such as Ethernet or content/document management systems).

In a typical document retrieval process, given a query, documents are ranked using a combination of a query-relevant approach and a query-irrelevant approach. The query-relevant approach measures the similarity between the query and a document. The query-irrelevant approach, by contrast, ranks documents by features that do not depend on how well they match a particular query. In practice, using a query-irrelevant approach to calculate document importance plays an important role both in general document search engines and in specific query/response or data mining systems.

Conventional IR technology mainly uses the internal information of a document to measure the similarity between a query and the document (the query-relevant score). On the web, the hyperlink structure plays an important role in ranking web pages. For example, PageRank uses the position of a page in the graph structure of the web to determine the importance of the page (the query-irrelevant score).

Non-Patent Document 1 ("The PageRank citation ranking: Bringing order to the web", L. Page, S. Brin, R. Motwani, and T. Winograd, Technical Report, Stanford University, 1999) discloses a method for assigning importance levels to web pages. The method mainly includes the following steps: (1) extract the hyperlinks from a given set of web pages; (2) build a link graph in which each page is a node and each hyperlink is a directed edge; and (3) rank the importance of the web pages. Here, the link graph can be interpreted as a Markov chain in which web pages are states and links between pages are state transitions. Given an initial probability distribution, the stationary probability distribution of the Markov chain can be calculated.
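The Markov-chain reading of the link graph can be illustrated with a short power-iteration sketch. This is not taken from the cited report or from this patent: the function name, the toy graph, and the damping factor of 0.85 are assumptions made only for illustration.

```python
def stationary_distribution(links, damping=0.85, iterations=100):
    """Power iteration over a link graph given as {page: [pages it links to]}.

    Assumption: every link target also appears as a key of `links`.
    The damping factor is an illustrative choice, not stated in the text.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # initial probability distribution
    for _ in range(iterations):
        # Probability mass that arrives by a random jump to any page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links is treated as linking to every page.
            targets = links[page] or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a -> b, a -> c, b -> c, c -> a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = stationary_distribution(links)
```

In this toy graph, page c receives links from both a and b, so its stationary probability ends up higher than that of b.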

Patent Document 1 (US Pat. No. 6,285,999 B1) describes a method for ranking nodes in a linked database. The rank of a document in the linked database can be calculated based on the ranks of the other documents that cite it.

Patent Document 1: US Pat. No. 6,285,999 B1

Non-Patent Document 1: "The PageRank citation ranking: Bringing order to the web", L. Page, S. Brin, R. Motwani, and T. Winograd, Technical Report, Stanford University, 1999

However, just as website owners can falsify importance scores, creators of web content can add or delete hyperlinks on the web at will, so the link graph is not a reliable data source. In addition, in a general document set there are no hyperlink relationships between documents. The related-art methods therefore cannot be applied to general document sets.

To solve the above technical problem, the present invention uses a set of semantic associations between documents in a target document set and documents from an external source to calculate document importance. Specifically, according to the present invention, an external document set is used as an implicit knowledge source. First, semantic associations are formed between target documents and external documents. The importance score of each target document is then calculated based on the semantic associations formed.

The apparatus for calculating document importance according to the present invention includes: a semantic association forming unit configured to form a semantic association between a target document in a target document set and an external document in an external document set; and a document importance calculation unit configured to calculate an importance score of the target document based on the semantic association.

According to a preferred aspect, the semantic association forming unit is configured to form the semantic association between the target document and the external document by measuring the text similarity between them.

According to a preferred aspect, the semantic association forming unit is configured to form the semantic association between the target document and the external document by defining the target document as a class document and the external document as an instance document, and calculating the probability that the instance document is classified into the class document.

According to a preferred aspect, the document importance calculation unit is configured to calculate the importance score of the target document based on the number of external documents associated with the target document.

According to a preferred aspect, the document importance calculation unit is configured to generate a graph structure that contains the target documents as nodes, in which each edge connecting two nodes has a weight determined by the number of external documents associated with the two nodes that the edge connects, and to calculate the importance score of a target document based on the sum of the weights of its edges.

According to a preferred aspect, the document importance calculation unit is configured to generate a graph structure that contains the target documents as nodes, in which each edge connecting two nodes has a weight determined by the number of external documents associated with the two nodes that the edge connects, and to calculate the importance score of a target document based on both the sum of the weights of its edges and the number of external documents associated with it.

According to a preferred aspect, the document importance calculation unit is configured to calculate an intermediate importance score of the target document based on the number of external documents associated with the target document, and to calculate the importance score of the target document based on the intermediate importance score and the sum of the weights of the target document's edges.

According to a preferred aspect, the apparatus further includes a semantic association storage unit configured to store the semantic associations formed by the semantic association forming unit.

The method for calculating document importance according to the present invention includes: a step of forming a semantic association between a target document in a target document set and an external document in an external document set; and a step of calculating an importance score of the target document based on the semantic association.

According to a preferred aspect, the forming step includes forming the semantic association between the target document and the external document by measuring the text similarity between them.

According to a preferred aspect, the forming step includes forming the semantic association between the target document and the external document by defining the target document as a class document and the external document as an instance document, and calculating the probability that the instance document is classified into the class document.

According to a preferred aspect, the calculating step includes calculating the importance score of the target document based on the number of external documents associated with the target document.

According to a preferred aspect, the calculating step includes generating a graph structure that contains the target documents as nodes, in which each edge connecting two nodes has a weight determined by the number of external documents associated with the two nodes that the edge connects, and calculating the importance score of a target document based on the sum of the weights of its edges.

According to a preferred aspect, the calculating step includes generating a graph structure that contains the target documents as nodes, in which each edge connecting two nodes has a weight determined by the number of external documents associated with the two nodes that the edge connects, and calculating the importance score of a target document based on both the sum of the weights of its edges and the number of external documents associated with it.

According to a preferred aspect, the calculating step includes calculating an intermediate importance score of the target document based on the number of external documents associated with the target document, and calculating the importance score of the target document based on the intermediate importance score and the sum of the weights of the target document's edges.

According to a preferred aspect, the method further includes a step of storing the semantic association between the target document and the external document.

According to the present invention, the importance score of a document can be calculated even when no hyperlinks exist between documents, which improves the accuracy of document retrieval.

The above and other features of the present invention will become more apparent from the following detailed description with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of an apparatus for calculating document importance according to one embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of an apparatus for calculating document importance according to another embodiment of the present invention. FIG. 3 is a flowchart illustrating a method for calculating document importance according to another embodiment of the present invention.

The principles and implementation of the present invention will become clearer from the following description of specific embodiments with reference to the drawings. The present invention is not, however, limited to the specific embodiments described here. For brevity, descriptions of general elements related to the present invention are omitted.

First, some terms used in this specification are described below with reference to Table 1.

FIG. 1 shows an apparatus 10 for calculating document importance according to one embodiment of the present invention. As shown in FIG. 1, the apparatus 10 includes a semantic association forming unit 120 and a document importance calculation unit 130.

The semantic association forming unit 120 is configured to form a specific relationship between a target document in the target document set and an external document in the external document set as a semantic association. Each document in the target document set and the external document set includes information such as a document number, title, author, time, term 1, and term 2. The target document set and the external document set can be stored in one or more external memories implemented by various types of memory, including volatile and non-volatile memory (for example, read-only memory (ROM) and flash memory). It will be understood, however, that the target document set and the external document set may also be stored in one or more memories within the apparatus 10.

In one embodiment, the semantic association can be formed based on a degree of association. Specifically, the semantic association forming unit 120 uses the text similarity between a document in the target document set and a document in the external document set to measure the degree of association between the two documents.
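As a minimal sketch of forming associations from text similarity: the text does not say which similarity measure or threshold is used, so the bag-of-words cosine similarity, the threshold value 0.3, the function names, and the sample documents below are all assumptions for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts over whitespace-separated word counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def form_associations(target_docs, external_docs, threshold=0.3):
    """Map each target document id to the ids of the external documents
    whose similarity to it reaches the (assumed) threshold."""
    return {
        target_id: {
            external_id
            for external_id, external_text in external_docs.items()
            if cosine_similarity(target_text, external_text) >= threshold
        }
        for target_id, target_text in target_docs.items()
    }


# Hypothetical sample data.
target_docs = {"d1": "document importance ranking", "d2": "image compression"}
external_docs = {"e1": "ranking document importance scores", "e2": "weather report"}
associations = form_associations(target_docs, external_docs)
```

Here d1 is associated with e1 only, and d2 has no associated external documents.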

In another embodiment, the semantic association can be formed based on an instance relationship. Specifically, the semantic association forming unit 120 uses a classification method to form a class-instance relationship between two documents. That is, documents in the target document set are regarded as class documents (templates), and documents in the external document set are regarded as instance documents. The probability that an instance document is classified into a class document is then calculated.

The output of the semantic association forming unit 120 is, for example, [degree of association (document number 1, document number 2)] or [instance relationship (document number 1, document number 2)].

The document importance calculation unit 130 is configured to calculate document importance scores based on the semantic associations formed by the semantic association forming unit 120. The document importance calculation unit 130 can then sort the target documents in the target document set according to the calculated scores.

In one embodiment, the document importance calculation unit 130 uses the related reference number (that is, the number of highly related documents/instances in the external source) to calculate the importance score of a target document. This is referred to as document importance calculation based on the related reference number. Specifically, for each document in the target document set, the document importance calculation unit 130 first counts the documents/instances in the external source that are highly related to that document. It then finds the target document with the largest such count and divides each target document's count by that maximum to obtain the target document's importance score.

For example, suppose there are three target documents d1, d2, and d3, which have 10, 8, and 6 highly related documents/instances in the external source, respectively. Then d1 is the document with the largest number (10) of highly related documents, and the importance scores of d1, d2, and d3 are calculated as 10/10 = 1, 8/10 = 0.8, and 6/10 = 0.6, respectively.
The resulting document importance ranking is d1 > d2 > d3.
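The count-based calculation can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the example; the function name is hypothetical, and returning all zeros when no document has any highly related external document is an assumption the text does not address.

```python
def count_based_scores(related_counts):
    """Divide each target document's count of highly related external
    documents by the largest count among all target documents."""
    max_count = max(related_counts.values())
    if max_count == 0:
        # Assumption: with no related documents at all, every score is 0.
        return {doc: 0.0 for doc in related_counts}
    return {doc: count / max_count for doc, count in related_counts.items()}


# Counts from the example: d1, d2, d3 have 10, 8, and 6 related documents.
counts = {"d1": 10, "d2": 8, "d3": 6}
scores = count_based_scores(counts)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With the example counts, `scores` holds 1.0, 0.8, and 0.6 for d1, d2, and d3, and `ranking` lists d1, d2, d3 in that order.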

In another embodiment, the document importance calculation unit 130 generates a graph structure based on the semantic associations between target documents and external documents in order to model the relationships among the target documents, and then calculates document importance by graph analysis. This is referred to as document importance calculation based on the graph structure.
Specifically, for each pair of target documents, the document importance calculation unit 130 counts the highly related documents/instances that the two have in common. It then forms a graph structure in which each target document is a node and the number of common highly related documents/instances of each pair is the weight of the edge between that pair of target documents (nodes). Next, it calculates the sum of the edge weights of each node. Finally, it finds the target document (node) with the largest sum of edge weights and divides each target document's sum of edge weights by that maximum to obtain the target document's importance score.

For example, suppose there are three target documents d1, d2, and d3. Target documents d1 and d2 have 5 highly related documents/instances in common in the external source, d2 and d3 have 6, and d1 and d3 have 4. As nodes, d1 then has an edge-weight sum of 5 + 4 = 9, d2 has 5 + 6 = 11, and d3 has 6 + 4 = 10. Target document d2 therefore has the largest edge-weight sum (11), and the importance scores of d1, d2, and d3 are calculated as 9/11 ≈ 0.81, 11/11 = 1, and 10/11 ≈ 0.9, respectively. The resulting document importance ranking is d2 > d3 > d1.
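The graph-based calculation can be sketched in the same way. The edge representation (a dictionary keyed by document pairs) and the function name are assumptions chosen for brevity; the code keeps full floating-point precision rather than the two-digit values quoted in the example.

```python
from collections import defaultdict


def graph_based_scores(edge_weights):
    """edge_weights maps a pair of target documents to the number of highly
    related external documents the pair has in common (the edge weight)."""
    weight_sum = defaultdict(int)
    for (node_u, node_v), weight in edge_weights.items():
        # An undirected edge contributes its weight to both endpoints.
        weight_sum[node_u] += weight
        weight_sum[node_v] += weight
    max_sum = max(weight_sum.values())
    return {node: total / max_sum for node, total in weight_sum.items()}


# Edge weights from the example.
edges = {("d1", "d2"): 5, ("d2", "d3"): 6, ("d1", "d3"): 4}
graph_scores = graph_based_scores(edges)
graph_ranking = sorted(graph_scores, key=graph_scores.get, reverse=True)
```

The edge-weight sums are 9, 11, and 10, so the scores are 9/11, 1, and 10/11, and `graph_ranking` lists d2, d3, d1.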

In another embodiment, the document importance calculation unit 130 uses a combination of the related reference number and the graph structure to calculate document importance scores. The underlying idea is that using only one of the two to calculate document importance (a one-step calculation) may be somewhat one-sided: the related-reference-number method considers only the number of highly related documents/instance documents of a target document, whereas the graph-structure method considers only the number of subjects that two documents have in common. If the resulting document importance scores are very close to one another, it may be necessary to distinguish the importance of the documents further.
For example, target documents whose importance scores have been calculated by one method (for example, the related reference number) can be clustered into several groups based on those scores (that is, documents whose scores are relatively close in absolute or relative distance are grouped together), and a further calculation using the other method (that is, the graph structure) can then be applied to the grouped documents (a two-step calculation). In this way, documents that receive very similar importance scores from the first method are further differentiated, so that the importance of the target documents is determined comprehensively. Here, the absolute distance between any two target documents is defined as the difference between their importance scores; the smaller the difference, the shorter the absolute distance. The relative distance between any two target documents is defined as the ratio of the difference between their importance scores to the larger of the two scores; the smaller the ratio, the shorter the relative distance.

Specifically, the target documents, whose importance scores have been calculated based on the related reference number, are divided into two groups. All target documents in a group are given the same importance score S1, which is the average of the importance scores of all target documents in that group. In addition, every target document has its own importance score S2 calculated based on the graph structure. The importance score of a particular target document can then be calculated as S = S1 × W + S2, where W is a predetermined coefficient (for example, 10).

As an example, suppose there are three target documents d1, d2, and d3, which have 10, 8, and 6 highly related documents/instances in the external source, respectively. Furthermore, d1 and d2 have 5 highly related documents/instances in common in the external source, d2 and d3 have 6, and d1 and d3 have 4.

First, based on the related reference number, the document importance calculation unit 130 calculates the importance scores of d1, d2, and d3 as 10/10 = 1, 8/10 = 0.8, and 6/10 = 0.6, respectively. Based on the graph structure, it also calculates the importance scores of d1, d2, and d3 as 9/11 ≈ 0.81, 11/11 = 1, and 10/11 ≈ 0.9, respectively.

The documents d1, d2, and d3 are then divided into two groups according to the results based on the related reference number: the first group contains d1 and d2, and the second group contains d3. The grouping proceeds as follows. First, d1, which has the highest importance score, is assigned to the first group, and d3, which has the lowest, is assigned to the second group. Then, because the relative distance between d2 and d1 ((1 − 0.8)/1 = 0.2) is shorter than the relative distance between d2 and d3 ((0.8 − 0.6)/0.8 = 0.25), d2 is assigned to the first group.
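The grouping step can be sketched as follows, using the relative distance (score difference divided by the larger score). The function names are hypothetical, and two points are assumptions the text leaves open: ties go to the first group, and at least two target documents are expected.

```python
def relative_distance(score_a, score_b):
    """Score difference divided by the larger of the two scores."""
    larger = max(score_a, score_b)
    return abs(score_a - score_b) / larger if larger else 0.0


def split_into_two_groups(scores):
    """Seed group 1 with the highest-scoring document and group 2 with the
    lowest, then assign every other document to the nearer seed.

    Assumptions: at least two documents; ties go to the first group.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    top, bottom = ordered[0], ordered[-1]
    groups = [[top], [bottom]]
    for doc in ordered[1:-1]:
        to_top = relative_distance(scores[doc], scores[top])
        to_bottom = relative_distance(scores[doc], scores[bottom])
        groups[0 if to_top <= to_bottom else 1].append(doc)
    return groups


# Count-based scores from the example.
groups = split_into_two_groups({"d1": 1.0, "d2": 0.8, "d3": 0.6})
```

For the example scores, d2 is at relative distance 0.2 from d1 and 0.25 from d3, so it joins d1 in the first group and d3 is alone in the second.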

As a result, all target documents in the first group have the importance score S1 = (1 + 0.8)/2 = 0.9, and all target documents in the second group have the importance score S1 = 0.6. With the combined method, the importance scores of d1, d2, and d3 are therefore calculated as follows:
Importance score of d1: 0.9 × 10 + 0.81 = 9.81
Importance score of d2: 0.9 × 10 + 1 = 10
Importance score of d3: 0.6 × 10 + 0.9 = 6.9
The final document importance ranking is therefore d2 > d1 > d3.
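The combined (two-step) score S = S1 × W + S2 can be checked with a short sketch. The function and variable names are hypothetical; the graph-based scores are entered as the two-digit values quoted in the example (0.81 and 0.9), not as exact fractions, so that the results match the figures above.

```python
def combined_scores(groups, count_scores, graph_scores, weight=10):
    """For each group, S1 is the average count-based score of its members;
    each member's final score is S1 * weight + its own graph-based score."""
    result = {}
    for group in groups:
        s1 = sum(count_scores[doc] for doc in group) / len(group)
        for doc in group:
            result[doc] = s1 * weight + graph_scores[doc]
    return result


count_scores = {"d1": 1.0, "d2": 0.8, "d3": 0.6}
rounded_graph_scores = {"d1": 0.81, "d2": 1.0, "d3": 0.9}  # values as quoted in the example
example_groups = [["d1", "d2"], ["d3"]]

final_scores = combined_scores(example_groups, count_scores, rounded_graph_scores)
final_ranking = sorted(final_scores, key=final_scores.get, reverse=True)
```

Because W is large compared with the range of S2, the group average dominates and the graph-based score only orders documents within a group, which is why d1 and d2 both stay ahead of d3.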

If the combined calculation above does not satisfy a given requirement (for example, two or more target documents still have identical or similar importance scores), those skilled in the art will readily understand that, following the same principle, the target documents can be re-divided into three, four, or more groups for the calculation until the requirement is satisfied.

図２は、本発明の他の実施の形態によるドキュメント重要性を計算するための装置２０を示す。図２に示すように、ドキュメント重要性を計算する装置２０は、図１に示される意味的関連形成部１２０及びドキュメント重要性計算部１３０と同じである意味的関連形成部２２０とドキュメント重要性計算部２３０をそれぞれ備えている。さらに、ドキュメント重要性を計算する装置２０は、さらに意味的関連格納部２４０を備える。 FIG. 2 shows an apparatus 20 for calculating document importance according to another embodiment of the present invention. As shown in FIG. 2, the apparatus 20 for calculating document importance is the same as the semantic relation formation unit 120 and the document importance calculation unit 130 shown in FIG. Each unit 230 is provided. Furthermore, the apparatus 20 for calculating document importance further comprises a semantic relation storage 240.

意味的関連格納部２４０は、意味的関連形成部２２０によって形成された意味的関連を格納するように構成される。例えば、意味的関連格納部２４０は、目標ドキュメント集合からのドキュメントと外部ドキュメント集合からのドキュメント間の特定の関係（意味的関連）を、［関連度（文書番号１，文書番号２）］あるいは［インスタンス関係（文書番号１，文書番号２）］の形式で格納する。また、意味的関連格納部２４０は、揮発性と不揮発性メモリを含む様々なタイプのメモリ（例えば、読み出し専用メモリ（ＲＯＭ）、フラッシュ・メモリ等）によって実現することが可能である。 The semantic association storage unit 240 is configured to store the semantic associations formed by the semantic association formation unit 220. For example, the semantic association storage unit 240 stores a specific relationship (semantic association) between a document from the target document set and a document from the external document set in the form [relevance (document number 1, document number 2)] or [instance relationship (document number 1, document number 2)]. The semantic association storage unit 240 can be implemented with various types of memory, both volatile and nonvolatile (for example, read-only memory (ROM), flash memory, and the like).

意味的関連格納部２４０の利用は、形成された意味的関連を格納する性能において有効である。これにより、ドキュメント重要性を再計算する時に意味的関連を再形成する必要がなくなり、その結果、多くの動作が省かれる。 Using the semantic association storage unit 240 allows the formed semantic associations to be retained. This eliminates the need to re-form the semantic associations when document importance is recalculated, which saves a large number of operations.
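
The storage format described above, [relation (document number 1, document number 2)], can be sketched as a small in-memory store. This is a hypothetical illustration only: the class name, method names, and the relation labels "relevance" and "instance" are assumptions made for the example, and a real storage unit could equally be backed by nonvolatile memory or a database.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticAssociationStore:
    """Hypothetical in-memory store keyed by (relation, target_id, external_id)."""
    _records: dict = field(default_factory=dict)

    def put(self, relation, target_id, external_id, value=1.0):
        # value could be a similarity degree or a classification probability
        self._records[(relation, target_id, external_id)] = value

    def get(self, relation, target_id, external_id):
        # Returns None when no association of this kind was stored
        return self._records.get((relation, target_id, external_id))

    def externals_for(self, target_id, relation=None):
        # External documents associated with a target, optionally filtered by relation
        return sorted({
            e for (r, t, e) in self._records
            if t == target_id and (relation is None or r == relation)
        })


store = SemanticAssociationStore()
store.put("relevance", "d1", "e1", 0.7)
store.put("instance", "d1", "e2", 0.9)
print(store.externals_for("d1"))
```

Because the associations are kept after they are formed, a recalculation of importance scores can read them back from the store instead of repeating the formation step.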

図３は、本発明の他の実施の形態によるドキュメント重要性を計算する方法３０を説明するフローチャートである。この方法３０はステップＳ３１０から開始する。 FIG. 3 is a flowchart illustrating a method 30 for calculating document importance according to another embodiment of the present invention. The method 30 starts at step S310.

ステップＳ３３０で、目標ドキュメント集合の目標ドキュメントと外部ドキュメント集合の外部ドキュメント間の特定の関係を、意味的関連として形成する。ここで、外部ドキュメント集合は、目標ドキュメント集合における目標ドキュメントの重要性を計算するための暗黙の知識ソースとして用いられる。具体的には、２つのドキュメント間のテキスト類似度を２つのドキュメント間の関連度合いを測定するために用いることができ、そして、意味的関連を関連度合いに基づいて形成することができる。さらに、２つのドキュメント間のクラス−インスタンス関係を形成するために分類方法が用いられ、そして、意味的関連はインスタンス関係に基づいて形成することが可能である。すなわち、目標ドキュメント集合のドキュメントはクラスドキュメント（テンプレート）と見なされ、外部ドキュメント集合のドキュメントはインスタンスドキュメントと見なされる。その後、インスタンスドキュメントがクラスドキュメントに分類される確率が計算される。 In step S330, a specific relationship between the target document of the target document set and the external document of the external document set is formed as a semantic relationship. Here, the external document set is used as an implicit knowledge source for calculating the importance of the target document in the target document set. Specifically, text similarity between two documents can be used to measure the degree of association between the two documents, and a semantic association can be formed based on the degree of association. In addition, a classification method is used to form a class-instance relationship between two documents, and a semantic association can be formed based on the instance relationship. That is, the documents in the target document set are regarded as class documents (templates), and the documents in the external document set are regarded as instance documents. Thereafter, the probability that the instance document is classified as a class document is calculated.
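
As a rough illustration of the text-similarity route in step S330, the sketch below links a target document to every external document whose similarity reaches a threshold. The source does not name a particular similarity measure or threshold, so the bag-of-words cosine similarity, the value 0.3, and all function names here are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lowercased word tokens)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def form_associations(targets, externals, threshold=0.3):
    """Map each target document id to the set of external document ids whose
    similarity is at least `threshold` (an assumed cut-off)."""
    ext_vecs = {eid: term_vector(text) for eid, text in externals.items()}
    associations = {}
    for tid, text in targets.items():
        tvec = term_vector(text)
        associations[tid] = {
            eid for eid, evec in ext_vecs.items()
            if cosine_similarity(tvec, evec) >= threshold
        }
    return associations


targets = {"t1": "graph ranking of documents", "t2": "image compression codec"}
externals = {
    "e1": "ranking documents with a graph",
    "e2": "codec for image compression",
    "e3": "weather forecast today",
}
print(form_associations(targets, externals))
```

The class-instance route would replace `cosine_similarity` with the probability, produced by a classifier, that an external (instance) document belongs to a target (class) document; that variant is not shown here.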

選択的なステップＳ３４０で、ステップＳ３３０で形成された意味的関連を、メモリに格納することが可能である。これにより、ドキュメント重要性を再計算する時にステップＳ３３０を再度実行する必要がなくなる。その後、方法３０はステップＳ３５０に進む。Ｓ３４０が省略されれば、方法３０はステップＳ３３０の後にステップＳ３５０に直ちに進む。 In optional step S340, the semantic association formed in step S330 can be stored in memory. This eliminates the need to re-execute step S330 when recalculating document importance. Thereafter, the method 30 proceeds to step S350. If S340 is omitted, the method 30 proceeds immediately to step S350 after step S330.

ステップＳ３５０で、ドキュメントの重要性スコアを意味的関連に基づいて計算する。このステップにおいて、ワンステップ計算（例えば、上述した、関連基準数に基づくドキュメント重要性計算またはグラフ構造に基づくドキュメント重要性計算）、或いはツーステップ計算（組み合わせ計算と称する。すなわち、関連基準数とグラフ構造の組み合わせに基づいたドキュメント重要性計算）を用いることが可能である。その後、目標ドキュメント集合の目標ドキュメントは計算されたドキュメント重要性スコアに基づいてソートされる。 In step S350, the importance score of each document is calculated based on the semantic associations. In this step, either a one-step calculation (for example, the document importance calculation based on the related reference number or the document importance calculation based on the graph structure, both described above) or a two-step calculation (referred to as a combined calculation, that is, a document importance calculation based on a combination of the related reference number and the graph structure) can be used. The target documents in the target document set are then sorted based on the calculated document importance scores.
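
The two one-step calculations named in step S350 can be sketched as follows. This is a minimal illustration, not the embodiment's exact formulas: it assumes the associations are given as a mapping from each target document to the set of external documents associated with it, takes the edge weight between two target documents to be the number of external documents associated with both, and normalizes scores by the maximum. Function names are hypothetical.

```python
from itertools import combinations


def count_based_scores(associations):
    """Score each target by the number of external documents associated with it,
    normalized by the largest count (normalization is an assumption)."""
    counts = {t: len(ext) for t, ext in associations.items()}
    top = max(counts.values(), default=0)
    return {t: (c / top if top else 0.0) for t, c in counts.items()}


def graph_based_scores(associations):
    """Build a graph whose nodes are target documents; the weight of an edge is
    assumed to be the number of external documents shared by its two endpoints.
    A node's score is the sum of its edge weights, normalized by the maximum."""
    totals = {t: 0 for t in associations}
    for a, b in combinations(associations, 2):
        weight = len(associations[a] & associations[b])
        totals[a] += weight
        totals[b] += weight
    top = max(totals.values(), default=0)
    return {t: (s / top if top else 0.0) for t, s in totals.items()}


associations = {
    "d1": {"e1", "e2", "e3", "e4"},
    "d2": {"e1", "e2"},
    "d3": {"e2", "e5"},
}
print(count_based_scores(associations))
print(graph_based_scores(associations))
```

Sorting the target documents by either score in descending order gives the one-step ranking; the two-step calculation feeds both score sets into the combined method.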

最後に、ステップＳ３６０で方法３０が終了する。 Finally, method 30 ends at step S360.

本発明によれば、ドキュメント間にハイパーリンクが存在しない場合、ドキュメントの重要性スコアを計算し、重要性スコアに基づいてドキュメントを順位付けることができ、それによって、文献検索の正確さが向上する。 According to the present invention, when no hyperlinks exist between documents, the importance scores of the documents can still be calculated and the documents can be ranked based on those scores, thereby improving the accuracy of document retrieval.

以上、本発明についてその好適な実施例を参照して説明したが、当該技術に精通した当業者には、本発明の精神と範囲から逸脱することなく他の様々な修正、変更、追加を行うことが可能なことは明らかであろう。したがって、本発明の範囲は上記の具体的な実施例に限定されず、付記した請求項によってのみ限定される。 Although the present invention has been described with reference to preferred embodiments thereof, various other modifications, changes and additions can be made by those skilled in the art without departing from the spirit and scope of the present invention. It will be clear that this is possible. Accordingly, the scope of the invention is not limited to the specific embodiments described above, but only by the appended claims.

さらに、上記実施形態の一部又は全部は、以下の付記のようにも記載されうるが、これに限定されない。 Further, a part or all of the above-described embodiment can be described as in the following supplementary notes, but is not limited thereto.

（付記１）
目標ドキュメント集合内の目標ドキュメントと外部ドキュメント集合内の外部ドキュメントとの間の意味的関連を形成するように構成された意味的関連形成部と、前記意味的関連に基づいて前記目標ドキュメントの重要性スコアを計算するように構成されたドキュメント重要性計算部と
を備えることを特徴とするドキュメント重要性を計算する装置。 (Appendix 1)
An apparatus for calculating document importance, comprising: a semantic association formation unit configured to form a semantic association between a target document in a target document set and an external document in an external document set; and a document importance calculation unit configured to calculate an importance score of the target document based on the semantic association.

（付記２）
前記意味的関連形成部は、前記目標ドキュメントと前記外部ドキュメント間のテキスト類似度を測定することにより、前記目標ドキュメントと前記外部ドキュメント間の意味的関連を形成するように構成されることを特徴とする付記１に記載のドキュメント重要性を計算する装置。 (Appendix 2)
The apparatus for calculating document importance according to appendix 1, wherein the semantic association formation unit is configured to form the semantic association between the target document and the external document by measuring a text similarity between the target document and the external document.

（付記３）
前記意味的関連形成部は、前記目標ドキュメントと前記外部ドキュメントをそれぞれクラスドキュメントとインスタンスドキュメントと定義し、前記インスタンスドキュメントが前記クラスドキュメントに分類される確率を計算することにより、前記目標ドキュメントと前記外部ドキュメントの間の意味的関連を形成するように構成されることを特徴とする付記１に記載のドキュメント重要性を計算する装置。 (Appendix 3)
The apparatus for calculating document importance according to appendix 1, wherein the semantic association formation unit is configured to define the target document and the external document as a class document and an instance document, respectively, and to form the semantic association between the target document and the external document by calculating a probability that the instance document is classified into the class document.

（付記４）
前記ドキュメント重要性計算部は、前記目標ドキュメントと関連する前記外部ドキュメントの数に基づいて前記目標ドキュメントの重要性スコアを計算するように構成されることを特徴とする付記１に記載のドキュメント重要性を計算する装置。 (Appendix 4)
The apparatus for calculating document importance according to appendix 1, wherein the document importance calculation unit is configured to calculate the importance score of the target document based on the number of the external documents associated with the target document.

（付記５）
前記ドキュメント重要性計算部は、前記目標ドキュメントをノードとして含むグラフ構造であって、前記グラフ構造内の２つのノードを接続する各辺が、前記辺によって接続される２つのノードと関連する外部ドキュメントの数によって決定される重みを有するグラフ構造を生成し、前記目標ドキュメントの辺の重みの和に基づいて、前記目標ドキュメントの重要性スコアを計算するように構成されることを特徴とする付記１に記載のドキュメント重要性を計算する装置。 (Appendix 5)
The apparatus for calculating document importance according to appendix 1, wherein the document importance calculation unit is configured to generate a graph structure that includes the target documents as nodes, each edge connecting two nodes in the graph structure having a weight determined by the number of external documents associated with the two nodes connected by the edge, and to calculate the importance score of the target document based on a sum of the weights of the edges of the target document.

（付記６）
前記ドキュメント重要性計算部は、前記目標ドキュメントをノードとして含むグラフ構造であって、前記グラフ構造内の２つのノードを接続する各辺が、前記辺によって接続される２つのノードと関連する外部ドキュメントの数によって決定される重みを有するグラフ構造を生成し、前記目標ドキュメントの辺の重みの和と目標ドキュメントと関連する外部ドキュメントの数に基づいて、前記目標ドキュメントの重要性スコアを計算するように構成されることを特徴とする付記１に記載のドキュメント重要性を計算する装置。 (Appendix 6)
The apparatus for calculating document importance according to appendix 1, wherein the document importance calculation unit is configured to generate a graph structure that includes the target documents as nodes, each edge connecting two nodes in the graph structure having a weight determined by the number of external documents associated with the two nodes connected by the edge, and to calculate the importance score of the target document based on a sum of the weights of the edges of the target document and the number of external documents associated with the target document.

（付記７）
前記ドキュメント重要性計算部は、前記目標ドキュメントと関連する外部ドキュメントの数に基づいて目標ドキュメントの中間の重要性スコアを計算し、前記中間の重要性スコアと前記目標ドキュメントの辺の重みの和に基づいて目標ドキュメントの重要性スコアを計算するように構成されることを特徴とする付記６に記載のドキュメント重要性を計算する装置。 (Appendix 7)
The apparatus for calculating document importance according to appendix 6, wherein the document importance calculation unit is configured to calculate an intermediate importance score of the target document based on the number of external documents associated with the target document, and to calculate the importance score of the target document based on the intermediate importance score and the sum of the weights of the edges of the target document.

（付記８）
前記意味的関連形成部によって形成された意味的関連を格納するように構成された意味的関連格納部をさらに備えることを特徴とする付記１に記載のドキュメント重要性を計算する装置。 (Appendix 8)
The apparatus for calculating document importance according to appendix 1, further comprising a semantic association storage unit configured to store the semantic association formed by the semantic association formation unit.

（付記９）
目標ドキュメント集合内の目標ドキュメントと外部ドキュメント集合内の外部ドキュメントとの間の意味的関連を形成するステップと、前記意味的関連に基づいて前記目標ドキュメントの重要性スコアを計算するステップと
を備えることを特徴とするドキュメント重要性を計算する方法。 (Appendix 9)
A method of calculating document importance, comprising: forming a semantic association between a target document in a target document set and an external document in an external document set; and calculating an importance score of the target document based on the semantic association.

（付記１０）
前記形成ステップは、前記目標ドキュメントと前記外部ドキュメント間のテキスト類似度を測定することにより、前記目標ドキュメントと前記外部ドキュメント間の意味的関連を形成するステップを含むことを特徴とする付記９に記載のドキュメント重要性を計算する方法。 (Appendix 10)
The method of calculating document importance according to appendix 9, wherein the forming step includes forming the semantic association between the target document and the external document by measuring a text similarity between the target document and the external document.

（付記１１）
前記形成ステップは、前記目標ドキュメントと前記外部ドキュメントをそれぞれクラスドキュメントとインスタンスドキュメントと定義し、前記インスタンスドキュメントが前記クラスドキュメントに分類される確率を計算することにより、前記目標ドキュメントと前記外部ドキュメントの間の意味的関連を形成するステップを含むことを特徴とする付記９に記載のドキュメント重要性を計算する方法。 (Appendix 11)
The method of calculating document importance according to appendix 9, wherein the forming step includes defining the target document and the external document as a class document and an instance document, respectively, and forming the semantic association between the target document and the external document by calculating a probability that the instance document is classified into the class document.

（付記１２）
前記計算ステップは、前記目標ドキュメントと関連する前記外部ドキュメントの数に基づいて前記目標ドキュメントの重要性スコアを計算するステップを含むことを特徴とする付記９に記載のドキュメント重要性を計算する方法。 (Appendix 12)
The method of calculating document importance according to appendix 9, wherein the calculating step includes calculating the importance score of the target document based on the number of the external documents associated with the target document.

（付記１３）
前記計算ステップは、前記目標ドキュメントをノードとして含むグラフ構造であって、前記グラフ構造内の２つのノードを接続する各辺が、前記辺によって接続される２つのノードと関連する外部ドキュメントの数によって決定される重みを有するグラフ構造を生成し、前記目標ドキュメントの辺の重みの和に基づいて、前記目標ドキュメントの重要性スコアを計算するステップを含むことを特徴とする付記９に記載のドキュメント重要性を計算する方法。 (Appendix 13)
The method of calculating document importance according to appendix 9, wherein the calculating step includes generating a graph structure that includes the target documents as nodes, each edge connecting two nodes in the graph structure having a weight determined by the number of external documents associated with the two nodes connected by the edge, and calculating the importance score of the target document based on a sum of the weights of the edges of the target document.

（付記１４）
前記計算ステップは、前記目標ドキュメントをノードとして含むグラフ構造であって、前記グラフ構造内の２つのノードを接続する各辺が、前記辺によって接続される２つのノードと関連する外部ドキュメントの数によって決定される重みを有するグラフ構造を生成し、前記目標ドキュメントの辺の重みの和と目標ドキュメントと関連する外部ドキュメントの数に基づいて、前記目標ドキュメントの重要性スコアを計算するステップを含むことを特徴とする付記９に記載のドキュメント重要性を計算する方法。 (Appendix 14)
The method of calculating document importance according to appendix 9, wherein the calculating step includes generating a graph structure that includes the target documents as nodes, each edge connecting two nodes in the graph structure having a weight determined by the number of external documents associated with the two nodes connected by the edge, and calculating the importance score of the target document based on a sum of the weights of the edges of the target document and the number of external documents associated with the target document.

（付記１５）
前記計算ステップは、前記目標ドキュメントと関連する外部ドキュメントの数に基づいて目標ドキュメントの中間の重要性スコアを計算し、前記中間の重要性スコアと前記目標ドキュメントの辺の重みの和に基づいて目標ドキュメントの重要性スコアを計算するステップを含むことを特徴とする付記１４に記載のドキュメント重要性を計算する方法。 (Appendix 15)
The method of calculating document importance according to appendix 14, wherein the calculating step includes calculating an intermediate importance score of the target document based on the number of external documents associated with the target document, and calculating the importance score of the target document based on the intermediate importance score and the sum of the weights of the edges of the target document.

（付記１６）
前記目標ドキュメントと前記外部ドキュメント間の意味的関連を格納するステップをさらに備えることを特徴とする付記９に記載のドキュメント重要性を計算する方法。 (Appendix 16)
The method of calculating document importance according to appendix 9, further comprising storing the semantic association between the target document and the external document.

１０：ドキュメント重要性を計算する装置
１２０：意味的関連形成部
１３０：ドキュメント重要性計算部
２０：ドキュメント重要性を計算する装置
２２０：意味的関連形成部
２３０：ドキュメント重要性計算部
２４０：意味的関連格納部
10: Apparatus for calculating document importance
120: Semantic association formation unit
130: Document importance calculation unit
20: Apparatus for calculating document importance
220: Semantic association formation unit
230: Document importance calculation unit
240: Semantic association storage unit

Claims

An apparatus for calculating document importance, comprising: a semantic association formation unit configured to form a semantic association between a target document in a target document set and an external document in an external document set; and
a document importance calculation unit configured to calculate an importance score of the target document based on the semantic association.

The apparatus for calculating document importance according to claim 1, wherein the semantic association formation unit is configured to form the semantic association between the target document and the external document by measuring a text similarity between the target document and the external document.

The apparatus for calculating document importance according to claim 1, wherein the semantic association formation unit is configured to define the target document and the external document as a class document and an instance document, respectively, and to form the semantic association between the target document and the external document by calculating a probability that the instance document is classified into the class document.

The apparatus for calculating document importance according to claim 1, wherein the document importance calculation unit is configured to calculate the importance score of the target document based on the number of the external documents associated with the target document.

The apparatus for calculating document importance according to claim 1, wherein the document importance calculation unit is configured to generate a graph structure that includes the target documents as nodes, each edge connecting two nodes in the graph structure having a weight determined by the number of external documents associated with the two nodes connected by the edge, and to calculate the importance score of the target document based on a sum of the weights of the edges of the target document.

The apparatus for calculating document importance according to claim 1, wherein the document importance calculation unit is configured to generate a graph structure that includes the target documents as nodes, each edge connecting two nodes in the graph structure having a weight determined by the number of external documents associated with the two nodes connected by the edge, and to calculate the importance score of the target document based on a sum of the weights of the edges of the target document and the number of external documents associated with the target document.

The apparatus for calculating document importance according to claim 6, wherein the document importance calculation unit is configured to calculate an intermediate importance score of the target document based on the number of external documents associated with the target document, and to calculate the importance score of the target document based on the intermediate importance score and the sum of the weights of the edges of the target document.

The apparatus for calculating document importance according to claim 1, further comprising a semantic association storage unit configured to store the semantic association formed by the semantic association formation unit.

A method of calculating document importance, comprising: forming a semantic association between a target document in a target document set and an external document in an external document set; and
calculating an importance score of the target document based on the semantic association.

The method of calculating document importance according to claim 9, wherein the forming step includes forming the semantic association between the target document and the external document by measuring a text similarity between the target document and the external document.