JP3428554B2

JP3428554B2 - Semantic network automatic creation device and computer readable recording medium

Info

Publication number: JP3428554B2
Application number: JP2000057971A
Authority: JP
Inventors: 航李; 健司山西
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2000-02-29
Filing date: 2000-02-29
Publication date: 2003-07-22
Anticipated expiration: 2020-02-29
Also published as: JP2001243223A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an apparatus for automatically creating a semantic network used for knowledge acquisition and knowledge processing on the Internet and in electronic libraries, natural language processing systems, speech processing systems, image processing systems, intelligent inference systems, and the like.

[0002]

[Prior Art] Many knowledge processing systems represent knowledge in a form called a semantic network (also called a concept network or an associative network). A semantic network represents the association relationships between words (or between concepts) as a directed graph (Iwanami Information Science Dictionary, edited by Makoto Nagao et al., Iwanami Shoten, 1990). Nodes in the graph represent words or concepts, and links between nodes represent association relationships between words or between concepts. FIG. 15 shows an example of a semantic network. In this network, the nodes represent words such as "car" and "truck", and a link between nodes indicates, for example, that "car" strongly evokes "truck". Associations can also be bidirectional. For example, a bidirectional link exists between "car" and "auto", indicating that each readily evokes the other.

[0003] Semantic networks can also be used for searching the Internet or an electronic library. Suppose, for example, that a user wants to search the Internet for "web pages about cars". Searching with the keyword "car" alone collects only the pages that contain the word "car", so the user cannot find all (or even most) of the desired information. With a semantic network, more car-related information can be found. Specifically, the words linked from "car" in the semantic network, that is, the words strongly associated with it, are looked up; if words such as "auto", "motor", "vehicle", "truck", and "Toyota" are found, all of them can be used as keywords for the search.
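The keyword expansion described above can be sketched as follows. This is only an illustrative sketch: the dictionary `SEMANTIC_NETWORK` and the function `expand_query` are hypothetical names, and the network is a toy adjacency list built from the words mentioned in the paragraph.

```python
# Toy semantic network (hypothetical): each word maps to the words it links to.
SEMANTIC_NETWORK = {
    "car": ["auto", "motor", "vehicle", "truck", "Toyota"],
    "auto": ["car"],
    "truck": [],
}


def expand_query(keyword, network):
    """Return the keyword followed by every word it has a directed link to."""
    return [keyword] + list(network.get(keyword, []))


keywords = expand_query("car", SEMANTIC_NETWORK)
print(keywords)
```

A search front end could then submit every word in `keywords` instead of "car" alone.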

[0004] Semantic networks can also be used for disambiguation in natural language processing systems, speech recognition systems, image processing systems, and the like. For example, when an OCR character recognition system converts image data into character data, it must decide whether a scanned word is "人口" (population) or "入口" (entrance). If "出生" (birth) or "統計" (statistics) appears in the same sentence, the word can be judged to be "人口"; if "出口" (exit) or "通過" (passage) appears, it can be judged to be "入口". The knowledge that "人口" and "出生" are strongly related can be represented in a semantic network and used for disambiguation.
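One simple way to use such links for disambiguation is to pick the candidate that is linked to the most words in the surrounding sentence. The sketch below assumes this counting rule, which the text does not prescribe; `NETWORK` and `disambiguate` are hypothetical names.

```python
# Toy network (hypothetical): candidate word -> set of strongly associated words.
NETWORK = {
    "人口": {"出生", "統計"},   # population: birth, statistics
    "入口": {"出口", "通過"},   # entrance: exit, passage
}


def disambiguate(candidates, context_words, network):
    """Choose the candidate linked to the largest number of context words.

    Ties are resolved in favor of the earlier candidate.
    """
    context = set(context_words)
    return max(candidates, key=lambda c: len(network.get(c, set()) & context))


choice = disambiguate(["人口", "入口"], ["出生", "統計"], NETWORK)
print(choice)
```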

[0005] Semantic networks can further be used for inference in intelligent inference systems. For example, a semantic network can be used to reason out why, as the Japanese proverb says, "when the wind blows, the cooper profits". Specifically, the nodes connected to the "wind" node in the semantic network are found; if a path, that is, a sequence of connected nodes such as "wind → dust → blind person → shamisen → ... → cooper", is found, the reason why "when the wind blows, the cooper profits" can be inferred. Moreover, by finding every possible path between "wind" and "cooper", all the reasons why "when the wind blows, the cooper profits" can be discovered.
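Enumerating every path between two nodes can be sketched with a depth-first search over simple paths (paths that visit no node twice, so loops in the network do not cause infinite recursion). The graph `TOY` is hypothetical; the intermediate nodes "A" and "B" are placeholders, not the elided steps of the proverb.

```python
def all_simple_paths(graph, start, goal, path=None):
    """Return every path from start to goal that visits no node twice."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # skip nodes already on this path (handles loops)
            paths.extend(all_simple_paths(graph, nxt, goal, path))
    return paths


# Hypothetical toy network containing a loop between "wind" and "A".
TOY = {"wind": ["A", "B"], "A": ["wind", "cooper"], "B": ["A"], "cooper": []}

paths = all_simple_paths(TOY, "wind", "cooper")
for p in paths:
    print(" -> ".join(p))
```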

[0006] A knowledge representation called a Bayesian network has also been proposed. A Bayesian network represents the dependencies among the random variables of a joint probability distribution as a directed graph (Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Mateo, California, 1988). A Bayesian network can also be regarded as a semantic network. However, whereas a general semantic network may contain loops, a Bayesian network must not. In other words, a Bayesian network is a restricted form of semantic network.

[0007] Meanwhile, in the fields of information theory and mathematical statistics, several quantities called "information measures" have been proposed. For example, a measure called "stochastic complexity" was proposed by Rissanen (Jorma Rissanen, Fisher Information and Stochastic Complexity, IEEE Transactions on Information Theory, Vol. 42, No. 1, pp. 40-47, 1996). Stochastic complexity is a measure of the amount of information that given data contains relative to a given probabilistic model. Rissanen also proposed the "minimum description length principle", which asserts that the model with the smallest stochastic complexity for the data is closest to the probability distribution that generated the data, and is therefore the model that should be selected in statistical estimation. Stochastic complexity can also be interpreted as the shortest code length (or description length) for describing the data with the probabilistic model. As another example, a measure called "extended stochastic complexity" was proposed by Yamanishi (Kenji Yamanishi, A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning, IEEE Transactions on Information Theory, Vol. 44, No. 4, pp. 1424-1439, 1998). Extended stochastic complexity is likewise a quantity that given data contains relative to a model, but it extends stochastic complexity in that the model may be a parametric class of arbitrary real-valued functions rather than only probability distributions, and the loss function may be an arbitrary distortion function rather than only the logarithmic loss. Other information measures include "Akaike's information criterion" (Hirotugu Akaike, A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control, Vol. AC-19, No. 6, pp. 716-723, 1974) and "entropy" (Iwanami Information Science Dictionary, edited by Makoto Nagao et al., Iwanami Shoten, 1990).

[0008] Methods have also been devised for learning Bayesian networks using stochastic complexity, that is, the minimum description length principle (for example, Joe Suzuki, Yasutaka Otake, and Shigeichi Hirasawa, "On the selection of probabilistic models from the viewpoint of the minimum description length criterion and state partitioning", Transactions of the Information Processing Society of Japan, Vol. 33, No. 11, pp. 1281-1289, 1992).

[0009] Furthermore, Japanese Unexamined Patent Publication No. H11-96177 discloses a kind of automatic semantic network generation technique that dynamically generates an ontology capable of identifying various relationships between words, thereby generating a term dictionary that contains sufficient information for document processing even for a large volume of documents spanning a wide target domain. Specifically, for each word obtained by morphological analysis of the documents, a primary statistic indicating the importance of its occurrence is obtained (for example, the ratio of the number of occurrences of the individual word to the total number of words contained in all documents), and several words with large primary statistics are selected as related words. Next, with each selected related word as a node, an initial graph is generated in which a directed link runs from the node of a word representing the target domain to the node of each related word. Next, for every pair of nodes in the generated graph, a co-occurrence statistic, which is a statistic on the simultaneous occurrence of the two words of the pair, is calculated. Here, the co-occurrence statistic is calculated as the ratio of the total number of documents (or paragraphs, or sentences) in which the two words appear together to the total number of documents (or paragraphs, or sentences) in which the two words appear. The graph is then transformed based on the calculated co-occurrence statistics, relation labels are attached to the links, and a term dictionary is generated.

[0010]

[Problems to Be Solved by the Invention] Semantic networks are widely usable knowledge, but their construction has conventionally relied on manual work. This raises at least two problems. One is that a semantic network is usually extremely large, so creating it is very costly. The other is that knowledge defined by humans inevitably contains a great deal of arbitrariness.

[0011] Methods for learning Bayesian networks using stochastic complexity have been proposed, but they can create only a restricted kind of semantic network, namely one that must not contain loops.

[0012] Furthermore, the technique described in Japanese Unexamined Patent Publication No. H11-96177 initially generates a graph with directed links from the node of a word representing the target domain to the node of each related word, and then transforms that graph based on the calculated co-occurrence statistics while attaching relation labels to the links. It can therefore create only a restricted semantic network, namely a tree-structured network whose root node is the representative word. In other words, like the methods for learning Bayesian networks, it cannot construct a semantic network that has loops. In addition, the co-occurrence statistic, which is the ratio of the total number of documents (or paragraphs, or sentences) in which two words A and B appear together to the total number of documents (or paragraphs, or sentences) in which the two words A and B appear, has the same value for word A as for word B. Because H11-96177 performs the graph transformation based on such a co-occurrence statistic, it cannot decide independently, based on the degree of statistical co-occurrence, whether to create a directed link from word A to word B and whether to create one from word B to word A.
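The symmetry problem can be seen in a small sketch. It assumes one possible reading of the prior-art statistic, in which the denominator is the number of texts containing at least one of the two words; the function name `cooccurrence_ratio` and the text identifiers are hypothetical. Under any reading in which the two words enter the formula interchangeably, swapping them cannot change the value.

```python
def cooccurrence_ratio(texts_with_a, texts_with_b):
    """Texts containing both words divided by texts containing either word.

    Assumed reading of the prior-art statistic; arguments are sets of text IDs.
    """
    both = len(texts_with_a & texts_with_b)
    either = len(texts_with_a | texts_with_b)
    return both / either if either else 0.0


a_texts = {1, 2, 3}        # hypothetical IDs of texts containing word A
b_texts = {2, 3, 4, 5}     # hypothetical IDs of texts containing word B

forward = cooccurrence_ratio(a_texts, b_texts)
backward = cooccurrence_ratio(b_texts, a_texts)
print(forward, backward)   # identical, so the statistic carries no direction
```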

[0013] An object of the present invention is to provide a semantic network automatic creation apparatus that statistically processes a plurality of texts to create a semantic network automatically.

[0014] Another object of the present invention is to provide a semantic network automatic creation apparatus that can also create semantic networks having loops.

[0015] A further object of the present invention is to provide a semantic network automatic creation apparatus that creates the directed link from one node to another node and the directed link in the opposite direction independently of each other, based on the degree of statistical co-occurrence.

[0016]

[Means for Solving the Problems] A semantic network automatic creation apparatus according to a first invention comprises: a storage unit that stores a semantic network composed of words; a statistics unit that receives a plurality of texts, performs morphological analysis on the input texts, and counts, from the morphologically analyzed texts, the co-occurrence frequencies of word pairs, the occurrence frequencies of words, and the total number of texts; a calculation unit that receives the co-occurrence frequencies of word pairs, the occurrence frequencies of words, and the total number of texts from the statistics unit and, based on them, calculates for each word the strength of association from that word to each other word using an information measure; and a creation unit that represents each word as a node, refers to the calculation unit for the strength of association from the word of each node to the word of each other node, establishes a directed link from that node to the other node when the referenced strength of association is equal to or greater than a predetermined threshold, and outputs the resulting directed graph to the storage unit as a semantic network composed of words. The calculation unit uses stochastic complexity as the information measure and regards one text as one data item. It calculates the strength of association from an arbitrary first word to an arbitrary second word as the difference between (a) the stochastic complexity of the data when attention is paid to whether the second word appears in each text and (b) the sum of the stochastic complexity of the data when attention is paid to whether the second word appears within the group of texts in which the first word appears and the stochastic complexity of the data when attention is paid to whether the second word appears within the group of texts in which the first word does not appear.

[0017]

[0018] A semantic network automatic creation apparatus according to a second invention comprises: a storage unit that stores a semantic network composed of concepts; a statistics unit that receives a plurality of texts, performs morphological analysis on the input texts, performs word sense disambiguation on the morphologically analyzed texts, and counts, from the disambiguated texts, the co-occurrence frequencies of concept pairs, the occurrence frequencies of concepts, and the total number of texts; a calculation unit that receives the co-occurrence frequencies of concept pairs, the occurrence frequencies of concepts, and the total number of texts from the statistics unit and, based on them, calculates for each concept the strength of association from that concept to each other concept using an information measure; and a creation unit that represents each concept as a node, refers to the calculation unit for the strength of association from the concept of each node to the concept of each other node, establishes a directed link from that node to the other node when the referenced strength of association is equal to or greater than a predetermined threshold, and outputs the resulting directed graph to the storage unit as a semantic network composed of concepts. The calculation unit uses stochastic complexity as the information measure and regards one text as one data item. It calculates the strength of association from an arbitrary first concept to an arbitrary second concept as the difference between (a) the stochastic complexity of the data when attention is paid to whether the second concept appears in each text and (b) the sum of the stochastic complexity of the data when attention is paid to whether the second concept appears within the group of texts in which the first concept appears and the stochastic complexity of the data when attention is paid to whether the second concept appears within the group of texts in which the first concept does not appear.

[0019]

[0020] A semantic network automatic creation apparatus according to a third invention, in addition to the configuration of the first or second invention, further comprises: an interface unit that accepts from a user the designation of a node in the semantic network; a search unit that receives the user-designated node from the interface unit, receives the semantic network from the storage unit, and finds a partial semantic network of the received semantic network that contains the user-designated node; and a display unit that receives the partial semantic network from the search unit and displays it.
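The search unit's task of finding a partial semantic network around a designated node can be sketched as a breadth-first search. The text does not say how the partial network is delimited; the sketch assumes it consists of all nodes reachable within `max_hops` links, and `subnetwork` and the toy data are hypothetical.

```python
from collections import deque


def subnetwork(network, start, max_hops=1):
    """Return the part of the network reachable from start within max_hops links."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in network.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    # Keep only links whose endpoints both lie inside the partial network.
    return {node: [t for t in network.get(node, []) if t in hops] for node in hops}


# Hypothetical toy network.
network = {"car": ["auto", "truck"], "auto": ["car"], "truck": ["load"], "load": []}
part = subnetwork(network, "car", max_hops=1)
print(part)
```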

[0021] The semantic network automatic creation apparatus of the present invention configured in this way counts the co-occurrence frequencies of word pairs (or concept pairs) in natural language texts, calculates the strength of statistical co-occurrence between words (or between concepts) using an information measure, takes this as the strength of association between the words (or concepts), and uses it to construct a semantic network automatically.

[0022] Because the semantic network is constructed automatically, the man-hours required to create it can be greatly reduced. In addition, because the strength of association between words (or concepts) is calculated from data using a measure with a solid theoretical foundation, the semantic network can be constructed objectively.

[0023] In addition, because the semantic network is created from the strength of association between words calculated with an information measure, semantic networks having loops can also be created. That is, more general semantic networks can be created than with conventional methods for learning Bayesian networks.

[0024] Furthermore, the strength of association from one word to another (or from one concept to another) is calculated by taking into account both the statistics on the appearance and non-appearance of the other word (concept) in the group of texts in which the first word (concept) appears and the statistics on the appearance and non-appearance of the other word (concept) in the group of texts in which the first word (concept) does not appear. As a result, the directed link from one node to another and the directed link in the opposite direction can each be created independently, based on the degree of statistical co-occurrence.

[0025]

[Embodiments of the Invention] A first embodiment of the semantic network automatic creation apparatus of the present invention will now be described. FIG. 1 shows its configuration, and FIG. 2 shows its processing flow. This semantic network automatic creation apparatus 10 comprises a storage unit 1, a statistics unit 2, a calculation unit 3, and a creation unit 4.

[0026] The statistics unit 2 receives a plurality of texts from an input device such as a keyboard or from a storage device such as a floppy disk drive (neither shown) (step S1), performs morphological analysis on those texts (step S2), and then counts, from those texts, the co-occurrence frequencies of word pairs, the occurrence frequencies of words, and the total number of texts (step S3). Here, a text means a text written in a natural language such as Japanese or English. One text is, for example, one document, one paragraph in a document, or one sentence in a paragraph. The co-occurrence frequency of two words is the number of texts in which both words appear. The occurrence frequency of a word is the number of texts in which the word appears.

[0027] Next, the calculation unit 3 receives the co-occurrence frequencies of word pairs, the occurrence frequencies of words, and the total number of texts from the statistics unit 2 and, based on these data, calculates the strength of statistical co-occurrence between words using an information measure and takes it as the strength of association between the words (step S4). The strength of association between words is generally asymmetric. Specifically, the strength of association from word A to word B usually differs from the strength of association from word B to word A.

[0028] Next, the creation unit 4 represents each word as a node and, for each node, refers to the calculation unit 3 for the strength of association from the word of that node to the word of each other node; when the strength of association is equal to or greater than a predetermined threshold, it establishes a directed link from that node to the other node (step S5). The directed graph created in this way is output to the storage unit 1 as the semantic network (step S6).
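Steps S5 and S6 amount to thresholding a table of directed strengths. A minimal sketch follows, assuming the strengths are already available as a dictionary keyed by (source, target) pairs; the function `build_network`, the strength values, and the threshold are all hypothetical.

```python
def build_network(strength, threshold):
    """Create a directed graph: link src -> dst when strength[(src, dst)] >= threshold."""
    network = {}
    for (src, dst), value in strength.items():
        network.setdefault(src, [])   # every word becomes a node
        network.setdefault(dst, [])
        if value >= threshold:
            network[src].append(dst)
    return network


# Hypothetical association strengths; note the two directions are separate entries.
strength = {
    ("car", "truck"): 0.30,
    ("truck", "car"): 0.05,
    ("car", "auto"): 0.20,
    ("auto", "car"): 0.25,
}
network = build_network(strength, threshold=0.1)
print(network)
```

Because each direction is tested separately, the result can contain one-way links, two-way links, and loops.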

[0029] Each unit is described in more detail below.

[0030] The statistics unit 2 takes as input a plurality of texts written in Japanese, English, or the like. FIG. 3 shows an example of an input Japanese text. FIG. 4 shows an example of the result of morphological analysis of the text of FIG. 3. That is, for Japanese, morphological analysis segments the text into words. FIG. 5 shows an example of an input English text. FIG. 6 shows an example of the result of morphological analysis of the text of FIG. 5. That is, for English, morphological analysis converts each word to its base form.

[0031] The statistics unit 2 regards each morphologically analyzed text as a set of words. It statistically processes these morphologically analyzed texts to obtain the co-occurrence frequencies of word pairs, the occurrence frequencies of words, and the total number of texts. FIG. 7 shows an example of the processing of the statistics unit 2. FIG. 7(a) shows an example of morphologically analyzed texts: there are ten texts in total, text1 through text10. The items such as word1 listed beside text1 through text10 are the words contained in each text, and there are five distinct words, word1 through word5. In this case, the statistics unit 2 obtains 10 as the total number of texts. Letting f(w) denote the occurrence frequency of word w, it obtains the occurrence frequencies shown in FIG. 7(b) for each word. Further, letting f(x, y) denote the co-occurrence frequency of word x and word y, it obtains the co-occurrence frequencies shown in FIG. 7(c).
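The counting performed by the statistics unit can be sketched as below. Each text is treated as a set of words, so a word repeated inside one text is counted once. The function `collect_statistics` is a hypothetical name, and the sample texts are made up for illustration; they are not the contents of FIG. 7.

```python
from collections import Counter
from itertools import combinations


def collect_statistics(texts):
    """Return (n, f_word, f_pair) for a list of texts given as lists of words.

    f_word[w] is the number of texts containing w; f_pair[(x, y)] is the number
    of texts containing both x and y, keyed by the alphabetically sorted pair.
    """
    n = len(texts)
    f_word = Counter()
    f_pair = Counter()
    for words in texts:
        unique = sorted(set(words))            # a text is a set of words
        f_word.update(unique)
        f_pair.update(combinations(unique, 2))
    return n, f_word, f_pair


# Made-up sample of morphologically analyzed texts.
texts = [
    ["word1", "word2"],
    ["word1", "word2", "word3"],
    ["word3"],
    ["word1", "word1"],
]
n, f_word, f_pair = collect_statistics(texts)
print(n, dict(f_word), dict(f_pair))
```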

[0032] The calculation unit 3 receives the co-occurrence frequencies of word pairs, the occurrence frequencies of words, and the total number of texts from the statistics unit 2. Then, for each word s, it calculates the strength of association from that word s to each other word w. An information measure is used for this calculation.

[0033] As an example, the method of calculating the strength of association from word s to word w is described for the case where stochastic complexity is used as the information measure. The calculation requires the co-occurrence frequency f(s, w) of word s and word w, the occurrence frequency f(s) of word s, the occurrence frequency f(w) of word w, and the number of texts n.

[0034] The calculation unit 3 regards one text as one data item. First, it calculates the stochastic complexity of the data when attention is paid to whether word w appears in each text. This is called the stochastic complexity of the data with respect to the independent model. Specifically, it is given by the following equation.

    n H(w) + (1/2) log(n / 2π) + log π                      ... (Equation 1)

Here,

    H(w) = −P(w) log P(w) − P(¬w) log P(¬w)                 ... (Equation 2)
    P(w) = f(w) / n                                         ... (Equation 3)
    P(¬w) = 1 − P(w)                                        ... (Equation 4)

where the base of the logarithm is 2 and 0 log 0 = 0 (the same applies hereinafter), and π is 3.1416.
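Equations 1 to 4 can be evaluated directly from f(w) and n. The following is a minimal sketch; the function names are hypothetical, and `math.pi` is used in place of the rounded value 3.1416.

```python
import math


def binary_entropy(p):
    """Equation 2 with base-2 logarithms, using the convention 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def sc_independent(f_w, n):
    """Equation 1: stochastic complexity with respect to the independent model."""
    p_w = f_w / n                                   # Equation 3
    return (n * binary_entropy(p_w)
            + 0.5 * math.log2(n / (2 * math.pi))
            + math.log2(math.pi))


# Example: word w appears in 5 of 10 texts, so H(w) = 1 bit.
value = sc_independent(5, 10)
print(value)
```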

[0035] Next, the texts are divided into two groups: those in which word s appears and those in which word s does not appear. For each group, the stochastic complexity of the data when attention is paid to whether word w appears is calculated, and the two values are summed. This sum is called the stochastic complexity of the data with respect to the dependent model. Specifically, it is given by the following equation.

    [ f(s) H(w|s) + (1/2) log{ f(s) / 2π } + log π ]
      + [ (n − f(s)) H(w|¬s) + (1/2) log{ (n − f(s)) / 2π } + log π ]      ... (Equation 5)

Here,

    H(w|s) = −P(w|s) log P(w|s) − P(¬w|s) log P(¬w|s)                      ... (Equation 6)
    P(w|s) = f(w, s) / f(s)                                                ... (Equation 7)
    P(¬w|s) = 1 − P(w|s)                                                   ... (Equation 8)
    H(w|¬s) = −P(w|¬s) log P(w|¬s) − P(¬w|¬s) log P(¬w|¬s)                 ... (Equation 9)
    P(w|¬s) = f(w, ¬s) / f(¬s) = { f(w) − f(w, s) } / { n − f(s) }         ... (Equation 10)
    P(¬w|¬s) = 1 − P(w|¬s)                                                 ... (Equation 11)

[0036] Next, the difference between the stochastic complexity for the independent model and the stochastic complexity for the dependent model is calculated. The larger this difference, the stronger the statistical co-occurrence from the word s to the word w. In particular, the co-occurrence is positive when P(w|s) > P(w) and negative when P(w|s) < P(w). In practice, the difference is further divided by the total number of texts, as follows:

$$S(s \to w) = \frac{1}{n}\bigl[(\text{Equation 1}) - (\text{Equation 5})\bigr] \tag{12}$$

Defining

$$P(s) = \frac{f(s)}{n} \tag{13}$$

$$P(\neg s) = 1 - P(s) \tag{14}$$

Equation 12 can be rearranged as

$$S(s \to w) = H(w) - P(s)H(w\mid s) - P(\neg s)H(w\mid\neg s) - \frac{1}{2n}\log\frac{f(s)\,(n - f(s))\,\pi}{2n} \tag{15}$$
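The calculation in Equations 1 to 12 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `binary_entropy`, `stochastic_complexity`, and `recall_strength` are hypothetical, the counts in the usage line are made up, and rejecting the degenerate cases f(s) = 0 and f(s) = n (which the text does not address) is an assumption.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary variable with probability p, with 0 log 0 = 0 (Equation 2)."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def stochastic_complexity(m: int, p: float) -> float:
    """m*H + (1/2) log(m / (2*pi)) + log(pi): the form shared by Equations 1 and 5."""
    return m * binary_entropy(p) + 0.5 * math.log2(m / (2 * math.pi)) + math.log2(math.pi)


def recall_strength(f_sw: int, f_s: int, f_w: int, n: int) -> float:
    """Strength of recall S(s -> w) following Equation 12."""
    if not 0 < f_s < n:
        # Assumption: the text does not say what to do when one group is empty.
        raise ValueError("f(s) must satisfy 0 < f(s) < n")
    independent = stochastic_complexity(n, f_w / n)            # Equation 1
    dependent = (
        stochastic_complexity(f_s, f_sw / f_s)                 # first bracket of Equation 5
        + stochastic_complexity(n - f_s, (f_w - f_sw) / (n - f_s))  # second bracket
    )
    return (independent - dependent) / n


# Hypothetical counts: 10 texts, s appears in 2, w in 5, both together in 2.
print(recall_strength(f_sw=2, f_s=2, f_w=5, n=10))
```

Computing the difference of Equations 1 and 5 directly, rather than the rearranged Equation 15, keeps the code close to the description; the two forms agree numerically.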

[0037] Note that the strength of recall from the word s to the word w usually differs from the strength of recall from the word w to the word s; that is, the strength of recall between words is asymmetric. This corresponds to the observation that, for example, the word "Kurosawa" strongly recalls "samurai", whereas "samurai" does not necessarily recall "Kurosawa" strongly. FIG. 8 shows an example calculation of the strength of recall from the word word1 to the word word2 in the example of FIG. 7, and FIG. 9 shows an example calculation of the strength of recall from word2 to word1. The strength of recall from word1 to word2 (0.379) is larger than that from word2 to word1 (0.360) because, in the example of FIG. 7, word2 always appears whenever word1 appears, whereas word1 does not necessarily appear when word2 appears.
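As a side note on where the asymmetry enters the formula (this is a derivation from Equation 15 as written, not a statement made in the original text): the first three terms of Equation 15 form the empirical mutual information between the occurrence of s and the occurrence of w, which is symmetric in s and w. The two directions therefore differ only through the last term:

$$S(s \to w) - S(w \to s) = \frac{1}{2n}\log\frac{f(w)\,(n - f(w))}{f(s)\,(n - f(s))}$$

Under this reading, S(s → w) exceeds S(w → s) whenever f(s)(n − f(s)) < f(w)(n − f(w)), that is, when the occurrence frequency of s lies farther from n/2 than that of w.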

[0038] Next, the creation unit 4 and the storage unit 1 will be described. The creation unit 4 represents each word as a node. Referring to the word-to-word strengths of recall held by the calculation unit 3, it establishes links between the nodes of words with a high strength of recall. Specifically, for example, if the strength of recall from the word s to the word w is greater than a threshold, a directed link is established from the node of the word s to the node of the word w. This linking operation is repeated for all words to construct the semantic network. The storage unit 1 stores the semantic network created in this way.

[0039] FIG. 10 shows an example of the processing of the creation unit 4. First, a list of all words is prepared (step S11) and placed in the list WL1 (step S12). Next, the first word w1 is taken from the list WL1 (step S14), the list of all words is placed in the list WL2 (step S15), and the following processing is repeated for the word w1.

[0040] First, the first word w2 is taken from the list WL2 (step S17). If it is not the same word as w1 (NO in step S18), the strength of recall from w1 to w2 is looked up from the calculation unit 3 (step S19). If the strength of recall exceeds the threshold (YES in step S20), a directed link is established from w1 to w2 (step S21). If it does not exceed the threshold (NO in step S20), no link is established from w1 to w2. When the processing for one word in the list WL2 is finished, the same processing is performed on the next word in the list WL2, and this is repeated for all words in the list WL2.

[0041] When the above processing has been completed for one word in the list WL1 (YES in step S16), the next word is taken from the list WL1 (step S14) and processed in the same way. This is repeated for all words in the list WL1 (step S13).
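The double loop of FIG. 10 (steps S11 to S21) can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `build_semantic_network`, the adjacency-list representation, and the `strength` callback standing in for the calculation unit 3 are all assumptions, and the strengths in the usage example are invented.

```python
from typing import Callable, Dict, Iterable, List


def build_semantic_network(
    words: Iterable[str],
    strength: Callable[[str, str], float],
    threshold: float,
) -> Dict[str, List[str]]:
    """Return adjacency lists: links[w1] holds every w2 with a directed link w1 -> w2."""
    wl1 = list(words)                          # S11-S12: put all words in WL1
    links: Dict[str, List[str]] = {w: [] for w in wl1}
    for w1 in wl1:                             # S13-S14: take each word w1 from WL1
        for w2 in wl1:                         # S15-S17: WL2 is a fresh copy of the word list
            if w1 == w2:                       # S18: skip the word itself
                continue
            if strength(w1, w2) > threshold:   # S19-S20: look up and compare
                links[w1].append(w2)           # S21: directed link w1 -> w2
    return links


# Hypothetical strengths for illustration only; not values from the patent.
table = {("car", "truck"): 0.4, ("car", "auto"): 0.5, ("auto", "car"): 0.45}
network = build_semantic_network(
    ["car", "truck", "auto"], lambda a, b: table.get((a, b), 0.0), threshold=0.3
)
print(network)
```

The comparison uses "exceeds the threshold" as in the flowchart description; the claims instead say "equal to or greater than", so `>=` would be the alternative reading.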

[0042] Next, a second embodiment of the semantic network automatic creation device of the present invention will be described. FIG. 11 shows its configuration, and FIG. 12 shows its processing flow. This semantic network automatic creation device 20 comprises a storage unit 1, a statistics unit 2, a calculation unit 3, and a creation unit 4.

[0043] The statistics unit 2 receives a plurality of texts from an input device, a storage device, or the like (not shown) (step S1), performs morphological analysis on the texts (step S2-1), and then performs word sense disambiguation on them (step S2-2). It then counts, from the texts, the concept-to-concept co-occurrence frequencies, the occurrence frequency of each concept, and the total number of texts (step S3). Next, the calculation unit 3 receives the concept-to-concept co-occurrence frequencies, the concept occurrence frequencies, and the total number of texts from the statistics unit 2 and, based on these data, calculates the strength of recall from concept to concept using an information measure (step S4). Next, the creation unit 4 represents each concept as a node, refers to the concept-to-concept strengths of recall from the calculation unit 3, creates a semantic network by establishing a link from one concept to another whenever the strength of recall is equal to or greater than a predetermined threshold (step S5), and outputs the network to the storage unit 1 (step S6).

[0044] The second embodiment differs from the first in that the first embodiment creates a semantic network of words, whereas the second creates a semantic network of concepts. A concept here is, intuitively, the meaning that a word expresses. For this reason, the statistics unit 2 additionally performs word sense disambiguation on the morphologically analyzed text. For example, in the text example of FIGS. 5 and 6, "fly" can mean either "to fly" (飛ぶ) or "a fly", the insect (蝿). Word sense disambiguation examines the surrounding context and determines that "fly" in this sentence means "to fly", that is, that the concept expressed by "fly" is "to fly". Performing word sense disambiguation on the morphological analysis result of FIG. 6 yields, for example, a result such as that shown in FIG. 13. In FIG. 13, for example, "fly2" denotes the concept "to fly", that is, the meaning "to fly". The statistics unit 2 then counts the co-occurrence frequencies between concepts.
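The counting step performed after disambiguation can be sketched as follows. This is a minimal illustration under stated assumptions: each text is assumed to arrive as a list of concept labels already produced by morphological analysis and word sense disambiguation, each frequency is assumed to be the number of texts in which the concept (or pair) appears, consistent with one text being treated as one datum, and the name `count_statistics` and the label "fly1" are hypothetical ("fly2" follows the notation of FIG. 13).

```python
from collections import Counter
from itertools import permutations
from typing import List, Tuple


def count_statistics(texts: List[List[str]]) -> Tuple[Counter, Counter, int]:
    """Return (occurrence counts, co-occurrence counts, total number of texts)."""
    occurrence: Counter = Counter()
    cooccurrence: Counter = Counter()
    for labels in texts:
        present = set(labels)  # a text counts once per concept, however often it occurs
        occurrence.update(present)
        # Count every ordered pair so that both f(s, w) and f(w, s) can be looked up.
        cooccurrence.update(permutations(sorted(present), 2))
    return occurrence, cooccurrence, len(texts)


# Hypothetical disambiguated texts for illustration only.
texts = [["bird", "fly2"], ["fly1", "food"], ["bird", "fly2", "sky"]]
occurrence, cooccurrence, n = count_statistics(texts)
print(occurrence["fly2"], cooccurrence[("bird", "fly2")], n)
```

Because the two senses carry different labels, "fly1" and "fly2" accumulate separate counts, which is what lets the later steps build a network of concepts rather than of surface words.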

[0045] The calculation unit 3 receives the concept-to-concept co-occurrence frequencies, the occurrence frequency of each concept, and the total number of texts from the statistics unit 2. Then, for each concept s, it calculates the strength of recall from that concept s to every other concept w. The calculation method is the same as in the first embodiment.

[0046] The creation unit 4 represents each concept as a node. Referring to the concept-to-concept strengths of recall from the calculation unit 3, it then establishes links between the nodes of concepts with a high strength of recall. Specifically, for example, if the strength of recall from the concept s to the concept w is greater than a threshold, a directed link is established from the node of the concept s to the node of the concept w. This linking operation is repeated for all concepts to create the semantic network. The storage unit 1 stores the semantic network created in this way.

[0047] Next, a third embodiment of the semantic network automatic creation device of the present invention will be described. FIG. 14 shows its configuration. This semantic network automatic creation device 30 comprises a storage unit 1, a statistics unit 2, a calculation unit 3, a creation unit 4, an interface unit 5, a search unit 6, and a display unit 7.

[0048] The third embodiment differs from the first embodiment in that it includes the interface unit 5, the search unit 6, and the display unit 7. The other parts are the same as in the first embodiment.

[0049] The interface unit 5 receives from the user, through an input device such as a keyboard (not shown), the designation of a node in the semantic network. For example, when the user designates the node "car", the interface unit 5 accepts it. The user also enters the designation of the search range.

[0050] The search unit 6 receives the node designated by the user from the interface unit 5 and the semantic network from the storage unit 1, and finds a partial semantic network that contains the designated node. Specifically, the search unit 6 performs association starting from the designated node: it moves to the nodes linked to that node, and then on to the nodes linked to those. The association is repeated up to the number of steps the user specified as the search range. However, any node that has already been visited during association is marked, and no further association is performed from that node.

[0051] FIG. 15 shows an example of the semantic network stored in the storage unit 1. It is part of a semantic network actually constructed with the semantic network automatic creation device of the present invention from about 9,000 Reuters news articles. It shows that, according to the present invention, a semantic network fairly close to human intuition can be created even from a small amount of data. When one step of association is performed from the node "car" on such a semantic network, the search unit 6 finds a partial semantic network such as that shown in FIG. 16, and the display unit 7 presents the partial semantic network to the user.

[0052] FIG. 17 shows an example of the processing of the search unit 6. The search unit 6 receives from the user, through the interface unit 5, the designation of a node and of how many steps of association to perform (step S31). Let the designated node be start and the designated number of steps be k. The search unit 6 executes the recursive function Find_Path (step S32). Find_Path takes start and k as arguments and returns a list of paths, Path_List. The returned Path_List is output to the display unit 7 (step S33).

[0053] FIG. 18 shows an example of the processing of the recursive function Find_Path. Its arguments are node and k. First, node is marked as having been visited by association, and Path_List is set to an empty list (step S41). If the argument k is not 0 (NO in step S42), the list of nodes linked to node is placed in the list Linked (step S43). The first node, first, is taken from Linked (step S45); if it has not yet been visited (NO in step S46), the function calls itself with first and k-1 as arguments and appends the paths of the returned Partial_Path_List to Path_List (step S47). The same processing is repeated for the remaining nodes in Linked. When all nodes in Linked have been processed (YES in step S44), node is prepended to every path in Path_List, the resulting new Path_List is returned (step S48), and processing ends.
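The recursion of FIG. 18 can be sketched in Python as follows. This is a reading of the described steps, not the patented code: the name `find_path`, the adjacency-list `graph` argument, and the shared `visited` set are assumptions. One further assumption is labeled in the code: read literally, prepending node to every path of an empty Path_List would yield an empty list at every leaf, so the sketch returns the single path [node] in that case. The example links are hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_path(
    graph: Dict[str, List[str]],
    node: str,
    k: int,
    visited: Optional[Set[str]] = None,
) -> List[List[str]]:
    """Return the paths of at most k links that start at node (FIG. 18, steps S41-S48)."""
    if visited is None:
        visited = set()
    visited.add(node)                          # S41: mark node, Path_List starts empty
    path_list: List[List[str]] = []
    if k != 0:                                 # S42
        for first in graph.get(node, []):      # S43-S45: walk the Linked list
            if first not in visited:           # S46: skip nodes already visited
                path_list.extend(find_path(graph, first, k - 1, visited))  # S47
    if not path_list:
        # Assumption: a node with nothing further to follow yields the path [node].
        return [[node]]
    return [[node] + path for path in path_list]   # S48: prepend node to every path


# Hypothetical links for illustration only.
graph = {"car": ["truck", "auto"], "auto": ["car"], "truck": ["vehicle"]}
print(find_path(graph, "car", 1))
print(find_path(graph, "car", 2))
```

Sharing one `visited` set across the whole recursion mirrors the marking rule of the description: a node reached once is never expanded again, which also guarantees termination on networks that contain loops.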

[0054] In the third embodiment shown in FIG. 14, the interface unit 5, the search unit 6, and the display unit 7 are added to the first embodiment. Adding the interface unit 5, the search unit 6, and the display unit 7 to the second embodiment instead yields an embodiment (a fourth embodiment) that searches for and displays a partial semantic network containing the concept node designated by the user.

[0055] FIG. 19 is a configuration diagram showing an example of a computer to which the present invention is applied. The computer A comprises a computer main body B, which includes a central processing unit, main memory, a hard disk drive, a floppy disk drive, a CD-ROM unit, and the like; a display device C; a keyboard D; and a mouse E. F is a machine-readable recording medium, such as a floppy disk or CD-ROM, on which a semantic network automatic creation program is recorded. The program recorded on the recording medium F is read by the computer main body B and controls its operation, thereby realizing on the computer main body B the storage unit 1, the statistics unit 2, the calculation unit 3, and the creation unit 4 in the case of the first embodiment shown in FIG. 1 and the second embodiment shown in FIG. 11, and additionally the interface unit 5, the search unit 6, and the display unit 7 in the case of the third embodiment shown in FIG. 14 and the fourth embodiment.

【００５６】[0056]

[Effects of the Invention] As described above, according to the present invention, a semantic network can be constructed by an efficient method that requires no manual work, and because the network is created from data using an information measure, highly objective knowledge can be constructed. Furthermore, semantic networks containing loops can be created, and the directed link from one node to another and the directed link in the reverse direction can each be created independently, based on the degree of statistical recall.

[Brief description of drawings]

FIG. 1 is a block diagram of a first embodiment of the semantic network automatic creation device of the present invention.

FIG. 2 is a flowchart showing a processing example of the first embodiment of the semantic network automatic creation device of the present invention.

FIG. 3 is a diagram showing an example of input text.

FIG. 4 is a diagram showing an example of the result of morphological analysis of the text of FIG. 3.

FIG. 5 is a diagram showing an example of input text.

FIG. 6 is a diagram showing an example of the result of morphological analysis of the text of FIG. 5.

FIG. 7 is an explanatory diagram of a processing example of the statistics unit.

FIG. 8 is a diagram showing a specific example of calculating the strength of recall from the word word1 to the word word2.

FIG. 9 is a diagram showing a specific example of calculating the strength of recall from the word word2 to the word word1.

FIG. 10 is a flowchart showing a processing example of the creation unit.

FIG. 11 is a block diagram of a second embodiment of the semantic network automatic creation device of the present invention.

FIG. 12 is a flowchart showing a processing example of the second embodiment of the semantic network automatic creation device of the present invention.

FIG. 13 is a diagram showing an example of the result of word sense disambiguation for the text of FIG. 6.

FIG. 14 is a block diagram of a third embodiment of the semantic network automatic creation device of the present invention.

FIG. 15 is a diagram showing an example of part of a constructed semantic network.

FIG. 16 is a diagram showing an example of a partial semantic network found by the search.

FIG. 17 is a flowchart showing a processing example of the search unit.

FIG. 18 is a flowchart showing a processing example of the recursive function executed by the search unit.

FIG. 19 is a configuration diagram showing an example of a computer to which the present invention is applied.

[Explanation of symbols]

1 ... storage unit
2 ... statistics unit
3 ... calculation unit
4 ... creation unit
5 ... interface unit
6 ... search unit
7 ... display unit
10, 20, 30 ... semantic network automatic creation device

Continuation of the front page

(56) References: JP-A-11-96177 (JP, A); JP-A-5-94482 (JP, A); JP-A-2000-56977 (JP, A); Daichi Mochihashi and Yuji Matsumoto, "Meaning as Association" (連想としての意味), IPSJ Research Report 99-NL-134-21, Japan, November 26, 1999, Vol. 99, No. 95, pp. 155-162

(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/21 - 17/30; G10L 15/18; JICST file (JOIS)

Claims

(57) [Claims]

1. A semantic network automatic creation device comprising: a storage unit that stores a semantic network composed of words; a statistics unit that receives a plurality of texts, performs morphological analysis on the input texts, and counts, from the morphologically analyzed texts, the word-to-word co-occurrence frequencies, the occurrence frequency of each word, and the total number of texts; a calculation unit that receives the word-to-word co-occurrence frequencies, the word occurrence frequencies, and the total number of texts from the statistics unit and, based on them, calculates for each word the strength of recall from that word to other words using an information measure, the calculation unit using stochastic complexity as the information measure, regarding one text as one datum, and calculating the strength of recall from a first word to a second word as the difference between the stochastic complexity of the data when attention is paid to whether or not the second word appears, and the sum of the stochastic complexity of the data when attention is paid to whether or not the second word appears in the group of texts in which the first word appears and the stochastic complexity of the data when attention is paid to whether or not the second word appears in the group of texts in which the first word does not appear; and a creation unit that represents each word as one node, refers to the calculation unit, for each node, for the strength of recall from the word of that node to the words of other nodes, establishes a directed link from that node to another node when the referenced strength of recall is equal to or greater than a predetermined threshold, and outputs the resulting directed graph to the storage unit as a semantic network composed of words.

2. A semantic network automatic creation device comprising: a storage unit that stores a semantic network composed of concepts; a statistics unit that receives a plurality of texts, performs morphological analysis on the input texts, performs word sense disambiguation on the morphologically analyzed texts, and counts, from the disambiguated texts, the concept-to-concept co-occurrence frequencies, the occurrence frequency of each concept, and the total number of texts; a calculation unit that receives the concept-to-concept co-occurrence frequencies, the concept occurrence frequencies, and the total number of texts from the statistics unit and, based on them, calculates for each concept the strength of recall from that concept to other concepts using an information measure, the calculation unit using stochastic complexity as the information measure, regarding one text as one datum, and calculating the strength of recall from a first concept to a second concept as the difference between the stochastic complexity of the data when attention is paid to whether or not the second concept appears, and the sum of the stochastic complexity of the data when attention is paid to whether or not the second concept appears in the group of texts in which the first concept appears and the stochastic complexity of the data when attention is paid to whether or not the second concept appears in the group of texts in which the first concept does not appear; and a creation unit that represents each concept as one node, refers to the calculation unit, for each node, for the strength of recall from the concept of that node to the concepts of other nodes, establishes a directed link from that node to another node when the referenced strength of recall is equal to or greater than a predetermined threshold, and outputs the resulting directed graph to the storage unit as a semantic network composed of concepts.

3. The semantic network automatic creation device according to claim 1 or 2, further comprising: an interface unit that receives from a user a designation of a node in the semantic network; a search unit that receives the node designated by the user from the interface unit and the semantic network from the storage unit, and finds, in the input semantic network, a partial semantic network including the node designated by the user; and a display unit that receives the partial semantic network from the search unit and displays it.

4. A computer-readable recording medium on which is recorded a program that causes a computer to function as: a storage unit that stores a semantic network composed of words; a statistics unit that receives a plurality of texts, performs morphological analysis on the input texts, and counts, from the morphologically analyzed texts, the word-to-word co-occurrence frequencies, the occurrence frequency of each word, and the total number of texts; a calculation unit that receives the word-to-word co-occurrence frequencies, the word occurrence frequencies, and the total number of texts from the statistics unit and, based on them, calculates for each word the strength of recall from that word to other words using an information measure, the calculation unit using stochastic complexity as the information measure, regarding one text as one datum, and calculating the strength of recall from a first word to a second word as the difference between the stochastic complexity of the data when attention is paid to whether or not the second word appears, and the sum of the stochastic complexity of the data when attention is paid to whether or not the second word appears in the group of texts in which the first word appears and the stochastic complexity of the data when attention is paid to whether or not the second word appears in the group of texts in which the first word does not appear; and a creation unit that represents each word as one node, refers to the calculation unit, for each node, for the strength of recall from the word of that node to the words of other nodes, establishes a directed link from that node to another node when the referenced strength of recall is equal to or greater than a predetermined threshold, and outputs the resulting directed graph to the storage unit as a semantic network composed of words.

5. A computer-readable recording medium on which is recorded a program that causes a computer to function as: a storage unit that stores a semantic network composed of concepts; a statistics unit that receives a plurality of texts, performs morphological analysis on the input texts, performs word sense disambiguation on the morphologically analyzed texts, and counts, from the disambiguated texts, the concept-to-concept co-occurrence frequencies, the occurrence frequency of each concept, and the total number of texts; a calculation unit that receives the concept-to-concept co-occurrence frequencies, the concept occurrence frequencies, and the total number of texts from the statistics unit and, based on them, calculates for each concept the strength of recall from that concept to other concepts using an information measure, the calculation unit using stochastic complexity as the information measure, regarding one text as one datum, and calculating the strength of recall from a first concept to a second concept as the difference between the stochastic complexity of the data when attention is paid to whether or not the second concept appears, and the sum of the stochastic complexity of the data when attention is paid to whether or not the second concept appears in the group of texts in which the first concept appears and the stochastic complexity of the data when attention is paid to whether or not the second concept appears in the group of texts in which the first concept does not appear; and a creation unit that represents each concept as one node, refers to the calculation unit, for each node, for the strength of recall from the concept of that node to the concepts of other nodes, establishes a directed link from that node to another node when the referenced strength of recall is equal to or greater than a predetermined threshold, and outputs the resulting directed graph to the storage unit as a semantic network composed of concepts.

6. The computer-readable recording medium according to claim 4 or 5, wherein the recorded program further causes the computer to function as: an interface unit that receives from a user a designation of a node in the semantic network; a search unit that receives the node designated by the user from the interface unit and a partial semantic network from the storage unit, and finds, in that partial semantic network, a partial semantic network including the node designated by the user; and a display unit that receives the partial semantic network from the search unit and displays it.