JPH06168129A

JPH06168129A - Knowledge extracting device

Info

Publication number: JPH06168129A
Application number: JP4341101A
Authority: JP
Inventors: Hidekazu Arita; 英一有田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1992-11-30
Filing date: 1992-11-30
Publication date: 1994-06-14

Abstract

PURPOSE: To obtain not a mere associative concept pair of the form (concept 1)-(concept i) but useful knowledge of the form (concept 1)-(relation j)-(concept i), by referring to the knowledge stored in a domain knowledge base and extracting a concept that has a prescribed relation to an associated concept. CONSTITUTION: An associative unit 2 extracts associative relations between concepts from a text database 1. A domain knowledge base 3 stores knowledge of the subject area of the data stored in the text database 1. A knowledge synthesizing unit 4 obtains an associated concept pair from the associative unit 2, refers to the domain knowledge base 3 to determine the relation between the two concepts, and outputs the result as knowledge. In operation, a pair of concepts 1 and 2 associated by the associative unit 2 is extracted from the text database 1, the knowledge synthesizing unit 4 clarifies the relation of this pair by referring to the domain knowledge base 3, and new knowledge indicating that concepts 1 and 2 stand in the relation of a concept 3 is extracted.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a knowledge extracting device that extracts the associative relations between concepts (words) obtained from a text database as knowledge in which the relations between the concepts are made explicit by referring to domain knowledge.

[0002]

[Prior Art] FIG. 16 is a block diagram showing a conventional knowledge extracting device disclosed in, for example, Japanese Unexamined Patent Publication No. Hei 3-18962. In the figure, 11 is a neural network whose learning of associative concept pairs, each consisting of a concept and its association, has been ended under appropriate control in a state that leaves a certain likelihood; 12 is concept input means for inputting concept data to the neural network 11; and 13 is associative output means for outputting the output of the neural network 11 as associative data.

[0003] Next, the operation will be described. To examine a wide range of concepts related to a certain concept A1, the user of the knowledge extracting device inputs the concept A1 from the concept input means 12. The input concept A1 becomes the input signal of the neural network 11. Because the neural network 11 is in a state that leaves a certain likelihood, even if it has learned the associative concept pair (A1-B1), not only the concept B1 but also the concepts B11, ..., B1n are obtained from the associative output means 13 as associations of the concept A1. In this way, the user can obtain the unexpected concepts B11, ..., B1n as associations of the concept A1.

[0004]

[Problems to be Solved by the Invention] Because the conventional knowledge extracting device is configured as described above, the relations between concept 1 and the concepts 2, ..., i, ..., n associated from it are not clear. This causes the following problems: (1) the reason why concept i is associated from concept 1 is unknown; (2) because the relation between concept 1 and concept i is not clear, it is difficult to use the associated concept pair in other knowledge information processing; (3) when several concepts are associated from concept 1, the differences between the associations are not clear; and (4) when many concepts are associated from concept 1, they cannot be prioritized from the viewpoint of the purpose for which they will be used.

[0005] The present invention has been made to solve the above problems. Its object is to obtain a knowledge extracting device that also provides the "relation" between concept 1 and the concepts 2, ..., i, ..., n associated from it, and thereby yields not a mere associative concept pair (concept 1 - concept i) but useful knowledge of the form (concept 1 - relation j - concept i).

[0006]

[Means for Solving the Problems] The knowledge extracting device according to the invention of claim 1 refers to the knowledge stored in a domain knowledge base and extracts a concept that has a predetermined relation to a concept associated by an associative unit.

[0007] The knowledge extracting device according to the invention of claim 2 is the knowledge extracting device of claim 1 further configured to select a part of a text that has structure information.

[0008] The knowledge extracting device according to the invention of claim 3 is the knowledge extracting device of claim 1 further configured to convert a given concept into a single representative concept.

[0009] The knowledge extracting device according to the invention of claim 4 is the knowledge extracting device of claim 3 further configured to add a concept that the user has judged to be a synonym to the synonym conversion unit.

[0010] The knowledge extracting device according to the invention of claim 5 is the knowledge extracting device of claim 1 further configured to apply synonym conversion to each word of a single concept composed of a plurality of words.

[0011]

[Operation] The knowledge extracting device in the invention of claim 1 extracts from the text database a pair of concepts, concept 1 and concept 2, associated by the associative unit. The knowledge synthesizing unit clarifies what relation the pair stands in by referring to the domain knowledge base, and new knowledge that concept 1 and concept 2 stand in the relation of concept 3 is extracted.

[0012] The text structure selection unit in the invention of claim 2 selects a part of a structured text.

[0013] The synonym conversion unit in the invention of claim 3 converts a given concept into a single representative concept.

[0014] The synonym relation registration unit in the invention of claim 4 adds a concept that the user has judged to be a synonym to the synonym conversion unit.

[0015] The synonymous expression conversion unit in the invention of claim 5 applies synonym conversion to each word of a single concept composed of a plurality of words.

[0016]

[Embodiments]

Embodiment 1. An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 1. In the figure, 1 is a text database that is the source of the knowledge to be extracted; 2 is an associative unit that extracts associative relations between concepts from the text database 1; 3 is a domain knowledge base that stores knowledge of the subject area of the data stored in the text database 1; and 4 is a knowledge synthesizing unit that obtains an associative concept pair from the associative unit 2, determines the relation between the concepts by referring to the domain knowledge base 3, and outputs it as knowledge.

[0017] Next, the operation will be described. FIG. 2 shows an example of the domain knowledge base 3. FIG. 2(a) shows the knowledge representation format: domain knowledge is expressed as a triple (concept 1, relation r, concept 2), which states that concept 1 stands in relation r to concept 2. Examples of relation r are the isa relation, meaning that concept 2 is a superordinate concept of concept 1, and the eq relation, meaning that concept 1 and concept 2 are synonymous. FIG. 2(b) shows the domain knowledge of FIG. 2(a) as a graph structure to aid understanding.
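The triple format and its graph view can be illustrated with a minimal sketch. FIG. 2 is not reproduced in this text, so the entries of `DOMAIN_KB` below are hypothetical, and the names `Triple` and `as_graph` are introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    concept1: str
    relation: str  # "isa" or "eq", the two relation types named in the text
    concept2: str


# Hypothetical entries; the actual contents of FIG. 2 are not available here.
DOMAIN_KB = [
    Triple("Lisp", "isa", "programming language"),
    Triple("if-then rule", "isa", "knowledge representation"),
    Triple("production rule", "eq", "if-then rule"),
]


def as_graph(triples):
    """Group triples by their first concept, giving a graph view like FIG. 2(b)."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.concept1].append((t.relation, t.concept2))
    return dict(graph)


GRAPH = as_graph(DOMAIN_KB)
```

Each key of `GRAPH` is a node, and each `(relation, concept)` pair in its list is a labeled edge leaving that node.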

[0018] FIG. 3 shows an example of the associative unit 2. The associative unit 2 consists of a concept association relation extraction unit 21, a concept association network 22, and an association control unit 23. The concept association relation extraction unit 21 extracts the associative relations between the concepts contained in the texts according to the contents of the text database 1. One way to determine associative relations is to use the co-occurrence frequency of concepts. If concept i and concept j appear in the same text data, they are said to co-occur. If concept i and concept j co-occur in many text data, they can be said to be in an associative relation. The associative relations obtained from the concept association relation extraction unit 21 are represented in the concept association network 22, in which concept i and concept j are connected by a link with association strength Wij. Wij can be, for example, the normalized frequency with which concept i and concept j co-occur. When a concept a is specified, the association control unit 23 uses the concept association network 22 to obtain the concepts 1, 2, ..., i, ..., n associated from the concept a. One way to determine the associated concepts is to take, among the concepts connected to concept a by co-occurrence links, those whose association strength is equal to or greater than a certain threshold.
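The co-occurrence approach described above can be sketched as follows. The text does not say how the frequency is normalized, so dividing the co-occurrence count by the number of texts is an assumption, as are the function names, the sample texts, and the threshold value.

```python
from collections import Counter
from itertools import combinations


def build_network(texts):
    """Return {(ci, cj): Wij}; Wij = co-occurrence count / number of texts (assumed normalization)."""
    counts = Counter()
    for concepts in texts:
        # Each unordered pair of distinct concepts in one text co-occurs once.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    n = len(texts)
    return {pair: c / n for pair, c in counts.items()} if n else {}


def associate(network, concept, threshold):
    """Concepts linked to `concept` with association strength >= threshold."""
    result = []
    for (ci, cj), w in network.items():
        if w >= threshold and concept in (ci, cj):
            result.append(cj if ci == concept else ci)
    return sorted(result)


# Hypothetical text data, each text already reduced to the concepts it contains.
TEXTS = [
    ["expert system", "Lisp", "if-then rule"],
    ["expert system", "Lisp"],
    ["expert system", "if-then rule"],
    ["neural network", "Lisp"],
]
NETWORK = build_network(TEXTS)
ASSOCIATED = associate(NETWORK, "expert system", 0.5)
```

With these sample texts, "Lisp" and "if-then rule" each co-occur with "expert system" in two of the four texts, so both reach the threshold of 0.5.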

[0019] FIG. 4 is a flowchart explaining the operation of the knowledge synthesizing unit 4, which is described below with reference to FIG. 4. First, in step ST1, concept a and the concepts 1, 2, ..., k, ..., n associated from concept a are obtained from the associative unit 2. Steps ST2 to ST9 are then performed for each of concept 1 through concept n in turn. The operation for the i-th concept, concept i, is as follows. In step ST2, concept i is set as concept b. In step ST3, the domain knowledge base is searched for a triple whose first element is concept b. In step ST4, it is determined whether such a triple exists in the domain knowledge base; if so, the process proceeds to step ST5, and if not, to step ST9. In step ST5, the triple is taken as (concept b, relation r, concept c). In step ST6, the type of relation r is examined. If relation r is the equivalence relation equ, concept c is set as concept b in step ST8 and the process returns to step ST3. If relation r is the hierarchical relation isa, (concept a, concept c, concept i) is stored as extracted knowledge in step ST7. If no triple was found in step ST4, (concept a, association, concept i) is stored as extracted knowledge in step ST9. In step ST10, the knowledge extraction results for concept 1 through concept n are collected as the knowledge extraction result for concept a.

[0020] As an example, suppose that the concepts Lisp (corresponding to concept 1 in the description above) and if-then rule (corresponding to concept 2) are associated from the concept expert system (corresponding to concept a), and that the domain knowledge base has the contents of FIG. 2. The extracted knowledge is then (expert system, programming language, Lisp) and (expert system, knowledge representation, if-then rule). This expresses that expert systems are often written in the programming language Lisp and often use if-then rules as their knowledge representation.
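The flow of steps ST1 to ST10, applied to the expert system example, can be sketched as below. Several points are assumptions rather than part of the description: the entries of `KB` (FIG. 2 is not reproduced here), taking the first matching triple when several share the same first element, the `seen` guard that stops a cyclic chain of equivalences, and accepting both spellings "eq" and "equ" for the equivalence relation.

```python
def synthesize(concept_a, associated, kb):
    """Turn associated concepts into (concept a, relation, concept i) knowledge.

    kb is a list of (concept1, relation, concept2) triples.
    """
    knowledge = []
    for concept_i in associated:              # ST2 to ST9 for each concept i
        b = concept_i                         # ST2
        seen = set()                          # cycle guard (assumption, not in FIG. 4)
        while True:
            match = next((t for t in kb if t[0] == b), None)   # ST3
            if match is None or b in seen:    # ST4: no usable triple
                knowledge.append((concept_a, "association", concept_i))  # ST9
                break
            seen.add(b)
            _, relation, c = match            # ST5
            if relation == "isa":             # ST6 -> ST7
                knowledge.append((concept_a, c, concept_i))
                break
            if relation in ("eq", "equ"):     # ST6 -> ST8, then back to ST3
                b = c
                continue
            # Any other relation type is not covered by the description.
            knowledge.append((concept_a, "association", concept_i))
            break
    return knowledge                          # ST10


# Hypothetical domain knowledge chosen to be consistent with the example.
KB = [
    ("Lisp", "isa", "programming language"),
    ("if-then rule", "isa", "knowledge representation"),
    ("production rule", "eq", "if-then rule"),
]
RESULT = synthesize("expert system", ["Lisp", "if-then rule"], KB)
```

Under these assumed entries, `RESULT` holds the two triples named in the example. A concept with no entry in `KB` falls back to the plain `"association"` relation of step ST9.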

[0021] Embodiment 2. FIG. 5 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 2. In the figure, 5 is a text structure selection unit that, when the texts stored in the text database 1 have structure information, selects and cuts out a part of that structure from the text data and supplies the cut-out part of the text data as input to the associative unit 2.

[0022] Next, the operation will be described. FIG. 9 shows an example of text data with structure information. Each text data item has a corresponding structure information table that holds its structure information. The structure information table lists the part names of the text data together with the start and end positions of the corresponding text. When the text data is a technical paper, the part names are typically the title, the authors, the authors' affiliations, the abstract, Chapter 1, Chapter 2, ... of the body, the conclusion, and the references.

[0023] FIG. 10 is a flowchart explaining the operation of the text structure selection unit 5, which is described below with reference to FIG. 10. In step ST91, the name of the text part to be selected is obtained from the user and set as P; P is, for example, "abstract". Next, in step ST92, a text T is obtained from the text database 1. In step ST93, it is determined whether the text T is empty; if it is, the process ends. If it is not empty, in step ST94 the structure information table of the text T is searched for the row whose part name corresponds to P, and the start position S and end position E of part P in the text T are obtained. Next, in step ST95, the span of the text T from position S to position E is cut out and sent to the associative unit 2 as the selected part of the text. The process then returns to step ST92.
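A minimal sketch of steps ST91 to ST95 follows. The description does not state whether the end position E is inclusive, nor what happens when a text has no part named P; the sketch assumes a half-open range `[S, E)` and skips such texts. The sample document and the name `select_parts` are hypothetical.

```python
def select_parts(texts, part_name):
    """Cut the part called `part_name` out of every text.

    texts: iterable of (text, table), where table maps part name -> (start, end).
    """
    selected = []
    for text, table in texts:              # ST92/ST93: loop until no text remains
        span = table.get(part_name)        # ST94: look up the row for P
        if span is None:
            continue                       # assumption: skip texts lacking this part
        start, end = span
        selected.append(text[start:end])   # ST95: cut out positions S to E
    return selected


# Hypothetical text data with its structure information table.
DOC = "A Model of X. Taro Yamada. This paper proposes a model. 1. Introduction ..."
TABLE = {
    "title": (0, DOC.index(" Taro")),
    "abstract": (DOC.index("This"), DOC.index(" 1.")),
}
SELECTED = select_parts([(DOC, TABLE)], "abstract")
```

In the device, each selected span would be passed on to the associative unit 2 rather than collected in a list.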

[0024] Embodiment 3. FIG. 6 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 3. In the figure, 6 is a synonym conversion unit that converts a plurality of synonymous words into a single standard word.

[0025] Next, the operation will be described. FIG. 11 shows an example of the synonym conversion unit 6. When a word is given, the index section uses features of the word that can be determined easily and automatically, such as its character type and number of characters, to find the range of the synonym dictionary in which the word may be registered. The synonym dictionary lists each headword together with the standard word for it.

[0026] FIG. 12 is a flowchart explaining the operation of the synonym conversion unit 6, which is described below with reference to FIG. 12. First, in step ST111, the word W to be converted is obtained from the associative unit 2. Next, in step ST112, the character type of W is determined and set as JS. The character types include katakana, as in "ニューラルネットワーク" (neural network); kanji, as in "神経回路網" (neural network); hiragana, as in "ひずみ" (distortion); alphanumeric characters, as in "neural network"; and mixtures of character types, as in "Ｃ言語" (C language) and "しきい値" (threshold). Next, in step ST113, the number of characters in W is counted and set as CN. For example, when W is "神経回路網", CN is 5. Next, in step ST114, the index section is searched using JS and CN as keys, and the entry number obtained is set as IN. Then, in step ST115, the row with entry number IN in the synonym dictionary is looked up; the next-word number of that row is set as JN, its headword as EW, and its standard word as CW. Next, in step ST116, it is determined whether the word W and the headword EW are the same. If they are, CW is taken as the converted word in step ST117 and the process ends. If they differ, it is determined in step ST118 whether a next-word number JN exists; for example, a value of 0 in the next-word number field of the synonym dictionary is defined to mean that there is no JN. If there is no next-word number JN, W itself is taken as the converted word in step ST119 and the process ends. If there is one, JN is set as the new IN in step ST112 and the process returns to step ST115.
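The index lookup and the chained dictionary search can be sketched as follows. FIG. 11 is not reproduced in this text, so the contents of `INDEX` and `DICTIONARY` are hypothetical, the Unicode ranges used to tell character types apart are a simplification, and returning W unchanged when the index has no entry for (JS, CN) is an assumption modeled on step ST119.

```python
def char_type(word):
    """ST112: classify a word as katakana, hiragana, kanji, alnum, or mixed."""
    kinds = set()
    for ch in word:
        code = ord(ch)
        if 0x30A0 <= code <= 0x30FF:
            kinds.add("katakana")
        elif 0x3040 <= code <= 0x309F:
            kinds.add("hiragana")
        elif 0x4E00 <= code <= 0x9FFF:
            kinds.add("kanji")
        elif ch.isascii() and ch.isalnum():
            kinds.add("alnum")
        else:
            kinds.add("other")
    return kinds.pop() if len(kinds) == 1 else "mixed"


# Hypothetical index section: (character type JS, character count CN) -> entry number IN.
INDEX = {("alnum", 2): 1, ("kanji", 5): 3}

# Hypothetical synonym dictionary:
# entry number -> (next-word number JN, headword EW, standard word CW); JN == 0 means "no next".
DICTIONARY = {
    1: (2, "NN", "neural network"),
    2: (0, "ES", "expert system"),
    3: (0, "神経回路網", "neural network"),
}


def convert(word):
    """Return the standard word for `word`, or `word` itself if none is registered."""
    js, cn = char_type(word), len(word)                      # ST112, ST113
    entry = INDEX.get((js, cn), 0)                           # ST114
    while entry != 0:
        next_entry, headword, standard = DICTIONARY[entry]   # ST115
        if headword == word:                                 # ST116
            return standard                                  # ST117
        entry = next_entry                                   # ST118: follow JN
    return word                                              # ST119
```

Keying the index on character type and length means only the short chain of entries sharing those two features is scanned, not the whole dictionary.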

[0027] Embodiment 4. FIG. 7 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 4. In the figure, 7 is a synonym relation registration unit that determines whether a word pair obtained from the associative unit 2 is a new synonymous pair not yet registered in the synonym dictionary of the synonym conversion unit 6.

[0028] Next, the operation will be described. FIG. 13 is a flowchart explaining the operation of the synonym relation registration unit 7. First, in step ST701, the associated word pair obtained from the associative unit 2 is set as word W1 and word W2. Next, in step ST702, W1 and W2 are presented to the user, who is asked to judge whether W1 and W2 are synonymous. If the user judges in step ST703 that W1 and W2 are synonymous, W1 and W2 are registered in the synonym dictionary of the synonym conversion unit 6 and the index section is updated in step ST704, and the process ends. If the user judges in step ST703 that W1 and W2 are not synonymous, the process ends.
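Steps ST701 to ST704 amount to a confirm-then-store loop, sketched below. The description does not say which word of the pair becomes the standard word, so treating W1 as the standard form is an assumption. The sketch also uses a plain mapping as the synonym dictionary, so no separate index update is needed, and it replaces the user interaction with a hypothetical `ask_user` callback.

```python
def register_synonym(w1, w2, ask_user, synonyms):
    """Store w2 -> w1 in `synonyms` when the user confirms the pair is synonymous.

    ask_user(w1, w2) stands in for presenting the pair to the user (ST702).
    Returns True if the pair was registered.
    """
    if not ask_user(w1, w2):      # ST703: user judges the pair not synonymous
        return False
    synonyms[w2] = w1             # ST704; assumption: w1 is the standard form
    return True


SYNONYMS = {}
REGISTERED = register_synonym("neural network", "NN", lambda a, b: True, SYNONYMS)
```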

[0029] Embodiment 5. FIG. 8 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 5. In the figure, 8 is a synonymous expression conversion unit that converts a concept composed of a plurality of words in synonymous relations into a single standard expression.

[0030] Next, the operation will be described. FIG. 14 shows a configuration example of the synonymous expression conversion unit 8. A segmentation unit 81 divides the expression of a concept composed of a plurality of words into its components, the synonym conversion unit 6 converts each component into its standard expression, and a synonymous expression synthesizing unit 83 combines the converted components to obtain the standard expression of the concept.

[0031] FIG. 15 is a flowchart explaining the operation of the synonymous expression conversion unit 8. First, in step ST801, the expression CC of a concept composed of a plurality of words, which is to be converted, is obtained from the associative unit 2. Next, in step ST802, CC is divided into components, and the resulting components are set as SC1, ..., SCi, ..., SCn. Next, for each of SC1 through SCn, SCi is synonym-converted in step ST803 into its standard expression HCi. Then, in step ST804, HC1, ..., HCi, ..., HCn are combined into the standard expression HCC. An example of an expression of a concept composed of a plurality of words is a bibliographic reference in a technical paper. For example, the expression "Yamada, Suzuki: Model of XX, Shingakuron (B), J62-B, 1, 12-22, 1979", in which Shingakuron (B) (信学論（Ｂ）) is an abbreviated journal name, is divided into the components of the concept "reference": authors (Yamada, Suzuki), title (Model of XX), journal name (Shingakuron (B)), volume (J62-B), number (1), pages (12-22), and year of publication (1979). Each component is then converted into its standard expression. In this example the components become authors (Yamada, Suzuki), title (Model of XX), journal name (IEICE Transactions (B)), volume (Vol. J62-B), number (No. 1), pages (PP. 12-22), and year of publication (1979). Finally, the components are combined, and "Yamada, Suzuki: Model of XX, IEICE Transactions (B), Vol. J62-B, No. 1, PP. 12-22, 1979" becomes the standard expression of the reference.
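The segment, convert, and combine sequence of steps ST801 to ST804 can be sketched for the reference example. The description does not give the segmentation rules, so splitting on ": " and ", " is an assumption that only works when the title itself contains no ", ". The field order, the prefix rules, and the journal mapping are likewise taken from the single example rather than from a stated specification.

```python
FIELDS = ["title", "journal", "volume", "number", "pages", "year"]

# Hypothetical synonym table for journal names, based on the example in the text.
JOURNAL_SYNONYMS = {"Shingakuron (B)": "IEICE Transactions (B)"}
PREFIXES = {"volume": "Vol. ", "number": "No. ", "pages": "PP. "}


def segment(reference):
    """ST802: split a reference string into named components."""
    authors, rest = reference.split(": ", 1)
    return {"authors": authors, **dict(zip(FIELDS, rest.split(", ")))}


def standardize(parts):
    """ST803: convert each component to its standard expression."""
    out = dict(parts)
    out["journal"] = JOURNAL_SYNONYMS.get(parts["journal"], parts["journal"])
    for field, prefix in PREFIXES.items():
        if not out[field].startswith(prefix):   # avoid adding a prefix twice
            out[field] = prefix + out[field]
    return out


def combine(parts):
    """ST804: join the standardized components into one expression."""
    return f"{parts['authors']}: " + ", ".join(parts[f] for f in FIELDS)


def convert_reference(reference):
    return combine(standardize(segment(reference)))


RAW = "Yamada, Suzuki: Model of XX, Shingakuron (B), J62-B, 1, 12-22, 1979"
STANDARD = convert_reference(RAW)
```

Because `standardize` leaves already-prefixed fields alone, applying `convert_reference` to its own output returns the same string.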

[0032]

[Effects of the Invention] As described above, according to the invention of claim 1, the knowledge extracting device is configured with a text database that stores a large amount of text, an associative unit that associates concept pairs from that text, and a knowledge synthesizing unit that determines what relation each concept pair stands in by referring to a domain knowledge base. As a result, concept pairs whose inter-concept relations are known can be obtained (that is, knowledge can be extracted), and those concept pairs can be used in other knowledge information processing.

[0033] According to the invention of claim 2, in addition to the effects of the invention of claim 1, the text structure selection unit is configured so that the user selects the important parts of a text, which can be judged from the structure of the text. New knowledge can therefore be extracted from the parts of the text that the user considers important.

[0034] According to the invention of claim 3, in addition to the effects of the invention of claim 1, the synonym conversion unit is configured to convert a plurality of synonymous words into a single representative word. Knowledge can therefore be extracted even from words that are not in the domain knowledge base.

[0035] According to the invention of claim 4, in addition to the effects of the invention of claim 3, the synonym relation registration unit is configured to add words judged to be synonymous to the synonym dictionary of the synonym conversion unit. Synonyms suited to the target texts can therefore be added to the synonym dictionary, which improves the performance of the knowledge extracting device.

[0036] According to the invention of claim 5, in addition to the effects of the invention of claim 1, the synonymous expression conversion unit is configured to convert a concept composed of a plurality of words in synonymous relations into a single standard expression. Knowledge can therefore be extracted for concepts composed of a plurality of words.

[Brief description of drawings]

FIG. 1 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 1.

FIG. 2 is a diagram showing an example of the domain knowledge base.

FIG. 3 is an explanatory diagram showing an example of the associative unit.

FIG. 4 is a flowchart explaining the operation of the knowledge synthesizing unit.

FIG. 5 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 2.

FIG. 6 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 3.

FIG. 7 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 4.

FIG. 8 is an overall configuration diagram showing an embodiment of the knowledge extracting device according to the invention of claim 5.

FIG. 9 is a diagram showing an example of text data with structure information.

FIG. 10 is a flowchart explaining the operation of the text structure selection unit.

FIG. 11 is an explanatory diagram showing an example of the synonym conversion unit.

FIG. 12 is a flowchart explaining the operation of the synonym conversion unit.

FIG. 13 is a flowchart explaining the operation of the synonym relation registration unit.

FIG. 14 is a diagram showing a configuration example of the synonymous expression conversion unit.

FIG. 15 is a flowchart explaining the operation of the synonymous expression conversion unit.

FIG. 16 is a block diagram showing a conventional knowledge extracting device.

[Explanation of symbols]

1: text database; 2: associative unit; 3: domain knowledge base; 4: knowledge synthesizing unit; 5: text structure selection unit; 6: synonym conversion unit; 7: synonym relation registration unit; 8: synonymous expression conversion unit; 11: neural network; 12: concept input means; 13: associative output means.

Claims

[Claims]

1. A knowledge extracting device comprising: a text database that stores text data including concepts; an associative unit that obtains the associative relations among the concepts included in the text data stored in the text database and extracts concepts that are in an associative relation with a given concept; a domain knowledge base that stores in advance, as knowledge, concepts that are in a predetermined relation to a given concept; and a knowledge synthesizing unit that refers to the knowledge stored in the domain knowledge base and extracts a concept that is in a predetermined relation to the associated concept extracted by the associative unit.

2. A knowledge extracting device comprising: a text database that stores text data including concepts; a text structure selection unit that selects a part of the text data stored in the text database; an associative unit that obtains the associative relations among the concepts included in the text data selected by the text structure selection unit and extracts concepts that are in an associative relation with a given concept; a domain knowledge base that stores in advance, as knowledge, concepts that are in a predetermined relation to a given concept; and a knowledge synthesizing unit that refers to the knowledge stored in the domain knowledge base and extracts a concept that is in a predetermined relation to the associated concept extracted by the associative unit.

3. A knowledge extracting device comprising: a text database that stores text data including concepts; a synonym conversion unit that converts synonyms among the concepts included in the text data stored in the text database into standard concepts; an associative unit that obtains the associative relations among the standard concepts converted by the synonym conversion unit and extracts concepts that are in an associative relation with a given concept; a domain knowledge base that stores in advance, as knowledge, concepts that are in a predetermined relation to a given concept; and a knowledge synthesizing unit that refers to the knowledge stored in the domain knowledge base and extracts a concept that is in a predetermined relation to the associated concept extracted by the associative unit.

4. A knowledge extracting device comprising: a text database that stores text data including concepts; a synonym conversion unit that converts synonyms among the concepts included in the text data stored in the text database into standard concepts; an associative unit that obtains the associative relations among the standard concepts converted by the synonym conversion unit and extracts concepts that are in an associative relation with a given concept; a synonym relation registration unit that, when a concept pair associated as a result of the processing of the associative unit is in a synonymous relation, registers the concept pair in the synonym conversion unit; a domain knowledge base that stores in advance, as knowledge, concepts that are in a predetermined relation to a given concept; and a knowledge synthesizing unit that refers to the knowledge stored in the domain knowledge base and extracts a concept that is in a predetermined relation to the associated concept extracted by the associative unit.

5. A knowledge extracting device comprising: a text database that stores text data including concepts each composed of a plurality of words; a synonymous expression conversion unit that converts synonyms of the words constituting the concepts stored in the text database into standard concepts; an associative unit that obtains the associative relations among the standard concepts converted by the synonymous expression conversion unit and extracts concepts that are in an associative relation with a given concept; a domain knowledge base that stores in advance, as knowledge, concepts that are in a predetermined relation to a given concept; and a knowledge synthesizing unit that refers to the knowledge stored in the domain knowledge base and extracts a concept that is in a predetermined relation to the associated concept extracted by the associative unit.