JP2001188678A

JP2001188678A - Language case inferring device, language case inferring method, and storage medium on which language case inference program is described

Info

Publication number: JP2001188678A
Application number: JP2000000379A
Authority: JP
Inventors: Yasuhiro Takayama; 泰博高山; Takeyuki Aikawa; 勇之相川; Katsushi Suzuki; 克志鈴木
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-01-05
Filing date: 2000-01-05
Publication date: 2001-07-10

Abstract

PROBLEM TO BE SOLVED: Conventional similarity calculation uses only the frequency information and category classification of keywords extracted from cases. As a result, it cannot compute a fine-grained natural-language similarity that accounts for expressions such as sentence modality, and it cannot retrieve cases containing sentences that are syntactically and semantically similar to a search sentence. SOLUTION: Keywords are extracted from one or more case sentences, the cluster to which each case sentence belongs is determined, and a problem solving tree characterized by sets of keywords is generated from that result. The cluster to which each case sentence belongs is then subdivided based on the attributes of the linguistic expression of each case sentence in each cluster of the problem solving tree.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a language case inference apparatus, a language case inference method, and a storage medium storing a language case inference program. These classify past case sentences, register them in a case database, and search for case sentences similar to a search sentence.

[0002]

[Description of the Related Art] There is strong demand for reusing data accumulated electronically from exchanges such as inquiry handling at consultation desks and failure response at failure monitoring centers. To meet this demand, language case inference apparatuses have conventionally been used. They register case sentences written in natural language in a case database in advance, then search for case sentences similar to a search sentence, i.e., a newly input sentence.

[0003] FIG. 28 is a block diagram of a conventional language case inference apparatus disclosed, for example, in Japanese Patent Application Laid-Open No. 9-73464. In the figure:
- 1 denotes past cases described in natural language sentences.
- 2 denotes a case database in which the past cases 1 are registered.
- 3 denotes attribute information generating means. It extracts keywords from the [Problem] section of the cases 1 stored in the case database 2 and generates attribute information for the cases 1 by referring to a case count table 7.

[0004] The remaining elements are as follows:
- 4 denotes a keyword number table storing keyword numbers that identify the keywords extracted by the attribute information generating means 3.
- 5 denotes a keyword table storing case numbers that identify the cases 1, together with information identifying the keywords that appear in each case.
- 6 denotes a category table storing category numbers that indicate the classification result of each case.
- 7 denotes a case count table storing, among other things, the number $t_{ij}$ of keyword numbers corresponding to each category number.
- 8 denotes an attribute database storing attribute information 9.
- 9 denotes attribute information consisting of a case number, a category number, keyword numbers, and a weight for each keyword.
- 10 denotes similarity generating means that generates the similarity between a new case and past cases.

[0005] Next, the operation will be described. When past cases 1 are accumulated, the attribute information generating means 3 first extracts keywords from the [Problem] section of each case 1 to be stored in the case database 2. By referring to the case count table 7, it then generates three quantities:
- the total number $S_i$ of cases belonging to category $C_i$;
- the total number $t_j$ of cases in which the attribute with attribute number $j$ appears;
- the weight $\omega_{ij}$ for the attribute with attribute number $j$ in category $C_i$.

It stores this attribute information in the attribute database 8.

$$\omega_{ij} = \left(\frac{t_{ij}}{t_j} - \frac{S_i}{S}\right) + \left(\frac{t_{ij}}{S_i} - \frac{t_j}{S}\right), \qquad S = S_1 + S_2 + \cdots + S_n$$
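The weight formula of [0005] can be checked numerically with a short sketch. It assumes $t_{ij}$ counts the cases in category $C_i$ that contain keyword $j$, and that each case belongs to exactly one category, so that $t_j = \sum_i t_{ij}$. Both are interpretations of the description, not statements from the cited publication. The function name `keyword_weights` and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def keyword_weights(t, S_by_category):
    """Compute w_ij for every (category i, keyword j) pair present in t.

    t[(i, j)]        -- t_ij, cases in category C_i that contain keyword j (assumed meaning)
    S_by_category[i] -- S_i, number of cases in category C_i
    """
    S = sum(S_by_category.values())          # S = S_1 + S_2 + ... + S_n
    t_j = defaultdict(int)
    for (_, j), t_ij in t.items():           # assumed: t_j is the sum of t_ij over categories
        t_j[j] += t_ij
    weights = {}
    for (i, j), t_ij in t.items():
        if t_j[j] == 0:                      # keyword never observed: weight is undefined
            continue
        S_i = S_by_category[i]
        weights[(i, j)] = (t_ij / t_j[j] - S_i / S) + (t_ij / S_i - t_j[j] / S)
    return weights
```

A keyword concentrated in one category gets a positive weight there and a negative weight in categories where it never appears.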

[0006] When cases are searched, the similarity generating means 10 first extracts keywords from a newly input search sentence and obtains their keyword numbers from the keyword number table 4. It then searches the attribute database 8 using the list of keywords extracted from the search sentence, retrieves attribute information containing matching keywords, and calculates the similarity G:

$$G = (\text{sum of weights}) \times \frac{\text{number of matched keywords}}{\text{length of the keyword list retrieved from the attribute database 8}}$$

Finally, the similarity generating means 10 sorts the case numbers in descending order of similarity G and displays or prints the contents of those cases.
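The similarity G and the ranking step can be sketched as follows. The description does not say which weights are summed. This sketch assumes only the weights of the matched keywords are summed, and it stores each case's retrieved keyword list as a hypothetical `{keyword: weight}` dictionary.

```python
def similarity_g(query_keywords, case_weights):
    """Similarity G for one case.

    query_keywords -- set of keywords extracted from the search sentence
    case_weights   -- {keyword: weight} retrieved from the attribute database for one case
    """
    if not case_weights:
        return 0.0
    matched = [k for k in case_weights if k in query_keywords]
    weight_sum = sum(case_weights[k] for k in matched)  # assumed: weights of matched keywords only
    return weight_sum * len(matched) / len(case_weights)


def rank_cases(query_keywords, attribute_db):
    """Return case numbers sorted by descending similarity G."""
    scores = {case_no: similarity_g(query_keywords, w) for case_no, w in attribute_db.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Only keyword identity and weights enter G. Word order and modality do not, which is the limitation raised in [0007].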

[0007]

[Problems to be Solved by the Invention] The conventional apparatus described above calculates similarity using only the frequency information and category classification of keywords extracted from cases. Consequently, it cannot compute a detailed natural-language similarity that takes into account expressions such as sentence modality. Nor can it retrieve cases containing sentences that are syntactically and semantically similar to the search sentence. Moreover, judging the similarity of all cases using only the syntactic structure of sentences generally requires a great deal of processing time.

[0008] The present invention has been made to solve the above problems. Its object is to provide a language case inference apparatus, a language case inference method, and a storage medium storing a language case inference program that can efficiently classify cases to construct a problem solving tree, and can retrieve cases containing sentences that are syntactically and semantically similar to a search sentence.

[0009]

[Means for Solving the Problems] A language case inference apparatus according to the present invention comprises two means:
- Keyword clustering means. It extracts keywords from one or more case sentences, determines the cluster to which each case sentence belongs, and, based on the result, generates a problem solving tree characterized by sets of keywords.
- Detailed clustering means. It subdivides the cluster to which each case sentence belongs, based on the attributes of the linguistic expression of each case sentence in each cluster of the problem solving tree generated by the keyword clustering means.
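The two-stage idea (coarse clusters by keyword set, then finer clusters by a linguistic attribute) can be illustrated with a deliberately simplified sketch. It builds a flat two-level grouping rather than a full problem solving tree. A cluster is keyed by the set of keywords a case sentence contains. The "linguistic attribute" is any per-sentence label, such as a sentence tag or a modality like negation. These are illustrative assumptions, and the function names are hypothetical.

```python
from collections import defaultdict


def keyword_clusters(cases, keywords):
    """Stage 1: group case ids by the set of keywords each case sentence contains."""
    clusters = defaultdict(list)
    for case_id, words in cases.items():
        clusters[frozenset(w for w in words if w in keywords)].append(case_id)
    return dict(clusters)


def detailed_clusters(clusters, attribute_of):
    """Stage 2: subdivide each keyword cluster by a linguistic attribute of its members."""
    refined = {}
    for key, members in clusters.items():
        sub = defaultdict(list)
        for case_id in members:
            sub[attribute_of[case_id]].append(case_id)
        refined[key] = dict(sub)
    return refined
```

The design keeps the expensive linguistic comparison inside small keyword clusters, in line with the processing-time concern stated in [0007].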

[0010] A language case inference method according to the present invention extracts keywords from one or more case sentences and determines the cluster to which each case sentence belongs. It then generates a problem solving tree characterized by sets of keywords based on the result. Finally, it subdivides the cluster to which each case sentence belongs, based on the attributes of the linguistic expression of each case sentence in each cluster of the problem solving tree.

[0011] In the language case inference method according to the present invention, when keywords are extracted from case sentences, the case sentences are first extracted from digitized documents, and a tag indicating the type of each case sentence is attached.

[0012] In the language case inference method according to the present invention, the number of hierarchical levels of clusters is specified when the problem solving tree is generated.

[0013] In the language case inference method according to the present invention, when the cluster to which each case sentence belongs is subdivided, the minimum similarity between the case sentences constituting a cluster is specified.
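One way to enforce a minimum within-cluster similarity is a greedy complete-linkage pass: a sentence joins an existing cluster only if its similarity to every member meets the threshold. The patent specifies neither the similarity measure nor the clustering algorithm. This sketch assumes Jaccard similarity over word sets and greedy assignment, purely as illustrative choices.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def subdivide(sentences, min_similarity):
    """Greedily split word-list sentences so every pair in a cluster has similarity >= min_similarity."""
    clusters = []
    for sent in sentences:
        for cluster in clusters:
            if all(jaccard(sent, other) >= min_similarity for other in cluster):
                cluster.append(sent)
                break
        else:
            clusters.append([sent])  # no compatible cluster: start a new one
    return clusters
```

Raising `min_similarity` yields more, tighter clusters.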

[0014] In the language case inference method according to the present invention, when keywords are extracted from a case sentence, those words in the case sentence whose frequency of appearance exceeds a set value are extracted as keywords.
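A minimal sketch of frequency-based keyword extraction follows. It assumes the case sentences are already tokenized (for example, by the morphological analysis described later). It also assumes frequency is counted over the whole collection of case sentences, which the text does not state explicitly. "Higher than a set value" is read as strictly greater. The function name is hypothetical.

```python
from collections import Counter


def frequent_keywords(sentences, min_count):
    """Return words whose total frequency across all case sentences exceeds min_count."""
    counts = Counter(word for words in sentences for word in words)
    return {word for word, n in counts.items() if n > min_count}
```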

[0015] In the language case inference method according to the present invention, keyword extraction from a case sentence refers to an ontology. The ontology holds knowledge about words that depend on the target domain or task and about the relations between those words. Words in the case sentence that are held in the ontology are extracted as keywords.

[0016] In the language case inference method according to the present invention, when the cluster to which each case sentence belongs is subdivided, the method refers to an ontology holding knowledge about words that depend on the target domain or task and about the relations between those words. The knowledge of words and word relations held in the ontology is used for detailed clustering.

[0017] In the language case inference method according to the present invention, the ontology holds IS-A relation knowledge indicating semantic superordinate/subordinate (hypernym/hyponym) relations.
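IS-A knowledge is essentially a hierarchy that can be walked upward to test whether one concept falls under another. The sketch below assumes each concept has at most one superordinate concept and guards against cycles. The concept names in `IS_A` are a hypothetical fragment, not taken from the patent.

```python
# Hypothetical IS-A fragment of an ontology: concept -> its superordinate concept.
IS_A = {
    "room air conditioner": "air conditioner",
    "air conditioner": "appliance",
    "remote controller": "accessory",
}


def ancestors(concept, table=IS_A):
    """Superordinate concepts of `concept`, nearest first (stops if a cycle is found)."""
    chain, seen = [], {concept}
    while concept in table and table[concept] not in seen:
        concept = table[concept]
        seen.add(concept)
        chain.append(concept)
    return chain


def is_a(child, parent, table=IS_A):
    """True if `child` equals `parent` or lies below it in the IS-A hierarchy."""
    return child == parent or parent in ancestors(child, table)
```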

[0018] In the language case inference method according to the present invention, the ontology holds HAS-A relation knowledge indicating semantic part-whole relations.

[0019] In the language case inference method according to the present invention, the ontology holds knowledge of (grammatical) case relations.

[0020] In the language case inference method according to the present invention, the ontology holds paraphrase knowledge.

[0021] In the language case inference method according to the present invention, the ontology holds knowledge of contradictory (mutually exclusive) relations.

[0022] In the language case inference method according to the present invention, similar-sentence matching is performed between the case sentences belonging to a cluster and a search sentence. During that matching, semantic structures are compared based on the attributes of syntactic elements.

[0023] In the language case inference method according to the present invention, the level of detail of matching is specified when similar-sentence matching is performed between the case sentences belonging to a cluster and a search sentence.

[0024] In the language case inference method according to the present invention, the depth of the tree structure in the parse tree is specified as the level of detail of matching.
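Using tree depth as the level of detail can be pictured as comparing two parse trees only down to a given depth. In this sketch a tree is a hypothetical `(label, children)` pair. Depth 1 compares only the roots. Children are compared as an unordered collection, which is an assumption rather than something the text specifies.

```python
def truncate(tree, depth):
    """Keep only the top `depth` levels of a (label, children) tree, children sorted."""
    label, children = tree
    if depth <= 1:
        return (label, ())
    return (label, tuple(sorted(truncate(c, depth - 1) for c in children)))


def match_to_depth(a, b, depth):
    """True if trees a and b agree on every node within the first `depth` levels."""
    return truncate(a, depth) == truncate(b, depth)
```

A shallow depth gives a fast, coarse match. Increasing it makes the match stricter and more expensive.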

[0025] In the language case inference method according to the present invention, the similarity relations between clusters are described in a database.

[0026] In the language case inference method according to the present invention, the contradictory relations between clusters are described in a database.

[0027] A storage medium according to the present invention stores a language case inference program describing two processing procedures:
- A keyword clustering procedure. It extracts keywords from one or more case sentences, determines the cluster to which each case sentence belongs, and, based on the result, generates a problem solving tree characterized by sets of keywords.
- A detailed clustering procedure. It subdivides the cluster to which each case sentence belongs, based on the attributes of the linguistic expression of each case sentence in each cluster of the problem solving tree generated by the keyword clustering procedure.

[0028]

[Embodiments of the Invention] An embodiment of the present invention is described below.

Embodiment 1. FIG. 1 is a block diagram of a language case inference apparatus according to Embodiment 1 of the present invention. In the figure:
- 11 denotes a digitized document, such as an inquiry record, containing case sentences described in natural language.
- 12 denotes a sentence extraction unit that extracts one or more case sentences and the like from the digitized document 11.
- 13 denotes a sentence analysis unit. It performs morphological and syntactic analysis of each case sentence extracted by the sentence extraction unit 12, and of a search sentence 20 input through a search sentence input unit 21.
- 14 denotes a similar case classification unit that controls a keyword clustering unit 15 and a detailed clustering unit 16 to classify the case sentences.

[0029] Further elements of FIG. 1 are as follows:
- 15 denotes a keyword clustering unit (keyword clustering means). It extracts keywords from each case sentence extracted by the sentence extraction unit 12, determines the cluster to which each case sentence belongs, and, based on the result, generates a problem solving tree characterized by sets of keywords.
- 16 denotes a detailed clustering unit (detailed clustering means). It subdivides the cluster to which each case sentence belongs, based on the attributes of the linguistic expression of each case sentence in each cluster of the problem solving tree generated by the keyword clustering unit 15.
- 17 denotes a case database storing the clustering results of the detailed clustering unit 16, the analysis results of the sentence analysis unit 13, and the like.
- 18 denotes an ontology holding knowledge about words that depend on the target domain or task and about the correspondences between those words.
- 19 denotes a case cluster editing unit for editing the clustering results of the detailed clustering unit 16 and the like.

[0030] The remaining elements of FIG. 1 are as follows:
- 20 denotes a search sentence.
- 21 denotes a search sentence input unit for inputting the search sentence 20.
- 22 denotes a similar case search unit (similar case search means). It extracts keywords from the search sentence 20, searches the problem solving tree for clusters having those keywords as elements, and performs similar-sentence matching between the case sentences belonging to those clusters and the search sentence 20.
- 23 denotes a search result display unit that displays the search results of the similar case search unit 22 and the like.

FIG. 2 is a flowchart of the language case inference method (case construction processing) according to Embodiment 1. The entire language case inference apparatus of FIG. 1 may also be implemented in software, with the language case inference program stored on a computer-readable storage medium.

[0031] Next, the operation will be described. First, in step ST1, the sentence extraction unit 12 extracts the sentences to be processed, such as case sentences, from the digitized document 11 (for example, an inquiry record).

[0032] FIG. 6 is an explanatory diagram showing an example of the digitized document 11. One digitized document file may contain multiple documents. In the example of FIG. 6, document boundaries are marked by a document start tag <DOC> and a document end tag </DOC>. Each document consists of multiple document fields of two kinds, as shown in FIG. 6:
- fixed fields such as <customer name>, <customer phone>, <model>, and <subject>, whose contents can be expressed as numbers or simple character strings;
- free-form fields such as <question> and <answer>, in which multiple natural language sentences are written.

[0033] FIG. 3 is a flowchart showing the details of the sentence extraction process (step ST1). The following describes the processing for reading a digitized document 11 such as the one shown in FIG. 6. First, in step ST11, the first document start tag is retrieved. Step ST12 determines whether a document start tag was retrieved. If it was, step ST13 assigns a unique case number to document A so that it can be processed as a "case".

[0034] Next, step ST14 performs sentence extraction on that document. When it finishes, step ST15 checks for the document end tag and issues a warning or the like if there is none. Step ST16 then retrieves the document start tag of the next document B, and the processing is repeated from step ST12. This loop continues until step ST12 finds no further document start tag or the end of the document file is reached.
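The document loop of steps ST11 to ST16 can be sketched with plain string search. This version numbers documents sequentially as case numbers. When an end tag is missing, it warns and treats the next start tag (or end of file) as the boundary. That recovery policy is an assumption; the flowchart only says a warning is issued. The function name is hypothetical.

```python
import warnings

DOC_START = "<DOC>"
DOC_END = "</DOC>"


def split_documents(text):
    """Return (case_number, body) pairs for each <DOC> ... </DOC> block in a file."""
    cases, pos, case_number = [], 0, 0
    while True:
        start = text.find(DOC_START, pos)      # ST11 / ST16: retrieve next start tag
        if start == -1:                        # ST12: no more start tags
            break
        case_number += 1                       # ST13: assign a unique case number
        body_start = start + len(DOC_START)
        end = text.find(DOC_END, body_start)
        next_start = text.find(DOC_START, body_start)
        if end == -1 or (next_start != -1 and next_start < end):
            # ST15: end tag missing; warn and stop at the next document (assumed policy)
            warnings.warn(f"case {case_number}: missing {DOC_END}")
            body_end = next_start if next_start != -1 else len(text)
            pos = body_end
        else:
            body_end = end
            pos = end + len(DOC_END)
        cases.append((case_number, text[body_start:body_end]))  # ST14 works on this body
    return cases
```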

[0035] FIG. 4 is a flowchart showing the details of the sentence extraction process for one document (step ST14) in FIG. 3. When the digitized document 11 is received, step ST21 first checks whether it contains a document field start tag. A start tag is a character string enclosed by the symbols "<" and ">"; it is hereinafter simply called a start tag. If one is found, the enclosed character string is extracted as the field name. In the example of FIG. 6, the string "customer name" is extracted from <customer name> as the field name. If step ST22 finds that there is no start tag, processing of the digitized document 11 ends.

[0036] If step ST22 finds a start tag, step ST23 determines whether the field is a fixed field. If it is, step ST24 registers the content of the field as-is in the corresponding column of the case database 17. For example, in FIG. 6, when the field is <customer name>, the string "Taro Yamada" is registered. Step ST25 then checks for a document field end tag, such as </customer name> (a string enclosed by the symbols "</" and ">"), and displays a warning or the like if there is none.

[0037] If instead step ST23 determines that the field is a free-form field, such as the <question> field in FIG. 6, step ST26 executes free-form field processing, passing it the case number and field name (the details are described later). When step ST25 or ST26 is complete, step ST27 retrieves the start tag of the next document field, and the processing is repeated from step ST22.

[0038] The fixed/free-form determination in step ST23 is made by referring to document field information such as that shown in FIG. 8. This information predefines the name of each field (for example, customer name or question) and indicates whether the field is fixed or free-form. As needed, it also defines optional information, such as:
- a "category attribute" for broadly classifying the set of case data;
- a "title attribute" for identifying each case, for example when a list of case data is displayed.
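The per-document field loop of steps ST21 to ST27 can be sketched with a regular expression for start tags. A small dictionary stands in for the document field information of FIG. 8. `FIELD_INFO` is a hypothetical example. Fields missing from it are treated as free-form, which is an assumption. Free-form contents are only collected here; sentence splitting happens in step ST26.

```python
import re
import warnings

# A start tag is "<name>" where name does not begin with "/" (so end tags are skipped).
START_TAG = re.compile(r"<([^/<>][^<>]*)>")

# Hypothetical stand-in for the document field information of FIG. 8.
FIELD_INFO = {"customer name": "fixed", "model": "fixed", "question": "free", "answer": "free"}


def extract_fields(body, field_info=FIELD_INFO):
    """Split one document body into fixed-field values and free-form field texts."""
    fixed, free = {}, {}
    pos = 0
    while True:
        m = START_TAG.search(body, pos)        # ST21 / ST27: next field start tag
        if m is None:                          # ST22: no start tag left
            break
        name = m.group(1)
        end_tag = f"</{name}>"
        end = body.find(end_tag, m.end())
        if end == -1:                          # ST25 / ST35: end tag missing
            warnings.warn(f"missing {end_tag}")
            nxt = START_TAG.search(body, m.end())
            end = nxt.start() if nxt else len(body)
            pos = end
        else:
            pos = end + len(end_tag)
        content = body[m.end():end].strip()
        if field_info.get(name) == "fixed":    # ST23 -> ST24: register value as-is
            fixed[name] = content
        else:                                  # ST23 -> ST26: free-form processing
            free[name] = content
    return fixed, free
```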

[0039] FIG. 5 is a flowchart showing the details of free-form field processing (step ST26). In step ST31, the current case number and current field name are passed in. One sentence is then extracted from the sentences written in the corresponding free-form field of the digitized document 11. Sentence boundaries are determined by the Japanese full stop (。), the middle dot (・), strings marking bulleted items such as chapter numbers, and the like.

[0040] In the example of FIG. 6, the first sentence extracted from the <question> field is "The air conditioner makes a noise." (エアコンから音がする。). A sentence extracted here is called a "case sentence", and case sentences make up the case data. When step ST31 extracts a case sentence, the sentence is assigned the case number, the field name, and a sentence number giving its position within the field. Step ST32 checks whether a case sentence was extracted. If none was, all sentences in the field are deemed to have been extracted. Step ST35 then checks the end tag of the document field, free-form field processing ends, and the process returns to step ST27 of FIG. 4.
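Sentence segmentation for a free-form field can be sketched with a regular expression. It splits after the full stop 。, at line breaks, and at ・ bullets. Leading item numbers such as "1." are then stripped. These boundary patterns are illustrative assumptions. The patent only names 。, ・ and chapter-number strings as cues. Each sentence is numbered within its field as described in [0040].

```python
import re

# Split after 。, at line breaks, and at ・ bullets (assumed boundary patterns).
_SPLIT = re.compile(r"(?<=。)|\n+|・")
# Leading item numbers such as "1." "2)" or full-width "1．" (assumed format).
_ITEM_NUMBER = re.compile(r"^\s*\d+[.．)）]\s*")


def extract_case_sentences(case_no, field_name, text):
    """Return (case number, field name, sentence number, case sentence) tuples."""
    sentences = []
    for piece in _SPLIT.split(text):
        piece = _ITEM_NUMBER.sub("", piece).strip()
        if piece:
            sentences.append((case_no, field_name, len(sentences) + 1, piece))
    return sentences
```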

[0041] If step ST32 finds that a case sentence was extracted, step ST33 attaches a "sentence tag" to it, i.e., a label indicating the type of the sentence. Step ST34 then registers the sentence in the case database 17. For attaching sentence tags, step ST33 refers to a sentence tag list such as that shown in FIG. 9. The list predefines, for each field, key expressions in a sentence (condition parts) and the corresponding sentence tags.

[0042] For simplicity, FIG. 9 shows sentence tags determined from surface character strings, i.e., the literal characters contained in the sentence. Alternatively, the condition part may allow sentence syntactic patterns and modal expressions such as negation and conjecture to be specified. In that case, the sentence tag is determined by first performing the sentence analysis described later on the case sentence and then matching the result against the sentence tag list.
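The surface-string variant of sentence tagging reduces to an ordered lookup: the first key expression of the field that occurs in the sentence decides the tag. The entries in `SENTENCE_TAG_LIST` are hypothetical placeholders, since the contents of FIG. 9 are not reproduced here. The fallback tag "other" is likewise an assumption.

```python
# Hypothetical stand-in for the sentence tag list of FIG. 9:
# field name -> ordered (key expression, sentence tag) pairs.
SENTENCE_TAG_LIST = {
    "question": [("音がする", "symptom"), ("ですか", "question")],
    "answer": [("ください", "instruction")],
}


def assign_sentence_tag(field_name, sentence, tag_list=SENTENCE_TAG_LIST, default="other"):
    """Return the tag of the first key expression of this field found in the sentence."""
    for key_expr, tag in tag_list.get(field_name, []):
        if key_expr in sentence:
            return tag
    return default
```

Because the lookup is on the surface string, it cannot tell, for example, an affirmed symptom from a negated one. That is why [0042] allows conditions on syntactic patterns and modality instead.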

[0043] FIG. 7 is an explanatory diagram showing the "case" generated by the sentence extraction step ST1 from the digitized document 11 of FIG. 6. As shown in FIG. 7, a case consists of three parts:
- a case number;
- fixed fields, such as <customer name>, and their values;
- free-form fields, each consisting of a field name, sentence numbers within the field, sentence tags, sentence numbers per sentence tag, and the case sentences.
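The case layout of FIG. 7 maps naturally onto a small record type. The class and method names here are hypothetical. The scope of the "sentence number per sentence tag" is also assumed to be the field, since the text does not specify it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaseSentence:
    field_name: str       # free-form field the sentence came from
    sentence_no: int      # position within the field
    sentence_tag: str
    tag_sentence_no: int  # position among same-tag sentences in the field (assumed scope)
    text: str


@dataclass
class Case:
    case_no: int
    fixed_fields: Dict[str, str] = field(default_factory=dict)
    sentences: List[CaseSentence] = field(default_factory=list)

    def add_sentence(self, field_name: str, tag: str, text: str) -> CaseSentence:
        """Append a case sentence, numbering it within its field and within its tag."""
        same_field = [s for s in self.sentences if s.field_name == field_name]
        tag_no = 1 + sum(s.sentence_tag == tag for s in same_field)
        sentence = CaseSentence(field_name, len(same_field) + 1, tag, tag_no, text)
        self.sentences.append(sentence)
        return sentence
```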

[0044] FIG. 7 shows the case where the sentence segmentation of step ST31 and the sentence tagging of step ST33 all succeed. In practice, both processes can make errors because sentence boundaries and tagging conditions are ambiguous. To allow such errors to be corrected manually, the sentence extraction unit 12 is provided with an editing and display function. It displays the case to the user in a form such as that of FIG. 7 and accepts input.

[0045] Next, in the sentence analysis processing of FIG. 2 (step ST2), the sentence analysis unit 13 performs morphological and syntactic analysis on each sentence in the set to be processed and generates its syntactic structure. FIG. 10 is a flowchart showing the details of this processing (step ST2).

[0046] The morphological analysis in step ST41 refers to an analysis word dictionary 24, and the syntactic analysis in step ST42 refers to the ontology 18. Alternatively, the analysis word dictionary 24 may be included in the ontology 18, so that the morphological analysis also refers to the ontology 18. In step ST43, the syntactic structure (dependency structure) generated in step ST42 is stored in the case database 17 in association with its case sentence, as shown in FIG. 17.

[0047] FIG. 11 is a flowchart showing the details of the morphological analysis processing (step ST41). In step ST51, a case sentence is divided into a sequence of morphemes by referring to the analysis word dictionary 24, an attached word dictionary 25, and an attached word connection table 26. An example case sentence is "ＲＣでタイマが入らない" ("the timer does not turn on with the RC"). Morphological analysis methods, the attached word dictionary 25, and the attached word connection table 26 are described in detail in many publications, so their description is omitted. In the following description, morpheme boundaries are written as "/", as in the morphological analysis result of FIG. 11.
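To make the "/"-separated output concrete, the following sketch segments the example sentence with greedy longest-match lookup in a toy dictionary. This is a deliberately simple substitute technique. It does not use the attached word connection table 26 or any cost model as a real analyzer would. The dictionary entries and semantic symbols are hypothetical.

```python
# Toy entries standing in for the analysis word dictionary 24 and the attached word
# dictionary 25: surface form -> (part of speech, semantic symbol). All hypothetical.
DICTIONARY = {
    "ＲＣ": ("noun", "<リモコン>"),
    "タイマ": ("noun", "<タイマ>"),
    "入ら": ("verb", "<入る>"),
    "で": ("particle", None),
    "が": ("particle", None),
    "ない": ("auxiliary", None),
}


def segment(sentence, dictionary=DICTIONARY):
    """Greedy longest-match segmentation into (surface, pos, semantic symbol) triples."""
    max_len = max(map(len, dictionary))
    morphemes, i = [], 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            surface = sentence[i:i + length]
            if surface in dictionary:
                break
        else:
            surface = sentence[i]  # unknown character becomes its own morpheme
        pos, symbol = dictionary.get(surface, ("unknown", None))
        morphemes.append((surface, pos, symbol))
        i += len(surface)
    return morphemes
```

For "ＲＣでタイマが入らない", the surfaces joined with "/" give ＲＣ/で/タイマ/が/入ら/ない.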

[0048] The morphological analysis result is structured so that, from each independent word, the following information described in the analysis word dictionary 24 can be referenced:
- the headword;
- the part of speech;
- the "semantic symbol", i.e., concept information indicating the type of the word.

For this purpose, pointer information for referring to the dictionary may be held. Alternatively, if dictionary lookups take time (for example, because the dictionary resides in secondary storage), this information may be copied into primary storage. In the following description, a semantic symbol is written by enclosing a word in < >, as with the semantic symbols in FIG. 11.

[0049] FIG. 12 is a flowchart showing the details of the syntax analysis process (step ST42). The phrase structure generation process of step ST61 takes the morphological analysis output of FIG. 11 as input and generates phrase structures, the basic units on which dependency analysis operates. A phrase structure consists of at least one independent-word morpheme and zero or more auxiliary-word morphemes following it. FIG. 19 is an explanatory diagram showing an example of a phrase structure. The phrase structure in FIG. 19 consists of a dependency attribute, a receiving attribute, independent word information, and auxiliary word information. The independent word information is a pointer to the independent-word morpheme information forming the phrase, and the auxiliary word information is an array of pointers to zero or more entries of auxiliary-word morpheme information.
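The phrase-building rule of step ST61 (one independent-word morpheme followed by zero or more auxiliary-word morphemes) can be sketched as below. The class and field names are hypothetical; the dependency and receiving attributes are left as empty placeholders that dependency analysis would fill in.

```python
from dataclasses import dataclass, field

@dataclass
class Morpheme:
    surface: str
    independent: bool  # True for an independent word, False for an auxiliary word

@dataclass
class Phrase:
    independent: Morpheme                             # independent word information
    auxiliaries: list = field(default_factory=list)   # auxiliary word information (0 or more)
    dependency_attr: str = ""                         # placeholder, set by dependency analysis
    receiving_attr: str = ""                          # placeholder, set by dependency analysis

    def text(self) -> str:
        return self.independent.surface + "".join(m.surface for m in self.auxiliaries)

def build_phrases(morphemes):
    """Open a new phrase at each independent word; attach auxiliary words to the current one."""
    phrases = []
    for m in morphemes:
        if m.independent or not phrases:
            # A leading auxiliary word (malformed input) opens a phrase of its own.
            phrases.append(Phrase(m))
        else:
            phrases[-1].auxiliaries.append(m)
    return phrases

# "RC/で/タイマ/が/入ら/ない" from the morphological analysis step
morphemes = [Morpheme("RC", True), Morpheme("で", False), Morpheme("タイマ", True),
             Morpheme("が", False), Morpheme("入ら", True), Morpheme("ない", False)]
phrases = build_phrases(morphemes)
print(" | ".join(p.text() for p in phrases))  # RCで | タイマが | 入らない
```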

[0050] In the dependency analysis process of step ST62, dependency analysis of the phrase structures is performed according to the grammar rules 27, and the resulting dependency structure is output as the syntax analysis result. Dependency analysis methods are explained in many publications, so a detailed description is omitted here. Dependency analysis generally produces many ambiguities; the ontology 18 is consulted as appropriate to resolve them.

[0051] When the sentence analysis process (step ST2) of FIG. 2 is completed, in step ST3 the set of syntax analysis results of the sentences being processed is classified (clustered) into groups of similar sentences by calling the keyword clustering unit 15 and the detailed clustering unit 16, and a problem solving tree consisting of clusters that contain sets of similar sentences, as shown in FIG. 20, is generated. Each cluster of the problem solving tree holds a set of similar case sentences, and each case sentence is stored in the case database 17 in the format shown in FIG. 17.

[0052] FIG. 13 is a flowchart showing the details of the similar case classification process (step ST3). The similar case classification process (step ST3) consists of a keyword clustering process (step ST71) and a detailed clustering process (step ST72). Each clustering process refers to the ontology 18 as needed and stores its classification (clustering) result in the case database 17.

[0053] FIG. 14 is a flowchart showing the details of the keyword clustering process (step ST71), and FIG. 15 is a flowchart showing the details of the recursive function call process of FIG. 14. Before FIGS. 14 and 15 are described, the symbols and formulas used in them are explained.

[0054] For example, $W_x, W_y, \ldots, W_z$ denote keywords selected from the words appearing in the cases, and $J(W_x \wedge W_y \wedge \cdots \wedge W_z)$ denotes the list of case numbers in which the keywords $W_x, W_y, \ldots, W_z$ all appear. The number of elements of such a case-number list R is written $|J(W_x \wedge W_y \wedge \cdots \wedge W_z)|$, and the mutual information $I(W_x, W_y, \ldots, W_z)$, which represents the strength of association among the keywords $W_x, W_y, \ldots, W_z$, is defined by the following equation.

[0055]

(Equation 1)
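The body of Equation 1 is not reproduced in this text, so the following Python sketch should be read with care. The co-occurrence list $J(\cdot)$ is computed exactly as defined in [0054], by intersecting each keyword's set of case numbers. For $I(\cdot)$ it assumes a k-way generalization of pointwise mutual information, $\log_2 \frac{P(W_x \wedge \cdots \wedge W_z)}{\prod P(W)}$ with probabilities estimated from case counts; this formula is an assumption, not the patent's equation.

```python
import math

def cooccurring_cases(index, keywords):
    """J(W_x ∧ ... ∧ W_z): the case numbers in which every keyword appears."""
    return set.intersection(*(index.get(w, set()) for w in keywords))

def mutual_information(index, keywords, n_cases):
    """ASSUMED stand-in for Equation 1: k-way pointwise mutual information,
    log2(P(all keywords) / product of P(each keyword)), estimated from case counts."""
    joint = len(cooccurring_cases(index, keywords))
    if joint == 0:
        return float("-inf")
    marginals = math.prod(len(index[w]) for w in keywords)
    return math.log2(joint * n_cases ** (len(keywords) - 1) / marginals)

index = {"power": {1, 2, 3, 4}, "timer": {2, 3, 4}, "noise": {5}}  # keyword -> case numbers
print(sorted(cooccurring_cases(index, ["power", "timer"])))  # [2, 3, 4]
print(mutual_information(index, ["power", "timer"], 8))      # 1.0
```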

[0056] In addition, kwnum1 is the number of candidates for the first keyword; kwnum2 is the number of candidates for the second and subsequent keywords; minJirei is the minimum number of cases, used as the stopping condition for clustering by a set of keywords; usedKW is a variable that registers keywords once they have been used; and scoreTH is the mutual information threshold (cluster formation condition) at or above which a set of keywords is regarded as strongly associated and forms a cluster (a group of similar sentences).

[0057] First, in step ST81, keywords are extracted from the analysis results of the case sentences by referring to the case database 17. Words whose appearance frequency exceeds a set value may be extracted as keywords, or words held in the ontology 18 may be extracted. For example, when the semantic symbol <model> is held in the ontology 18, a word carrying that symbol can be used as a keyword even if its appearance frequency is low.

[0058] Next, in step ST82, a correspondence table between keywords and case numbers is created, with the keywords sorted in descending order of appearance frequency. In step ST83, usedKW, the variable that registers keywords once they have been used, is initialized to the empty set. In step ST84, the counter i over the candidates for the first keyword is set to "1".
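Steps ST81 and ST82 can be sketched in Python as follows. The function name is hypothetical, and two details are assumptions: frequency is counted once per case, and ties in frequency are broken alphabetically. A word qualifies either by exceeding the frequency threshold or by being held in the ontology, as with "AMiTY" in FIG. 16.

```python
from collections import Counter

def build_keyword_table(cases, min_freq, ontology_words):
    """ST81-ST82 sketch.
    cases: {case number: independent words from its analysis result}.
    A word becomes a keyword if it appears in more than min_freq cases, or if the
    ontology holds it (e.g. a word carrying the <model> symbol)."""
    freq = Counter()
    for words in cases.values():
        freq.update(set(words))          # count each word once per case (assumption)
    keywords = {w for w, c in freq.items() if c > min_freq or w in ontology_words}
    table = {}
    for case_no, words in cases.items():
        for w in set(words) & keywords:
            table.setdefault(w, set()).add(case_no)
    # Keywords in descending order of frequency (ties broken alphabetically).
    return sorted(table.items(), key=lambda kv: (-len(kv[1]), kv[0]))

cases = {1: ["power", "timer"], 2: ["power", "noise"], 3: ["power", "timer", "AMiTY"]}
table = build_keyword_table(cases, min_freq=1, ontology_words={"AMiTY"})
print(table)  # [('power', {1, 2, 3}), ('timer', {1, 3}), ('AMiTY', {3})]
```

Here "noise" is dropped because it occurs in only one case and is not in the ontology, while "AMiTY" survives despite the same frequency.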

[0059] In step ST85, if $i \le \mathrm{kwnum1}$ (the maximum number of first keywords) holds, the process proceeds to step ST86; otherwise it proceeds to step ST90. In step ST86, if keyword $W_i$ is not an element of usedKW, the process proceeds to step ST87; if it is an element, the process proceeds to step ST89.

[0060] In step ST87, keyword $W_i$ is registered as an element of usedKW. In step ST88, the recursive function clstKWsub, the keyword clustering sub-process, is called with the number of keywords in the set, N = 1, and the keyword set $\{W_i\}$ as parameters (details are described later). When the clustering sub-process returns, the counter i is incremented in step ST89 and the process is repeated from step ST85. In step ST90, the problem solving tree (tree-structure information) formed by the keyword sets resulting from the keyword clustering is stored in the case database 17.

[0061] The recursive function call process (step ST88), that is, the keyword clustering sub-process, is described in detail below with reference to FIG. 15. First, in step ST91, the counter j is set to "1". Here, the maximum value of j is taken to be kwnum2. The value of kwnum2 can be changed according to the cluster creation situation, such as the overall size of the case set and the desired processing time. The same applies to the value of kwnum1.

[0062] In step ST92, if $j \le \mathrm{kwnum2}$ holds, the process proceeds to step ST93; otherwise the keyword clustering sub-process ends. In step ST93, if keyword $W_j$ is not an element of usedKW, the process proceeds to step ST94; if it is an element, the process proceeds to step ST98. For example, when the sub-process has been called for $i = 1$: for $j = 1$, $W_1$ is an element of usedKW, so the process proceeds to step ST98; for $j = 2$, $W_2$ is not an element of usedKW, so the process proceeds to step ST94.

[0063] Next, in step ST94, if both the condition $|J(W_{i1} \wedge W_{i2} \wedge \cdots \wedge W_{iN})| \ge \mathrm{minJirei}$ (a keyword set covering too few cases is not subdivided further) and the condition $I(W_{i1}, W_{i2}, \ldots, W_{iN}, W_j) \ge \mathrm{scoreTH}$ (the mutual information threshold) hold, the process proceeds to step ST95; otherwise it proceeds to step ST98. For example, when $i = 1$ in step ST87 of FIG. 14 and $j = 2$, $W_{i1}$ coincides with $W_1$, and $W_{i2}$ through $W_{iN}$ do not exist.

[0064] In step ST95, the keyword set $\{W_{i1}, W_{i2}, \ldots, W_{iN}, W_j\}$ is registered in the result storage array kwp[N+1] as a set of strongly associated keywords. For example, when $j = 2$, $\{W_1, W_2\}$ is registered as a set of strongly associated keywords. Subsequently, in step ST96, keyword $W_j$ is added to usedKW. Further, in step ST97, the keyword clustering sub-process clstKWsub is called recursively with N+1 and the keyword set $\{W_{i1}, W_{i2}, \ldots, W_{iN}, W_j\}$ as parameters. In step ST98, the counter j is incremented by one and the same processing is repeated from step ST92. When $j > \mathrm{kwnum2}$ in step ST92, the keyword clustering sub-process clstKWsub ends.
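Putting steps ST83 to ST98 together, the keyword clustering of FIGS. 14 and 15 can be sketched as the recursive Python function below. It is a minimal sketch under stated assumptions: `I` uses the assumed pointwise-mutual-information stand-in for Equation 1, recording a level-1 entry for each first keyword is added so the tree has roots, and the `reusable` parameter anticipates the reuse option of [0066]. The output pairs `(level, keywords)` correspond to the entries of kwp.

```python
import math

def J(index, kws):
    """Case numbers in which all keywords in kws co-occur."""
    return set.intersection(*(index.get(w, set()) for w in kws))

def I(index, kws, n_cases):
    """ASSUMED stand-in for Equation 1 (k-way pointwise mutual information)."""
    joint = len(J(index, kws))
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_cases ** (len(kws) - 1) / math.prod(len(index[w]) for w in kws))

def cluster_keywords(index, ranked, n_cases, kwnum1, kwnum2, min_jirei, score_th,
                     reusable=frozenset()):
    """ranked: keywords W_1, W_2, ... in descending frequency order (ST82).
    Returns (level, keyword list) pairs; level N+1 corresponds to kwp[N+1]."""
    used_kw = set()                                       # ST83
    kwp = []

    def usable(w):
        return w not in used_kw or w in reusable          # ST86 / ST93, plus reuse option

    def clst_kw_sub(n, kw_set):                           # FIG. 15
        for w_j in ranked[:kwnum2]:                       # ST91, ST92, ST98
            if not usable(w_j) or w_j in kw_set:          # never add a keyword twice
                continue
            strong = (len(J(index, kw_set)) >= min_jirei
                      and I(index, kw_set + [w_j], n_cases) >= score_th)   # ST94
            if strong:
                new_set = kw_set + [w_j]
                kwp.append((n + 1, new_set))              # ST95
                used_kw.add(w_j)                          # ST96
                clst_kw_sub(n + 1, new_set)               # ST97

    for w_i in ranked[:kwnum1]:                           # ST84, ST85, ST89
        if not usable(w_i):
            continue
        used_kw.add(w_i)                                  # ST87
        kwp.append((1, [w_i]))      # root of this subtree (assumption, not stated in the text)
        clst_kw_sub(1, [w_i])                             # ST88
    return kwp

index = {"power": {1, 2, 3, 4, 5, 6}, "timer": {1, 2, 3}, "noise": {4, 5}, "display": {6, 7}}
ranked = ["power", "timer", "noise", "display"]
for level, kws in cluster_keywords(index, ranked, n_cases=8, kwnum1=4, kwnum2=4,
                                   min_jirei=2, score_th=0.3):
    print("  " * (level - 1) + " & ".join(kws))
```

With these toy counts, "timer" and "noise" each join "power" as a second level, no three-keyword set co-occurs, and "display", which is weakly associated with "power", starts a tree of its own.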

[0065] FIG. 16 is an explanatory diagram showing an example of the hierarchical problem solving tree generated by the keyword clustering process (step ST71). In FIG. 16, the word "AMiTY" was adopted as a keyword because the semantic symbol <model> is held in the ontology 18.

[0066] FIGS. 14 and 15 show the case in which a keyword, once used, is registered in usedKW and not reused (usedKW is set to the empty set in step ST83). It is also possible, however, to designate reusable keywords in advance and allow those keywords to be used again. In FIG. 16, the keyword "power" appears in two places because it was designated as reusable.

[0067] FIG. 18 is a flowchart showing the details of the detailed clustering process (step ST72). First, in step ST101, the number N of clusters that form the leaves of the hierarchical problem solving tree generated by the keyword clustering process (step ST71) is obtained.

[0068] In step ST102, the repetition counter i is initialized to "0". In step ST103, the counter i is compared with the number of leaf clusters N; if i is smaller than N, the process proceeds to step ST104, and otherwise to step ST106.

[0069] In step ST104, the detailed clustering sub-process is applied to the set of case sentences contained in leaf cluster i, subdividing that set. In this sub-process, for example, the most similar pair of sentences is merged into one cluster, and this is repeated. Merging ends when the similarity of the best pair found falls to or below a certain threshold. Alternatively, a preset number of hierarchy levels may be created. For simplicity, detailed clustering has been described here as applied to the leaf clusters, but a configuration may also be adopted in which clustering is invoked as appropriate during the keyword classification process, for example for predicates. In step ST105, the counter i is incremented by one and the same processing is repeated from step ST103. In step ST106, the cluster information of the finely classified case sentences is stored in the case database 17.
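The merging loop of step ST104 can be sketched as plain agglomerative clustering in Python. Two choices here are assumptions: the similarity between two clusters is taken as single link (the best pair of members), and numbers with a toy distance-based similarity stand in for case sentences compared with Sim(A, B, D).

```python
def agglomerate(items, sim, threshold):
    """Merge the most similar pair of clusters repeatedly; stop when the best
    similarity is at or below threshold. Cluster similarity is single link
    (maximum over member pairs), which is an assumption."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(sim(x, y) for x in clusters[a] for y in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        if s <= threshold:
            break
        clusters[a].extend(clusters.pop(b))   # b > a, so index a stays valid
    return clusters

# Toy stand-in for Sim(A, B, D): numbers that are close together count as similar.
toy_sim = lambda x, y: 1 - abs(x - y) / 10
print(agglomerate([1, 2, 8, 9, 20], toy_sim, threshold=0.5))  # [[1, 2], [8, 9], [20]]
```

Recording the similarity at each merge would yield the intermediate hierarchy levels mentioned above.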

[0070] Next, the similarity calculation in the detailed clustering sub-process (step ST104) is described with reference to FIG. 24. In the detailed clustering process (step ST72) using the ontology 18, sentence similarity is computed by similar-sentence matching, which aligns the nodes of the syntactic structures and applies weights based on the attributes within each node (a process that takes modal expressions such as negation and conjecture into account).

[0071] In the function Sim(A, B, D), which computes the similarity between natural-language sentences in similar-sentence matching, the arguments A and B are dependency structures obtained by syntax analysis and are assumed to be tree structures like those shown in FIG. 24. The argument D is the level of detail of the matching: when the similarity of the two tree structures (a) and (b) in FIG. 24 is computed, it specifies how many levels, counted from the root node, are processed. For simplicity, D = 2 is assumed here.

[0072] First, an initial similarity of 1.0 is assigned; a similarity of 1.0 means that the two input sentences have exactly the same meaning. The subsequent processing compares the information of each node while traversing the tree structures, assigns a "penalty" to each differing part, and subtracts it from 1.0. When the similarity reaches "0" or a predetermined value, the two structures are regarded as dissimilar and the similarity calculation stops. FIG. 26 is an explanatory diagram showing an example of the penalty calculation rules used in the similarity calculation.

[0073] First, the first-level nodes are compared. Here the semantic symbols are both <detection operation> but the surface words differ, so a penalty of −0.01 is given according to rule 01 in FIG. 26. Next, the second level is compared. The left nodes (sensor <monitor device>) match completely, so no penalty is given. The right nodes have different semantic symbols, so a penalty of −0.3 is given according to rule 04 in FIG. 26. The similarity is therefore calculated as 1.0 − 0.01 − 0.3 = 0.69.

[0074] If the argument D were given as "1", the similarity calculation would cover only the first level, and the similarity would be 0.99. Because the precision of the similarity calculation can thus be controlled through the value of D, processing can be adapted flexibly to the search situation.
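The level-by-level penalty scheme of [0071] to [0074] can be sketched in Python as follows. Rules 01 and 04 use the values given in the text, and the example reproduces 0.69 for D = 2 and 0.99 for D = 1. The remaining details are assumptions: the rule 05 value (0.9) is inferred from the 0.1 result cited for negation, rules 05 and 06 fire only when the two nodes disagree on the attribute, children are paired by position, and the words in the example trees are hypothetical rather than taken from FIG. 24.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str                               # surface word
    symbol: str                             # semantic symbol
    attrs: frozenset = frozenset()          # e.g. {"negation"}, {"conjecture"}
    children: list = field(default_factory=list)

RULE_01 = 0.01   # same semantic symbol, different surface word (value from the text)
RULE_04 = 0.3    # different semantic symbols (value from the text)
RULE_05 = 0.9    # negation on one side only (inferred value)
RULE_06 = 0.15   # conjecture on one side only (value from the text; trigger is assumed)

def node_penalty(a, b):
    p = RULE_04 if a.symbol != b.symbol else (RULE_01 if a.word != b.word else 0.0)
    if ("negation" in a.attrs) != ("negation" in b.attrs):
        p += RULE_05
    if ("conjecture" in a.attrs) != ("conjecture" in b.attrs):
        p += RULE_06
    return p

def sim(a, b, depth):
    """Sim(A, B, D): start at 1.0 and subtract penalties level by level, down to depth D.
    Children are paired by position, a simplification of the node alignment in the text."""
    score, level = 1.0, [(a, b)]
    for _ in range(depth):
        next_level = []
        for x, y in level:
            score -= node_penalty(x, y)
            if score <= 0.0:                # regarded as dissimilar: stop early
                return 0.0
            next_level.extend(zip(x.children, y.children))
        if not next_level:
            break
        level = next_level
    return score

a = Node("detect", "<detection operation>", children=[
        Node("sensor", "<monitor device>"), Node("clogging", "<fault>")])
b = Node("sense", "<detection operation>", children=[
        Node("sensor", "<monitor device>"), Node("temperature", "<physical quantity>")])
print(round(sim(a, b, 2), 2))   # 0.69
print(round(sim(a, b, 1), 2))   # 0.99
neg = Node("turn on", "<switch on>", frozenset({"negation"}))
print(round(sim(neg, Node("turn on", "<switch on>"), 1), 2))  # 0.1
```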

[0075] When rule 02 of FIG. 26 is applied to the structure of FIG. 24(a), the IS-A (hypernym-hyponym) knowledge 31 shown in FIG. 23 gives a similarity of 0.9 with the sentence "the monitor device recognizes clogging". On the other hand, when rule 05 of FIG. 26 is applied to the dependency structure of the sentence "the timer does not turn on with the RC", the node attributes of the syntax analysis result include negation, so its similarity with "the timer turns on with the RC" is 0.1 and the two sentences are judged dissimilar. Similarly, when a sentence contains a conjecture expression such as 〜だろう ("probably ...") and the node attributes of the syntax analysis result include conjecture, rule 06 is applied and a similarity reduced by a penalty of 0.15 is calculated.

[0076] In the same way, applying rule 03 of FIG. 26 to the sentence "the timer button input is not accepted" raises its similarity with the sentence "the remote control input is not accepted", by using the HAS-A (part-whole) knowledge 32 shown in FIG. 23.

[0077] The HAS-A knowledge 32 of the ontology 18 can describe mutually exclusive information. In the example of the HAS-A knowledge 32 for <timer display> in FIG. 23, the exclusive information 35 indicates that <timer display> has the states "none", "on-timer", and "off-timer", and that "none" is mutually exclusive with "on-timer" and "off-timer" because they are never displayed at the same time. Using this knowledge, when the case sentences include the sentence "There is no timer display." and the sentence "The on-timer is shown on the timer display.", information stating that the clusters containing these sentences are mutually exclusive can be stored in the case database 17 as inter-cluster information.

[0078] The paraphrase knowledge 34 of the ontology 18 can describe expressions that have the same meaning. The paraphrase knowledge 34 of FIG. 23 states that "<breaker> flies" and "<breaker> falls" (two Japanese expressions for a tripped breaker) have the same meaning; by referring to this knowledge during similarity matching, the sentences "the breaker flies" and "the breaker falls" can be treated as having a similarity of 1.
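One way to picture the paraphrase knowledge 34 is as a lookup that rewrites a predicate into a canonical form before comparison. The Python sketch below is a hypothetical encoding: sentences are reduced to (subject semantic symbol, predicate) pairs, and the rule is keyed on <breaker> so that it does not leak to other subjects.

```python
# Hypothetical encoding of paraphrase knowledge 34: (subject symbol, predicate) pairs
# that mean the same thing map to one canonical predicate.
PARAPHRASES = {("<breaker>", "fly"): "fall"}

def canonical(subject_symbol, predicate):
    return subject_symbol, PARAPHRASES.get((subject_symbol, predicate), predicate)

def paraphrase_sim(s1, s2):
    """Return 1.0 when two (subject symbol, predicate) pairs are paraphrases, else None
    (meaning: fall back to the ordinary penalty-based comparison)."""
    return 1.0 if canonical(*s1) == canonical(*s2) else None

print(paraphrase_sim(("<breaker>", "fly"), ("<breaker>", "fall")))  # 1.0
print(paraphrase_sim(("<fuse>", "fly"), ("<fuse>", "fall")))        # None
```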

[0079] The case relation knowledge 33 of the ontology 18 can describe case relations between nominals and predicates in natural-language expressions. For example, suppose that in the analysis word dictionary 24 the words "connect" (接続する) and "join" (つなぐ) have the semantic symbol <connection operation>, the words "drain pipe" and "drain hose" have the semantic symbol <drain pipe>, and the word "outdoor unit" has the semantic symbol <outdoor unit>. Then, by referring to the case relation knowledge 33, the sentence "connect the drain pipe to the outdoor unit" and the sentence "join the drain hose to the outdoor unit" can be judged similar, even though the word order of the two Japanese originals differs.
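A simplified picture of why case relations make this comparison order-independent: reduce each sentence to its predicate's semantic symbol plus a set of (case marker, argument symbol) pairs. The Python sketch below is hypothetical; the dictionary entries mirror the example above, and the romanized markers "wo" (object) and "ni" (destination) stand for the Japanese particles.

```python
# Hypothetical dictionary entries (cf. analysis word dictionary 24): word -> semantic symbol.
SYMBOLS = {
    "connect": "<connection operation>", "join": "<connection operation>",
    "drain pipe": "<drain pipe>", "drain hose": "<drain pipe>",
    "outdoor unit": "<outdoor unit>",
}

def case_frame(predicate, args):
    """Map a predicate and its (case marker, noun) arguments to semantic symbols.
    A frozenset makes the frame independent of surface word order."""
    return SYMBOLS[predicate], frozenset((case, SYMBOLS[noun]) for case, noun in args)

# 排水パイプを室外装置に接続する / 室外装置に排水ホースをつなぐ
s1 = case_frame("connect", [("wo", "drain pipe"), ("ni", "outdoor unit")])
s2 = case_frame("join", [("ni", "outdoor unit"), ("wo", "drain hose")])
print(s1 == s2)  # True
```

Swapping the case markers (connecting the outdoor unit to the drain pipe) produces a different frame, so the match depends on the roles and not merely on the shared words.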

[0080] FIG. 20 is an explanatory diagram showing a problem solving tree created by the detailed clustering process (step ST72), which classifies part of the problem solving tree resulting from the keyword clustering of FIG. 16 by using the auxiliary word information of the phrase structure illustrated in FIG. 19.

[0081] In FIG. 2, when the similar case classification process (step ST3) is completed, the case cluster editing process (step ST4) displays the cluster hierarchy of the case data stored in the case database 17 in the form of a problem solving tree such as the one shown in FIG. 20 or FIG. 27. FIG. 27 shows an example of a problem solving tree created from the set of digitized documents 11 illustrated in FIG. 6. It was created by performing similar case classification on those case sentences, extracted in the sentence extraction process (step ST1), that have the sentence tag [symptom] and appear first, second, or third in the <question> field. A problem solving tree is created for each value of the category shown in FIG. 8.

[0082] In FIG. 27, each cluster is displayed with the number of cases (sentences) associated with it, a confirmation check box used during search, and the display label set for the cluster. If a cluster is located at a terminal of the problem solving tree, a terminal mark is added so that this can be recognized. The case cluster editing unit 19 displays clusters from top to bottom in descending order of the number of case sentences they contain. A cluster consists of a set of similar case sentences, and the information in a cluster can be viewed by selecting its display label with a mouse or similar device. In the case cluster editing unit 19, the user can set and edit the display label strings of the clusters.
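The display rule of [0082] (case count, check box, label, a terminal mark on leaves, children ordered by descending case count) can be sketched as a small text renderer in Python. The class name, the "(end)" mark, and the "Strange sound" label are hypothetical; only "Sound of concern" comes from the FIG. 27 example.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    label: str                                     # display label (editable by the user)
    case_count: int                                # number of case sentences in the cluster
    children: list = field(default_factory=list)

def render(cluster, depth=0, lines=None):
    """Lines for the tree: check box, case count, label, and a terminal mark on leaves.
    Children are listed in descending order of case count."""
    if lines is None:
        lines = []
    terminal = " (end)" if not cluster.children else ""
    lines.append(f"{'  ' * depth}[ ] {cluster.case_count} {cluster.label}{terminal}")
    for child in sorted(cluster.children, key=lambda c: c.case_count, reverse=True):
        render(child, depth + 1, lines)
    return lines

tree = Cluster("Sound", 5, [Cluster("Strange sound", 2), Cluster("Sound of concern", 3)])
print("\n".join(render(tree)))
```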

[0083] FIG. 17 is an explanatory diagram showing an example of the case sentence data stored in the case database 17. After the sentence analysis process (step ST2) adds the dependency structure to the output of the sentence extraction process (step ST1), the similar case classification process (step ST3) determines the cluster to which the case sentence data belongs; the data is then assigned that cluster number and stored in the case database 17. Information representing the hierarchy among the clusters, as shown in the problem solving tree of FIG. 27, is stored in the case database 17 at the same time.

[0084] FIG. 25 is an explanatory diagram showing an example of inter-cluster information other than hierarchical relations, stored in the case database 17 in step ST93 of the similar case classification process (step ST3). Inter-cluster information defines the relation between two clusters identified by their cluster numbers. Examples of relation types are −1 (mutually exclusive) and 1 (similar). Exclusive relations are created by referring to the exclusive information 35 illustrated in the HAS-A knowledge 32 of FIG. 23. For similar relations, a representative sentence can be set for each cluster, and the similarity between the representative sentences is stored as inter-cluster information. FIG. 25 shows an example of the relations between the clusters of the problem solving tree in FIG. 27.
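The two relation types of [0084] can be derived as in the Python sketch below. Everything here is an illustrative assumption: the encoding of the exclusive information 35, the tuple layout of the stored rows, the similarity threshold for recording a similar relation, and the toy similarity function standing in for matching of representative sentences.

```python
# Hypothetical encoding of exclusive information 35 for <timer display>:
# "none" cannot be shown together with "on timer" or "off timer".
EXCLUSIVE = {("<timer display>", frozenset({"none", "on timer"})),
             ("<timer display>", frozenset({"none", "off timer"}))}

def exclusive(state_a, state_b):
    """state_*: (attribute symbol, value), or None when the cluster has no such state."""
    if state_a is None or state_b is None or state_a[0] != state_b[0]:
        return False
    return (state_a[0], frozenset({state_a[1], state_b[1]})) in EXCLUSIVE

def inter_cluster_relations(clusters, sim, sim_threshold):
    """clusters: {cluster number: (state, representative sentence)}.
    Returns (cluster a, cluster b, relation type, similarity) rows:
    -1 = mutually exclusive, 1 = similar (representative-sentence similarity kept)."""
    rows = []
    numbers = sorted(clusters)
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            (state_a, rep_a), (state_b, rep_b) = clusters[a], clusters[b]
            if exclusive(state_a, state_b):
                rows.append((a, b, -1, None))
            else:
                s = sim(rep_a, rep_b)
                if s >= sim_threshold:       # the threshold is an assumption
                    rows.append((a, b, 1, s))
    return rows

clusters = {
    1: (("<timer display>", "none"), "There is no timer display."),
    2: (("<timer display>", "on timer"), "The on-timer is shown on the timer display."),
    3: (None, "I hear a strange sound."),
    4: (None, "The timer display is blank."),
}
toy_sim = lambda x, y: 0.8 if "timer" in x and "timer" in y else 0.0
for row in inter_cluster_relations(clusters, toy_sim, 0.5):
    print(row)
```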

[0085] FIG. 21 is a flowchart showing the language case inference method (case search process) according to Embodiment 1 of the present invention. The case search process is described below with reference to FIG. 21. First, in step ST111, the user enters a search sentence (a description of the new problem) for retrieving the desired documents, using the search sentence input unit 21 of FIG. 1. The search sentence input unit 21 may be a keyboard, or alternatively a character recognition device, a speech recognition device, or the like.

[0086] Next, in step ST112, the search sentence entered through the search sentence input unit 21 is analyzed. As described for the construction process, the sentence analysis of step ST112 analyzes the search sentence by morphological analysis (step ST41) followed by syntax analysis (step ST42) and generates a dependency structure for it. The details of each analysis are the same as in the case construction process and are omitted here.

[0087] FIG. 22 is a flowchart showing the details of the similar case search process (step ST113). First, in step ST121, the number M of primary search results is initialized to "0", and in step ST122 the counter k is initialized to "1".

[0088] Next, in step ST123, a primary search for case sentences is performed on the case database 17. If the case database 17 held only a small number of case sentences, the input search sentence could be matched for similarity against every case sentence; in general, however, a large number of case sentences are stored, so matching against all of them would take a long processing time.

[0089] Therefore, an index keyed on the independent words of the syntactic elements of each case sentence's dependency structure (see FIG. 17) and an index keyed on the word sets resulting from keyword clustering are prepared in advance, and the primary search uses these indexes to narrow down in advance the case sentences subject to similar-sentence matching. The similar case search process (step ST113) of FIG. 21 then calls the similar-sentence matching described for the detailed clustering process (step ST72) and matches the dependency analysis result of each case sentence in the primary search result against the dependency analysis result of the input search sentence.
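The independent-word index and the primary search can be sketched in Python as an inverted index. Taking the union of posting sets (a case qualifies if it shares at least one independent word with the query) is an assumption; the patent does not say how the index hits are combined. The keyword-cluster index could be built the same way from the keyword sets.

```python
from collections import defaultdict

def build_index(cases):
    """Inverted index: independent word -> case numbers whose dependency structure contains it.
    cases: {case number: independent words}."""
    index = defaultdict(set)
    for case_no, words in cases.items():
        for w in words:
            index[w].add(case_no)
    return index

def primary_search(index, query_words):
    """Candidate case numbers sharing at least one independent word with the query.
    Taking the union (rather than requiring every word) is an assumption."""
    candidates = set()
    for w in query_words:
        candidates |= index.get(w, set())
    return sorted(candidates)

cases = {1: ["RC", "timer", "turn on"], 2: ["breaker", "fall"], 3: ["timer", "display"]}
index = build_index(cases)
print(primary_search(index, ["timer", "turn on"]))  # [1, 3]
```

Only the returned candidates then go through the more expensive Sim(A, B, D) matching.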

[0090] When the primary search of step ST123 is completed, the loop from step ST124 to step ST126 is executed for each case sentence in the primary search result. In step ST125, the similarity $\mathrm{Sim}(S_0, S_k, D)$ described above is computed between the k-th case sentence $S_k$ of the primary search result (FIG. 24(b): dependency structure) and the input search sentence $S_0$ (FIG. 24(a): dependency structure), and the result is output in step ST126. As described above, the value of D is set as needed.

[0091] Finally, when the end of the loop is detected in step ST124, the primary search results are sorted in order of similarity in step ST127, and in step ST114 of FIG. 21 they are output in that order to the search result display unit 23 of FIG. 1. With this method, sentences that match completely at the keyword level still receive a low similarity if their meanings differ, and, thanks to similar-sentence matching using the ontology 18, sentences with the same meaning receive a high similarity even if their keywords differ.

[0092] Conventional keyword-only search does not perform such fine-grained processing, so sentences whose meaning differs greatly from the search sentence, for example because of negation or modal expressions, are output near the top as search noise. With this method, such sentences can be kept out of the similar-sentence output, for example by not displaying sentences whose similarity is at or below a threshold.
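The scoring loop (steps ST124 to ST126), the sort (step ST127), and the threshold filter of [0092] fit in a few lines of Python. The function name is hypothetical, and the precomputed scores are toy values standing in for Sim(S_0, S_k, D); sentences at or below the threshold are dropped, matching the rule above.

```python
def rank_results(query, candidates, sim, threshold):
    """ST124-ST127 sketch: score every candidate against the query, drop those whose
    similarity is at or below the threshold, and sort the rest by descending similarity."""
    scored = [(sim(query, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s > threshold]
    kept.sort(key=lambda sc: sc[0], reverse=True)
    return kept

# Toy precomputed similarities standing in for Sim(S_0, S_k, D).
toy_scores = {"case 1": 0.69, "case 2": 0.1, "case 3": 0.99}
print(rank_results("query", list(toy_scores), lambda q, c: toy_scores[c], threshold=0.5))
# [(0.99, 'case 3'), (0.69, 'case 1')]
```

A negated sentence such as the 0.1 case from [0075] is filtered out here, even though it shares all of its keywords with the query.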

[0093] As described above, at search time the sentence analysis process (step ST112) generates a dependency structure for the input search sentence; the similar case search process (step ST113) searches using that dependency structure and calculates the similarity between the input search sentence and each search result while referring to the ontology 18 during similar-sentence matching; and the search result display process (step ST114) outputs the search results to the search result display unit 23 according to their similarity. Case retrieval with little search noise is thus achieved for a wide variety of natural-language expressions.

[0094] In the search result display process (step ST114), the search results may also be presented by displaying the problem solving trees shown in FIGS. 20 and 27 on the search result display unit 23 and highlighting the cluster that contains case sentences similar to the input sentence. FIG. 27 shows an example in which, for the input search sentence "I hear an unpleasant sound", the cluster labeled "Bothersome sound" is highlighted as the cluster containing the most similar case sentence, "I hear a strange sound".

[0095] Like the case cluster editing unit 19, the search result display unit 23 displays the problem solving tree from top to bottom in descending order of the number of case sentences each cluster contains, which makes it easy for the user to consult the kinds of cases that have occurred most often in the past. The user can designate any cluster, refer to the cases corresponding to the case sentences it contains, and display them in the form shown in FIG. 7. One cluster contains multiple case sentences, each corresponding to a different case. To display a list of cases when referring to them, it suffices to list the contents of the cases corresponding to the title attribute field shown in FIG. 8.
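The largest-first display order can be sketched as a recursive rendering of the tree. This minimal Python sketch assumes a hypothetical `Cluster` type and assumes that a cluster's size includes the case sentences of its sub-clusters; neither detail is specified in this section.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """Hypothetical problem-solving-tree node."""
    label: str
    sentences: list = field(default_factory=list)  # ids of case sentences in this cluster
    children: list = field(default_factory=list)   # sub-clusters

def size(cluster):
    # Assumption: a cluster's size counts its own sentences plus all sub-cluster sentences.
    return len(cluster.sentences) + sum(size(c) for c in cluster.children)

def render(cluster, indent=0):
    """Return display lines with sibling clusters ordered largest first."""
    lines = ["  " * indent + f"{cluster.label} ({size(cluster)})"]
    for child in sorted(cluster.children, key=size, reverse=True):
        lines.extend(render(child, indent + 1))
    return lines
```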

[0096] As shown in FIG. 25, the case database 17 stores inter-cluster information, which defines conflicting (mutually exclusive) relationships and similarity relationships between pairs of clusters identified by cluster number. Suppose, for example, that the cluster "main unit lamp not lit" in FIG. 27 is retrieved for the input sentence "no timer display". When the user judges that this cluster's state is correct and selects its confirmation check box, the inter-cluster information makes it possible to highlight the similar cluster "main unit area display not lit" and to de-emphasize the cluster "ON timer operating", which represents a state in which the ON timer is displayed. By storing the relationships between data in the case database 17 as part of the case data when cases are accumulated, searches can be made more efficient by taking those relationships into account.
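The confirm-then-highlight behaviour can be sketched with relations stored as triples. This minimal Python sketch assumes the relations are symmetric and recorded as (cluster_a, cluster_b, kind) tuples; the storage layout of FIG. 25 is not given here, so the format and cluster numbers are hypothetical.

```python
SIMILAR = "similar"    # assumed relation label
CONFLICT = "conflict"  # assumed relation label

def emphasis_after_confirm(confirmed, relations):
    """Return (clusters to highlight, clusters to de-emphasize) once `confirmed` is checked.

    `relations` is a list of (cluster_a, cluster_b, kind) triples, treated as symmetric.
    """
    highlight, dim = set(), set()
    for a, b, kind in relations:
        if confirmed not in (a, b):
            continue
        other = b if a == confirmed else a
        if kind == SIMILAR:
            highlight.add(other)
        elif kind == CONFLICT:
            dim.add(other)
    return highlight, dim
```

With hypothetical numbering, confirming cluster 3 ("main unit lamp not lit") would highlight its similar cluster and dim its conflicting one.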

[Effect of the Invention]

[0097] As described above, this invention provides keyword clustering means that extracts keywords from one or more case sentences, determines the cluster to which each case sentence belongs, and, based on the result, generates a problem solving tree characterized by sets of keywords; and detailed clustering means that subdivides the cluster to which each case sentence belongs based on the attributes of the linguistic expression of each case sentence belonging to each cluster of the problem solving tree generated by the keyword clustering means. Cases can therefore be classified efficiently to build the problem solving tree, and cases containing sentences that are syntactically and semantically similar to the search sentence can be retrieved. Furthermore, because the detailed clustering means subdivides the clusters after the keyword clustering means has determined the cluster of each case sentence, searches can be performed efficiently at search time.

[0098] According to this invention, keywords are extracted from one or more case sentences, the cluster to which each case sentence belongs is determined, and, once a problem solving tree characterized by sets of keywords has been generated based on that result, the cluster to which each case sentence belongs is subdivided based on the attributes of the linguistic expression of each case sentence belonging to each cluster of the problem solving tree. Cases can therefore be classified efficiently to build the problem solving tree, and cases containing sentences that are syntactically and semantically similar to the search sentence can be retrieved. Furthermore, because the clusters are subdivided after the cluster of each case sentence has been determined, searches can be performed efficiently at search time.

[0099] According to this invention, when keywords are extracted from case sentences, the case sentences are extracted from a digitized document and each is given a tag indicating its type, so matching can be restricted to case sentences of the same type that carry a specific tag.
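A minimal Python sketch of tag-based extraction and filtering, assuming a document is a mapping from field name to free text and that the field name serves as the sentence's type tag. The actual tag list of FIG. 9 is not reproduced here, so the tags "symptom" and "action" are hypothetical.

```python
import re

def extract_tagged_sentences(document):
    """Split each field into sentences and tag each sentence with its field name (assumed tag)."""
    tagged = []
    for field_name, text in document.items():
        # A sentence is a run of non-terminators plus an optional '.' or '。'.
        for piece in re.findall(r"[^.。]+[.。]?", text):
            sentence = piece.strip()
            if sentence:
                tagged.append((sentence, field_name))
    return tagged

def sentences_with_tag(tagged, tag):
    """Keep only sentences of one type, e.g. to restrict matching to symptom descriptions."""
    return [sentence for sentence, t in tagged if t == tag]
```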

[0100] According to this invention, the number of hierarchy levels of the clusters is specified when the problem solving tree is generated, so the balance between matching accuracy and processing speed can be adjusted as appropriate.
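The effect of a level limit can be illustrated with a recursive splitter. This Python sketch uses a hypothetical splitting rule (split on the most frequent keyword not yet used on the path) because the recursive procedure of FIGS. 14 and 15 is not described in this section; only the `max_levels` cut-off mirrors the text.

```python
from collections import Counter

def build_tree(sentences, max_levels, used=frozenset()):
    """Recursively split `sentences` (id -> set of keywords) for at most `max_levels` levels."""
    node = {"ids": sorted(sentences), "children": {}}
    if max_levels == 0 or len(sentences) < 2:
        return node
    counts = Counter(k for words in sentences.values() for k in words - used)
    if not counts:
        return node
    keyword, _ = counts.most_common(1)[0]  # hypothetical rule: most frequent unused keyword
    with_kw = {i: w for i, w in sentences.items() if keyword in w}
    without = {i: w for i, w in sentences.items() if keyword not in w}
    node["children"][keyword] = build_tree(with_kw, max_levels - 1, used | {keyword})
    if without:
        node["children"]["(other)"] = build_tree(without, max_levels - 1, used | {keyword})
    return node

def tree_depth(node):
    """Number of split levels actually produced."""
    if not node["children"]:
        return 0
    return 1 + max(tree_depth(c) for c in node["children"].values())
```

Fewer levels give coarser clusters that are cheaper to build and search; more levels give finer clusters.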

[0101] According to this invention, when the cluster to which each case sentence belongs is subdivided, the minimum similarity between the case sentences making up a cluster is specified, so the desired matching accuracy can be set within the range of acceptable processing time.
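One way to enforce a minimum intra-cluster similarity is a greedy pass that admits a sentence to a group only if it reaches the threshold with every existing member. This Python sketch is a swapped-in greedy technique for illustration, not necessarily the patent's method; `similarity` is any pairwise function, for example the tree similarity sketched earlier.

```python
def subdivide_by_min_similarity(ids, similarity, min_sim):
    """Greedily split `ids` so every pair inside a group has similarity >= min_sim."""
    groups = []
    for i in ids:
        for group in groups:
            if all(similarity(i, j) >= min_sim for j in group):
                group.append(i)
                break
        else:
            # No existing group accepts this sentence: start a new one.
            groups.append([i])
    return groups
```

Raising `min_sim` yields more, tighter groups (higher accuracy) at the cost of more comparisons.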

[0102] According to this invention, when keywords are extracted from case sentences, words whose frequency of appearance is higher than a set value are extracted as keywords, so important words can be selected as the keywords used for classification.
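A minimal Python sketch of frequency-threshold keyword extraction, assuming the case sentences have already been tokenized by morphological analysis and that frequency is counted across all case sentences (the section does not say which scope is used).

```python
from collections import Counter

def frequent_keywords(tokenized_sentences, min_count):
    """Return words appearing more than `min_count` times across the tokenized case sentences."""
    counts = Counter(word for sentence in tokenized_sentences for word in sentence)
    return {word for word, count in counts.items() if count > min_count}
```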

[0103] According to this invention, when keywords are extracted from case sentences, an ontology holding knowledge of words that depend on the target domain or task and of the relationships between those words is referred to, and words in the case sentences that are held in the ontology are extracted as keywords. Even words that appear infrequently but are highly important can thus be set as keywords, enabling fine-grained matching.
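Combining the frequency rule with an ontology lookup lets rare but domain-important words through. This is a minimal Python sketch assuming the ontology's vocabulary is available as a plain set of terms; the union of the two rules is an illustrative choice.

```python
from collections import Counter

def select_keywords(tokenized_sentences, min_count, ontology_terms):
    """Keywords = frequent words plus any word held in the (assumed) ontology vocabulary."""
    counts = Counter(word for sentence in tokenized_sentences for word in sentence)
    frequent = {word for word, count in counts.items() if count > min_count}
    domain = {word for word in counts if word in ontology_terms}
    return frequent | domain
```

For example, "timer" may occur only once yet still be selected because the ontology knows it.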

[0104] According to this invention, when the cluster to which each case sentence belongs is subdivided, an ontology holding knowledge of words that depend on the target domain or task and of the relationships between those words is referred to, and the knowledge it holds about those words and their relationships is used for detailed clustering, enabling fine-grained matching.

[0105] According to this invention, the ontology holds IS-A relation knowledge indicating semantic superordinate-subordinate relationships, so such relationships between linguistic expressions can be identified.

[0106] According to this invention, the ontology holds HAS-A relation knowledge indicating semantic part-whole relationships, so such relationships between linguistic expressions can be identified.

[0107] According to this invention, the ontology holds case relation knowledge, so case relations between sentences can be identified.

[0108] According to this invention, the ontology holds paraphrase knowledge, so sentences with the same meaning but different wording can be treated as the same sentence.

[0109] According to this invention, the ontology holds knowledge of conflicting (mutually exclusive) relationships, so conflicting relationships between sentences can be identified.
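The five kinds of ontology knowledge listed above (IS-A, HAS-A, case relations, paraphrase, conflict) can be sketched as one small Python class. The data layout is an assumption: IS-A as a child-to-parent map (assumed acyclic), HAS-A as whole-to-parts sets, case frames as verb-to-role-to-expected-class maps, paraphrases as variant-to-canonical strings, and conflicts as unordered pairs. The example entries are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    is_a: dict = field(default_factory=dict)         # word -> broader word (assumed acyclic)
    has_a: dict = field(default_factory=dict)        # whole -> set of parts
    case_frames: dict = field(default_factory=dict)  # verb -> {case role: expected word class}
    paraphrase: dict = field(default_factory=dict)   # variant expression -> canonical expression
    conflicts: set = field(default_factory=set)      # frozenset({a, b}): a and b cannot both hold

    def canonical(self, expr):
        """Map a paraphrase to its canonical form so equivalent wordings compare equal."""
        return self.paraphrase.get(expr, expr)

    def is_subclass(self, word, ancestor):
        """Follow IS-A links upward until `ancestor` is found or the chain ends."""
        while word is not None:
            if word == ancestor:
                return True
            word = self.is_a.get(word)
        return False

    def is_part_of(self, part, whole):
        return part in self.has_a.get(whole, set())

    def fits_case(self, verb, role, noun):
        """True if `noun` is an IS-A instance of the class the verb expects in `role`."""
        expected = self.case_frames.get(verb, {}).get(role)
        return expected is not None and self.is_subclass(noun, expected)

    def conflict(self, a, b):
        return frozenset({self.canonical(a), self.canonical(b)}) in self.conflicts
```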

[0110] According to this invention, when similar-sentence matching is performed between a case sentence belonging to a cluster and the search sentence, semantic structures are matched based on the attributes of syntactic elements, so differences such as modal expressions can be identified and matching accuracy improved.

[0111] According to this invention, when similar-sentence matching is performed between a case sentence belonging to a cluster and the search sentence, the degree of detail of the matching is specified, so the desired matching accuracy can be set.

[0112] According to this invention, the depth of the tree structure in the parse tree is specified as the degree of detail of the matching, so the desired matching accuracy can be set while giving weight to syntactically important elements.

[0113] According to this invention, the similarity relationships between clusters are described in a database, so those relationships can be made explicit.

[0114] According to this invention, the conflicting relationships between clusters are described in a database, so those relationships can be made explicit.

[0115] According to this invention, the storage medium describes a keyword clustering procedure that extracts keywords from one or more case sentences, determines the cluster to which each case sentence belongs, and, based on the result, generates a problem solving tree characterized by sets of keywords; and a detailed clustering procedure that subdivides the cluster to which each case sentence belongs based on the attributes of the linguistic expression of each case sentence belonging to each cluster of the problem solving tree generated by the keyword clustering procedure. Cases can therefore be classified efficiently to build the problem solving tree, and cases containing sentences that are syntactically and semantically similar to the search sentence can be retrieved. Furthermore, because the detailed clustering procedure subdivides the clusters after the keyword clustering procedure has determined the cluster of each case sentence, searches can be performed efficiently at search time.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing a language case inference apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing a language case inference method (case construction process) according to Embodiment 1 of the present invention.

FIG. 3 is a flowchart showing the specific contents of the sentence extraction process (step ST1).

FIG. 4 is a flowchart showing the specific contents of the sentence extraction process (step ST14) in FIG. 3.

FIG. 5 is a flowchart showing the specific contents of the free-form field process (step ST26).

FIG. 6 is an explanatory diagram showing an example of the digitized document 11.

FIG. 7 is an explanatory diagram showing a "case" generated from the digitized document 11 of FIG. 6 by the sentence extraction step ST1.

FIG. 8 is an explanatory diagram showing document field information and the like.

FIG. 9 is an explanatory diagram showing a sentence tag list.

FIG. 10 is a flowchart showing the specific contents of the sentence analysis process (step ST2).

FIG. 11 is a flowchart showing the specific contents of the morphological analysis process (step ST41).

FIG. 12 is a flowchart showing the specific contents of the syntactic analysis process (step ST42).

FIG. 13 is a flowchart showing the specific contents of the similar case classification process (step ST3).

FIG. 14 is a flowchart showing the specific contents of the keyword clustering process (step ST71).

FIG. 15 is a flowchart showing the specific contents of the recursive function call process in FIG. 14.

FIG. 16 is an explanatory diagram showing an example of a hierarchical problem solving tree generated by the keyword clustering process (step ST71).

FIG. 17 is an explanatory diagram showing an example of case sentence data stored in the case database 17.

FIG. 18 is a flowchart showing the specific contents of the detailed clustering process (step ST72).

FIG. 19 is an explanatory diagram showing an example of a clause (bunsetsu) structure.

FIG. 20 is an explanatory diagram showing a problem solving tree created through classification by the detailed clustering process (step ST72).

FIG. 21 is a flowchart showing a language case inference method (case search process) according to Embodiment 1 of the present invention.

FIG. 22 is a flowchart showing the specific contents of the similar case search process (step ST113).

FIG. 23 is an explanatory diagram showing the knowledge held by the ontology 18.

FIG. 24 is an explanatory diagram showing the structures of case sentences and the like.

FIG. 25 is an explanatory diagram showing an example of inter-cluster information, other than hierarchical relationships, stored in the case database 17 in step ST93 of the similar case classification process (step ST3).

FIG. 26 is an explanatory diagram showing an example of a penalty calculation rule used in the similarity calculation.

FIG. 27 is an explanatory diagram showing an example of a problem solving tree created by similar case classification.

FIG. 28 is a configuration diagram showing a conventional language case inference apparatus.

[Explanation of symbols]

11 digitized document, 12 sentence extraction unit, 13 sentence analysis unit, 14 similar case classification unit, 15 keyword clustering unit (keyword clustering means), 16 detailed clustering unit (detailed clustering means), 17 case database, 18 ontology, 19 case cluster editing unit, 20 search sentence, 21 search sentence input unit, 22 similar case search unit (similar case search means), 23 search result display unit, 24 word dictionary for analysis, 25 function word dictionary, 26 function word connection table, 27 grammar rules, 31 IS-A knowledge, 32 HAS-A knowledge, 33 case relation knowledge, 34 paraphrase knowledge, 35 conflicting information.

Continuation of front page: (51) Int.Cl.7: G06F 15/403 330C, 350C. (72) Inventor: Katsushi Suzuki, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-terms (reference): 5B075 ND03 NK32 NR12 PP24 PR04 PR06 QM08

Claims

[Claims]

1. A language case inference apparatus comprising: keyword clustering means for extracting keywords from one or more case sentences, determining the cluster to which each case sentence belongs, and generating, based on the result of that determination, a problem solving tree characterized by sets of keywords; detailed clustering means for subdividing the cluster to which each case sentence belongs based on the attributes of the linguistic expression of each case sentence belonging to each cluster generated by the keyword clustering means; and similar case search means for extracting keywords from a search sentence, searching the problem solving tree for a cluster having those keywords as elements, and performing similar-sentence matching between the case sentences belonging to that cluster and the search sentence.

2. A language case inference method comprising: extracting keywords from one or more case sentences, determining the cluster to which each case sentence belongs, and generating, based on the result of that determination, a problem solving tree characterized by sets of keywords; subdividing the cluster to which each case sentence belongs based on the attributes of the linguistic expression of each case sentence belonging to each cluster of the problem solving tree; and extracting keywords from a search sentence, searching the problem solving tree for a cluster having those keywords as elements, and performing similar-sentence matching between the case sentences belonging to that cluster and the search sentence.

3. The language case inference method according to claim 2, wherein, when extracting a keyword from the case sentence, the case sentence is extracted from the digitized document and a tag indicating a type of the case sentence is added.

4. The language case inference method according to claim 2, wherein when generating the problem solving tree, the number of hierarchical levels of the cluster is specified.

5. The language case inference method according to claim 2, wherein at the time of subdividing the cluster to which each case sentence belongs, a minimum similarity between the case sentences constituting the cluster is specified.

6. The language case inference method according to claim 2, wherein, when extracting a keyword from the case sentence, a word having an appearance frequency higher than a set value among words in the case sentence is extracted as a keyword.

7. The language case inference method according to claim 2, wherein, when keywords are extracted from the case sentences, an ontology holding knowledge of words that depend on a target domain or task and of the relationships between those words is referred to, and words in the case sentences that are held in the ontology are extracted as keywords.

8. The language case inference method according to claim 2, wherein, when the cluster to which each case sentence belongs is subdivided, an ontology holding knowledge of words that depend on a target domain or task and of the relationships between those words is referred to, and the knowledge held in the ontology about those words and their relationships is used for the detailed clustering.

9. The language case inference method according to claim 7, wherein the ontology has IS-A relation knowledge indicating a semantic upper-lower relation.

10. The language case inference method according to claim 7, wherein the ontology has HAS-A relation knowledge indicating a semantic part-whole relation.

11. The language case inference method according to claim 7, wherein the ontology holds case relation knowledge.

12. The language case inference method according to claim 7, wherein the ontology has paraphrase knowledge.

13. The language case inference method according to claim 7, wherein the ontology holds knowledge of conflicting relationships.

14. The language case inference method according to claim 2, wherein, when similar-sentence matching is performed between the case sentences belonging to the cluster and the search sentence, semantic structures are matched based on the attributes of syntactic elements.

15. The language case inference method according to claim 2, wherein when performing similar sentence matching between the case sentence belonging to the cluster and the search sentence, the degree of detail of the matching is specified.

16. The language case inference method according to claim 15, wherein the depth of the tree structure in the parse tree is specified as the degree of detail of the matching.

17. The language case inference method according to claim 2 or any one of the above, wherein the similarity relationships between the clusters are described in a database.

18. The language case inference method according to claim 2 or any one of the above, wherein the conflicting relationships between the clusters are described in a database.

19. A storage medium on which is described a language case inference program comprising: a keyword clustering procedure for extracting keywords from one or more case sentences, determining the cluster to which each case sentence belongs, and generating, based on the result of that determination, a problem solving tree characterized by sets of keywords; a detailed clustering procedure for subdividing the cluster to which each case sentence belongs based on the attributes of the linguistic expression of each case sentence belonging to each cluster generated by the keyword clustering procedure; and a similar case search procedure for extracting keywords from a search sentence, searching the problem solving tree for a cluster having those keywords as elements, and performing similar-sentence matching between the case sentences belonging to that cluster and the search sentence.