JP4831737B2 - Keyword emphasis device and program

Info

Publication number: JP4831737B2
Application number: JP2006028325A
Authority: JP
Inventors: 真樹村田
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2006-02-06
Filing date: 2006-02-06
Publication date: 2011-12-07
Anticipated expiration: 2026-02-06
Also published as: JP2007207161A

Description

The present invention relates to a keyword emphasis device and program. Among the words contained in a region specified by the user, it takes a keyword of the form "interrogative" + "noun that can combine with a number" (時 hour, 月 month, 年 year, 歳 years of age, 枚 sheets, etc.) and highlights the parts of the body text that have the form "number" + "noun that can combine with a number", so that the part of the answer that responds to the interrogative is easy to find.

A conventional system for highlighting search results in response to keyword input highlights, in the body text, the words that appear in the title (see Patent Document 1).
Patent Document 1: JP 2004-280176 A

When the title is a question and the body text is its answer, the conventional highlighting system described above cannot highlight the part of the answer that corresponds to the interrogative in the question.

The present invention aims to solve this problem by making it easy to find, in a displayed answer document, the part that corresponds to the interrogative, which is what the user really wants to know.

FIG. 1 is an explanatory diagram of the keyword emphasis device of the present invention. In FIG. 1, 1 is a display device (display means), 2 is an input device (input means), 3 is an extraction device (extraction means), 4 is an interrogative-following-word extraction device (interrogative-following-word extraction means), and 5 is a main word extraction device (main word extraction means).

To solve the above problem, the present invention has the following means.

(1): The device comprises input means 2 for inputting a set consisting of a question and its answer article; interrogative-following-word extraction means 4 for extracting, from the question sentence, a noun or suffix that immediately follows an interrogative; and display means 1 for highlighting, in the answer article, the noun or suffix that followed the interrogative. This makes it easy to find, in the displayed answer document, the part that corresponds to the interrogative the user really wants answered.

(2): The device comprises input means 2 for inputting a set consisting of a question and its answer article; interrogative-following-word extraction means 4 for extracting, from the question sentence, a predetermined noun or predetermined suffix that follows an interrogative and can combine with a number; and display means 1 for highlighting, in the answer article, at least one of a number and the extracted predetermined noun or suffix. This makes it easy to find, in the displayed answer document, the answer (a number) that corresponds to the interrogative.

(3): The device comprises input means 2 for inputting a set consisting of a question and its answer article; extraction means 3 for confirming that the question sentence contains an interrogative that asks for a predetermined quantity expression; and display means 1 for highlighting numbers in the answer article. This makes it easy to find, in the displayed answer document, the answer (a number) that corresponds to the interrogative.

(4): The device comprises input means 2 for inputting a set consisting of a question and its answer article; extraction means 3 for identifying, from the question sentence, the type of a pre-designated interrogative; and display means 1 for extracting the named entities in the answer article that correspond to that interrogative type and highlighting them. This makes it easy to find, in the displayed answer document, the named entity that corresponds to the interrogative.

(5): The device comprises input means 2 for inputting a set consisting of a question and its answer article; extraction means 3 for identifying, in the question sentence, a pre-designated interrogative that asks for a reason; and display means 1 for highlighting, in the answer article, predetermined words that indicate a reason. This makes it easy to find, in the displayed answer document, the part that gives the reason asked for.

(6): The keyword emphasis device of any of (1) to (5) further comprises main word extraction means 5 for extracting main words from the question sentence, and the display means 1 highlights the extracted main words in the answer article. The part (answer) that corresponds to the interrogative can then be found easily in the text around the highlighted main words.

(7): In the keyword emphasis device of (6), the display means 1 uses a different highlighting style for main words than for the other highlighted items. This makes the part that corresponds to the interrogative even easier to find in the displayed answer document.

(8): A program causes a computer to function as input means 2 for inputting a set consisting of a question and its answer article; as interrogative-following-word extraction means 4 for extracting, from the question sentence, a noun or suffix that immediately follows an interrogative; and as display means 1 for highlighting, in the answer article, the noun or suffix that followed the interrogative. Installing this program on a computer readily provides a keyword emphasis device with which the part corresponding to the interrogative can be found easily in the displayed answer document.

The present invention has the following effects.
(1): Because the display means highlights, in the answer article, the noun or suffix that followed the interrogative, the part of the displayed answer document that corresponds to the interrogative can be found easily.

(2): Because the display means highlights, in the answer article, at least one of a number and the extracted predetermined noun or suffix, the answer (a number) that corresponds to the interrogative can be found easily in the displayed answer document.

(3): Because the display means highlights numbers in the answer article, the answer (a number) that corresponds to the interrogative can be found easily in the displayed answer document.

(4): Because the display means extracts the named entities that correspond to the interrogative type and highlights them in the answer article, the named entity that corresponds to the interrogative can be found easily in the displayed answer document.

(5): Because the display means highlights, in the answer article, predetermined words that indicate a reason, the part that gives the reason asked for can be found easily in the displayed answer document.

(6): Because the display means highlights the extracted main words in the answer article, the part of the answer document that corresponds to the interrogative can be found easily.

(7): Because the display means uses a different highlighting style for main words than for the other highlighted items, the part that corresponds to the interrogative can be found even more easily in the displayed answer document.

Among the words contained in a region specified by the user, the keyword emphasis device of the present invention takes a keyword of the form "interrogative" + "noun that can combine with a number" (時 hour, 月 month, 年 year, 歳 years of age, 枚 sheets, etc.) and highlights the parts of the body text that have the form "number" + "noun that can combine with a number", so that the part of the answer that responds to the interrogative is easy to find.

Question-and-answer articles, such as questions and answers on Web sites and FAQs (frequently asked questions and their answers), are increasingly written by hand, stored, and presented to users. Applying the highlighting of the present invention to such articles makes the answer to a question easy to find.

(1): Description of the keyword emphasis device
FIG. 1 is an explanatory diagram of the keyword emphasis device. As shown in FIG. 1, the keyword emphasis device (system) comprises a display device 1, an input device 2, and an extraction device 3. The extraction device 3 contains an interrogative-following-word extraction device 4 and a main word extraction device 5.

The display device 1 is display means with a display screen, such as a CRT or liquid crystal panel, for displaying information. The input device 2 is input means for entering information. The extraction device 3 is processing means that performs word extraction and related processing. The interrogative-following-word extraction device 4 extracts the noun or suffix that comes immediately after an interrogative. The main word extraction device 5 extracts nouns, verbs, and the like, excluding pre-designated words that carry little meaning (such as もの "thing" and こと "matter").

(2): Highlighting the word that follows an interrogative (1)
FIG. 2 is a flowchart for highlighting the word that follows an interrogative. The description below follows steps S1 to S4 of FIG. 2.

S1: The input device 2 supplies a set consisting of a question and its answer article. Processing moves to S2.
S2: The interrogative-following-word extraction device 4 extracts interrogative + "noun or suffix" from the question sentence. Processing moves to S3.

S3: The main word extraction device 5 extracts the main words from the question sentence. Processing moves to S4.
Main words here are nouns, verbs, and the like, excluding pre-designated words (for example, words with little meaning such as もの and こと).

S4: The display device 1 highlights, in the answer article, the extracted main words and the "noun or suffix" that followed the interrogative (for example, by always giving them a yellow background).

Example: for a question containing 何大学 ("which university"), 大学 ("university") is highlighted in yellow in the answer text. Looking at the highlighted parts makes the answer to the question easy to find.

Highlighting here can be done by changing the text color, changing the background color or shading the text, changing the typeface (bold, italic, etc.), underlining, enclosing in brackets, placing a mark above the characters, and so on.
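The main word extraction of step S3 and the marking of step S4 can be sketched as below. This is a minimal illustration, not the patented implementation: the `Token` pair format, the coarse tags `"noun"` and `"verb"`, and the helper names `main_words` and `mark` are all assumptions, and the tokens are written by hand where a morphological analyzer would normally supply them. Square brackets stand in for one highlighting style.

```python
from typing import Iterable, List, Tuple

# (surface form, coarse part of speech); the tag names are assumed for illustration
Token = Tuple[str, str]

# Pre-designated low-content words to exclude (the two examples given in the text)
STOP_WORDS = frozenset({"もの", "こと"})


def main_words(tokens: Iterable[Token]) -> List[str]:
    """Keep nouns and verbs, dropping the pre-designated stop words."""
    return [s for s, pos in tokens if pos in ("noun", "verb") and s not in STOP_WORDS]


def mark(text: str, words: Iterable[str], left: str, right: str) -> str:
    """Wrap every occurrence of each word in the given marker pair."""
    for word in words:
        text = text.replace(word, f"{left}{word}{right}")
    return text


# Hand-tokenized question (hypothetical analyzer output)
question_tokens = [
    ("コンピュータ", "noun"), ("を", "particle"), ("使う", "verb"),
    ("こと", "noun"), ("は", "particle"), ("便利", "noun"),
]
answer = "コンピュータは便利な機械です"
marked = mark(answer, main_words(question_tokens), "[", "]")
print(marked)
```

Passing a different marker pair, for example "<" and ">", for the word that carries the answer keeps the two kinds of highlighting visually distinct.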

(Explanation using a concrete FAQ example)
(Question) Which university (何大学) in Tokyo has a high deviation score?
(Answer) The deviation score varies by faculty, but in general the University of Tokyo (東京大学) seems to score high in every faculty.
The keyword emphasis device highlights as follows (highlighting is shown here with "<" and ">").
(Question) Which <university> (何＜大学＞) in Tokyo has a high deviation score?
(Answer) The deviation score varies by faculty, but in general the <University> of Tokyo (東京＜大学＞) seems to score high in every faculty.
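A minimal sketch of steps S2 and S4 for this example follows. It assumes the question has already been tokenized into (surface, part-of-speech) pairs; the tag names, the `INTERROGATIVES` set, and both function names are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# (surface form, coarse part of speech); the tag names are assumed for illustration
Token = Tuple[str, str]

# Hypothetical list of interrogatives that can be followed directly by a noun
INTERROGATIVES = {"何", "どの", "どんな"}


def noun_after_interrogative(tokens: List[Token]) -> Optional[str]:
    """Return the noun or suffix that immediately follows an interrogative."""
    for (surface, _), (next_surface, next_pos) in zip(tokens, tokens[1:]):
        if surface in INTERROGATIVES and next_pos in ("noun", "suffix"):
            return next_surface
    return None


def highlight(text: str, word: str) -> str:
    """Mark every occurrence of the word with angle brackets."""
    return text.replace(word, f"<{word}>")


# Hand-tokenized question: 東京で偏差値の高いのは何大学ですか
question = [
    ("東京", "noun"), ("で", "particle"), ("偏差値", "noun"), ("の", "particle"),
    ("高い", "adjective"), ("の", "particle"), ("は", "particle"),
    ("何", "interrogative"), ("大学", "noun"), ("です", "auxiliary"), ("か", "particle"),
]
answer = "一般的に東京大学の偏差値が各学部とも高いようです"

target = noun_after_interrogative(question)
if target is not None:
    print(highlight(answer, target))
```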

(3): Word segmentation and part-of-speech identification
Interrogatives, nouns, suffixes, and verbs can be extracted by morphological analysis.

(Description of the morphological analysis system)
ChaSen (for Japanese) is described here. ChaSen (茶筌) is a morphological analysis system developed at the Nara Institute of Science and Technology and published at http://chasen.aist-nara.ac.jp/index.html.ja.
It segments a Japanese sentence into words and also estimates the part of speech of each word.

For example, entering 学校へ行く ("go to school") gives the following result.
学校 ガッコウ 学校 名詞-一般
へ ヘ へ 助詞-格助詞-一般
行く イク 行く 動詞-自立 五段・カ行促音便 基本形
EOS
The sentence is thus split so that each line holds one word, and each word is annotated with its reading and part of speech.
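Output of this shape can be read into records with a few lines of code. The sketch below assumes tab-separated columns in the order surface, reading, base form, part of speech, and an `EOS` line ending each sentence; the column separator is not recoverable from the listing above, so treat that layout, the `Morpheme` type, and the `parse_chasen` name as assumptions.

```python
from typing import List, NamedTuple


class Morpheme(NamedTuple):
    surface: str
    reading: str
    base: str
    pos: str


def parse_chasen(output: str) -> List[Morpheme]:
    """Parse analyzer output, one word per line, into Morpheme records."""
    morphemes = []
    for line in output.splitlines():
        if not line or line == "EOS":  # skip blank lines and the sentence terminator
            continue
        fields = line.split("\t")
        if len(fields) < 4:  # ignore lines that do not fit the assumed layout
            continue
        morphemes.append(Morpheme(*fields[:4]))
    return morphemes


# Assumed tab-separated rendering of the example result for 学校へ行く
SAMPLE = (
    "学校\tガッコウ\t学校\t名詞-一般\n"
    "へ\tヘ\tへ\t助詞-格助詞-一般\n"
    "行く\tイク\t行く\t動詞-自立\t五段・カ行促音便\t基本形\n"
    "EOS\n"
)

for m in parse_chasen(SAMPLE):
    print(m.surface, m.pos)
```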

English part-of-speech tagging
A well-known part-of-speech tagging system for English is the one by Brill:
Eric Brill,
Transformation-Based Error-Driven Learning and
Natural Language Processing: A Case Study in Part-of-Speech Tagging,
Computational Linguistics, Vol. 21, No. 4, p.543-565, 1995.
It estimates the part of speech of each word in an English sentence.

(4): Highlighting using the word that follows an interrogative (2)
FIG. 3 is a flowchart for highlighting using the word that follows an interrogative. The description below follows steps S11 to S14 of FIG. 3.

S11: The input device 2 supplies a set consisting of a question and its answer article. Processing moves to S12.

S12: The interrogative-following-word extraction device 4 extracts interrogative + "predetermined noun or suffix that can combine with a number" from the question sentence. Processing moves to S13.

S13: The main word extraction device 5 extracts the main words from the question sentence. Processing moves to S14.
Main words here are nouns, verbs, and the like, excluding pre-designated words (for example, words with little meaning such as もの and こと).

S14: The display device 1 highlights, in the answer article, the extracted main words and each number + "extracted noun or suffix". The number + "extracted noun or suffix" gets its own dedicated highlighting (for example, a color different from that of the main words, such as an always-yellow background).
Example: for a question containing 何個 ("how many items"), ３個 ("3 items") is highlighted in yellow in the answer text. Looking at the highlighted parts makes the answer to the question easy to find.

(Explanation using a concrete FAQ example)
(Question) About how many hours (何時間) of sleep is best?
(Answer) Opinions differ, but the common view is that 7 to 8 hours (７時間, ８時間) is best. When you sleep also seems important: sleeping for long hours (長時間) during the day is probably less effective than sleeping at night.
The keyword emphasis device highlights as follows (highlighting is shown here with "<" and ">").

(Question) About how many <hours> (何＜時間＞) of sleep is best?
(Answer) Opinions differ, but the common view is that <7 hours> to <8 hours> (＜７時間＞, ＜８時間＞) is best. When you sleep also seems important: sleeping for long hours during the day is probably less effective than sleeping at night.
Highlighting can also be done as follows (again shown with "<" and ">").
(Question) About how many <hours> (何＜時間＞) of sleep is best?
(Answer) Opinions differ, but the common view is that 7 <hours> to 8 <hours> (７＜時間＞, ８＜時間＞) is best. When you sleep also seems important: sleeping for long <hours> (長＜時間＞) during the day is probably less effective than sleeping at night.
In this way the eye is drawn immediately to the expressions "7 hours" and "8 hours", which is convenient.
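The first highlighting variant, number + counter, can be approximated with a regular expression. The sketch below is an assumption-laden illustration: it finds the counter by plain substring search for 何 + counter instead of morphological analysis, the `COUNTERS` list is a hypothetical sample, and only digit characters are matched (Python's `\d` covers full-width digits such as ７, but not kanji numerals such as 七).

```python
import re
from typing import Optional

# Hypothetical sample of nouns and suffixes that can combine with a number
COUNTERS = ["時間", "時", "月", "年", "歳", "枚", "個"]


def find_counter(question: str) -> Optional[str]:
    """Return the counter that follows 何 in the question, longest match first."""
    for counter in sorted(COUNTERS, key=len, reverse=True):
        if "何" + counter in question:
            return counter
    return None


def highlight_number_with_counter(answer: str, counter: str) -> str:
    """Wrap every digit run immediately followed by the counter in angle brackets."""
    pattern = re.compile(r"\d+" + re.escape(counter))
    return pattern.sub(lambda m: f"<{m.group(0)}>", answer)


question = "睡眠時間は何時間くらいがいいですか．"
answer = "７時間から８時間がよいという説が一般的です．昼間長時間寝ても効果は低いと思います．"

counter = find_counter(question)
if counter is not None:
    print(highlight_number_with_counter(answer, counter))
```

Because the pattern requires a digit before the counter, 長時間 ("long hours") is left unmarked, which matches the first variant above.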

(5): Highlighting using an interrogative that asks for a quantity
FIG. 4 is a flowchart for highlighting using an interrogative that asks for a quantity expression. The description below follows steps S21 to S24 of FIG. 4.

S21: The input device 2 supplies a set consisting of a question and its answer article. Processing moves to S22.

S22: The extraction device 3 confirms that the question sentence contains an interrogative that asks for a predetermined quantity expression. Processing moves to S23.

S23: The main word extraction device 5 extracts the main words from the question sentence. Processing moves to S24.
Main words here are nouns, verbs, and the like, excluding pre-designated words (for example, words with little meaning such as もの and こと).

S24: The display device 1 highlights, in the answer article, the extracted main words and the numbers. Numbers get their own dedicated highlighting (for example, an always-yellow background, as opposed to, say, red for the main words).
Example: for a question containing いくつ ("how many"), the "３" in ３個 ("3 items") is highlighted in yellow in the answer text. Looking at the highlighted parts makes the answer to the question easy to find.

The interrogatives whose answer is a number are stored in advance in storage means (not shown) of the keyword emphasis device. Examples of such interrogatives are いかほど ("how much") and どのくらい ("about how much").

(Explanation using a concrete FAQ example)
(Question) About how much (どのくらい) sleep is best?
(Answer) Opinions differ, but the common view is that 7 to 8 hours (７時間, ８時間) is best. When you sleep also seems important: sleeping for long hours during the day is probably less effective than sleeping at night.
The keyword emphasis device highlights as follows (highlighting is shown here with "<" and ">").
(Question) About how much (どのくらい) sleep is best?
(Answer) Opinions differ, but the common view is that <7> to <8> hours (＜７＞時間, ＜８＞時間) is best. When you sleep also seems important: sleeping for long hours during the day is probably less effective than sleeping at night.
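Steps S22 and S24 reduce to a stored list of quantity interrogatives plus a digit pattern. The sketch below assumes plain substring lookup instead of morphological analysis; the tuple holds only the three interrogatives named in the text, and the function name is hypothetical.

```python
import re

# Interrogatives whose answer is a number (the three named in the text);
# a real device would hold a longer, pre-stored list
QUANTITY_INTERROGATIVES = ("いくつ", "いかほど", "どのくらい")


def highlight_numbers_if_quantity_question(question: str, answer: str) -> str:
    """Mark digit runs in the answer only when the question asks for a quantity."""
    if not any(word in question for word in QUANTITY_INTERROGATIVES):
        return answer
    return re.sub(r"\d+", lambda m: f"<{m.group(0)}>", answer)


question = "睡眠時間はどのくらいがいいですか．"
answer = "７時間から８時間がよいという説が一般的です．"
print(highlight_numbers_if_quantity_question(question, answer))
```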

(6): Highlighting using the meaning of the interrogative (1)
FIG. 5 is a flowchart for highlighting using the meaning of the interrogative. The description below follows steps S31 to S34 of FIG. 5.

S31: The input device 2 supplies a set consisting of a question and its answer article. Processing moves to S32.

S32: The extraction device 3 identifies the type of the interrogative in the question sentence: whether it asks for a person name, a place name, a time, and so on. Processing moves to S33. Predetermined rules that map each interrogative to its type are prepared in advance.

S33: The main word extraction device 5 extracts the main words from the question sentence. Processing moves to S34.
Main words here are nouns, verbs, and the like, excluding pre-designated words (for example, words with little meaning such as もの and こと).

S34: The display device 1 highlights the extracted main words in the answer article and, with dedicated highlighting (for example, an always-yellow background), also highlights:
person names, if the interrogative asks for a person (e.g., 誰 "who");
place names, if the interrogative asks for a place (e.g., どこ "where");
time expressions (including seasons such as spring and summer), if the interrogative asks for a time (e.g., いつ "when").

Whether each word denotes a person name, place name, or time is determined with named entity extraction.

(Explanation using a concrete FAQ example)
(Question) Who (誰) caused the biggest stir this year?
(Answer) A lot happened this year, but the person who stood out on many fronts, including the general election and the takeover drama, would be Mr. Horie (堀江氏). I look forward to seeing who emerges next year.
The keyword emphasis device highlights as follows (highlighting is shown here with "<" and ">").
(Question) <Who> (＜誰＞) caused the biggest stir this year?
(Answer) A lot happened this year, but the person who stood out on many fronts, including the general election and the takeover drama, would be <Mr. Horie> (＜堀江氏＞). I look forward to seeing who emerges next year.
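The rule table of step S32 and the type-filtered highlighting of step S34 might look like the sketch below. The type labels (`PERSON`, `LOCATION`, `TIME`), the function names, and the hand-written entity list are assumptions; in practice the entities would come from a separate named entity extractor, and the substring test on the question is a crude stand-in for morphological analysis.

```python
from typing import Dict, List, Optional, Tuple

# Pre-prepared rules: interrogative -> type of named entity it asks for
# (the three examples given in the text; the type labels are assumed)
INTERROGATIVE_TO_ENTITY: Dict[str, str] = {
    "誰": "PERSON",
    "どこ": "LOCATION",
    "いつ": "TIME",
}


def expected_entity_type(question: str) -> Optional[str]:
    """Return the entity type asked for by the first matching interrogative."""
    for word, entity_type in INTERROGATIVE_TO_ENTITY.items():
        if word in question:
            return entity_type
    return None


def highlight_entities(answer: str, entities: List[Tuple[str, str]], wanted: str) -> str:
    """Mark each distinct entity of the wanted type with angle brackets."""
    surfaces = dict.fromkeys(s for s, t in entities if t == wanted)  # de-duplicate
    for surface in surfaces:
        answer = answer.replace(surface, f"<{surface}>")
    return answer


question = "今年もっとも世間を騒がせた人物は誰でしょうか．"
answer = "今年もいろいろとありましたが，多方面に目立った人は，堀江氏でしょう．"
# Hypothetical output of a named entity extractor run on the answer
entities = [("今年", "TIME"), ("堀江氏", "PERSON")]

wanted = expected_entity_type(question)
if wanted is not None:
    print(highlight_entities(answer, entities, wanted))
```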

(7): Highlighting using the meaning of the interrogative (2)
FIG. 6 is a flowchart for highlighting using the meaning of the interrogative. The description below follows steps S41 to S44 of FIG. 6.

S41: The input device 2 supplies a set consisting of a question and its answer article. Processing moves to S42.

S42: The extraction device 3 identifies the type of the interrogative in the question sentence. Processing moves to S43. Here the interrogative is assumed to ask for a reason (for example, なぜ or どうして, both meaning "why"). Predetermined rules that map each interrogative to its type are prepared in advance.

S43: The main word extraction device 5 extracts the main words from the question sentence. Processing moves to S44.
Main words here are nouns, verbs, and the like, excluding pre-designated words (for example, words with little meaning such as もの and こと).

S44: The display device 1 highlights the extracted main words in the answer article and, with dedicated highlighting (for example, an always-yellow background), also highlights predetermined words that indicate a reason, such as ので ("because"), ため ("due to"), から ("since"), だから ("therefore"), 理由 ("reason"), 原因 ("cause"), and このため ("for this reason").

(Explanation using a concrete FAQ example)
(Question) Why (なぜ) are computers convenient?
(Answer) A computer, also called a calculator, is a convenient machine that performs all kinds of calculations in place of a human. A computer generally consists of an arithmetic unit and a storage unit. Given a program, the computer executes it with the arithmetic unit and the storage unit and performs various calculations. If the program is changed, the computer can carry out different processing accordingly. For this reason (このため), computers can do many kinds of processing and are convenient.

The keyword emphasis device highlights as follows (highlighting is shown here with "<" and ">").
(Question) Why (なぜ) are computers convenient?
(Answer) A computer, also called a calculator, is a convenient machine that performs all kinds of calculations in place of a human. A computer generally consists of an arithmetic unit and a storage unit. Given a program, the computer executes it with the arithmetic unit and the storage unit and performs various calculations. If the program is changed, the computer can carry out different processing accordingly. <For this reason> (＜このため＞), computers can do many kinds of processing and are convenient.
It is thus immediately clear that the reason is written just before the highlighted "for this reason", which is convenient.
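A minimal sketch of steps S42 and S44 follows. It uses the cue words and reason interrogatives listed in the text but matches them as raw substrings, which is an assumption made to keep the example self-contained; the function name is hypothetical.

```python
import re

# Interrogatives that ask for a reason, and cue words that signal one (from the text)
REASON_INTERROGATIVES = ("なぜ", "どうして")
REASON_CUES = ["ので", "ため", "から", "だから", "理由", "原因", "このため"]

# Longest cue first, so that このため is matched as a whole and not just its ため part
_CUE_PATTERN = re.compile(
    "|".join(re.escape(cue) for cue in sorted(REASON_CUES, key=len, reverse=True))
)


def highlight_reason_cues(question: str, answer: str) -> str:
    """Mark reason cue words in the answer when the question asks for a reason."""
    if not any(word in question for word in REASON_INTERROGATIVES):
        return answer
    return _CUE_PATTERN.sub(lambda m: f"<{m.group(0)}>", answer)


question = "なぜコンピュータは便利なのでしょうか．"
answer = "与えるプログラムを変えると異なる処理を実行できます．このため，コンピュータは便利です．"
print(highlight_reason_cues(question, answer))
```

Substring matching can misfire: なのです, for instance, contains the character sequence ので. Matching cue words against the tokens produced by morphological analysis would avoid such false positives.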

A user of the keyword emphasis device can also turn the above highlighting off.

Again, whether each word denotes a person name, place name, or time is determined with named entity extraction.

(8): Named entity extraction
A named entity is a linguistic expression that denotes a specific thing or quantity, such as a proper noun (person name, place name, organization name) or a numerical expression (an amount of money). Named entity extraction is a technique by which a computer automatically extracts such expressions from text. For example, applying named entity extraction to the sentence 日本の首相は小泉純一郎である ("The Prime Minister of Japan is Junichiro Koizumi") extracts 日本 ("Japan") as a place name and 小泉純一郎 ("Junichiro Koizumi") as a person name.

ａ、形態素解析を用いる場合の説明
固有表現を抽出するには、前に説明した形態素解析システム ChaSen を用いることができる。例えば、「日本の首都は東京です」を形態素解析システム ChaSen に入力すると、出力として、次のものが得られる。 a. Using morphological analysis. To extract named entities, the morphological analysis system ChaSen described earlier can be used. For example, when “日本の首都は東京です” (“The capital of Japan is Tokyo”) is input to ChaSen, the following output is obtained.

出力 (Output)
日本 ニッポン 日本 名詞−固有名詞−地域−国
の ノ の 助詞−連体化
首都 シュト 首都 名詞−一般
は ハ は 助詞−係助詞
東京 トウキョウ 東京 名詞−固有名詞−地域−一般
です デス です 助動詞 特殊・デス 基本形
EOS
これだと名詞−固有名詞−地域という品詞が出力されるので、このシステムを使って地名の固有表現を取り出すことができる。 Because the part of speech 名詞−固有名詞−地域 (noun, proper noun, region) is output for “日本” and “東京”, this system can be used to extract place-name named entities.

また、例えば、前記システムに「村山首相が言った」を入力すると、出力として、次のものが得られる。 Likewise, when “村山首相が言った” (“Prime Minister Murayama said”) is input to the system, the following output is obtained.

出力 (Output)
村山 ムラヤマ 村山 名詞−固有名詞−人名−姓
首相 シュショウ 首相 名詞−一般
が ガ が 助詞−格助詞−一般
言っ イッ 言う 動詞−自立 五段・ワ行促音便 連用タ接続
た タ た 助動詞 特殊・タ 基本形
EOS
これだと名詞−固有名詞−人名という品詞が出力される。このシステムを使って人名の固有表現を取り出すことができる。 Here the part of speech 名詞−固有名詞−人名 (noun, proper noun, person name) is output for “村山”, so this system can be used to extract person-name named entities.
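As a rough illustration of how such analyzer output could be turned into named entities, the following Python sketch reads ChaSen-style lines and keeps the tokens whose part of speech begins with the proper-noun categories shown above. It assumes whitespace-separated fields with the part of speech in the fourth field; the mapping `POS_TO_ENTITY` and the function name `extract_entities` are hypothetical names chosen for this example, and the sample text is typed in by hand rather than produced by running ChaSen.

```python
# Hand-typed sample in the layout shown above (surface, reading, base form, POS, ...).
SAMPLE = """\
村山 ムラヤマ 村山 名詞-固有名詞-人名-姓
首相 シュショウ 首相 名詞-一般
が ガ が 助詞-格助詞-一般
言っ イッ 言う 動詞-自立 五段・ワ行促音便 連用タ接続
た タ た 助動詞 特殊・タ 基本形
EOS
"""

# Assumed mapping from part-of-speech prefixes to entity kinds.
POS_TO_ENTITY = {
    "名詞-固有名詞-人名": "PERSON",
    "名詞-固有名詞-地域": "LOCATION",
}


def extract_entities(analyzer_output: str):
    """Return (surface, kind) pairs for tokens tagged as person or place names."""
    entities = []
    for line in analyzer_output.splitlines():
        fields = line.split()
        if len(fields) < 4:  # skips blank lines and the EOS marker
            continue
        surface = fields[0]
        pos = fields[3].replace("−", "-")  # normalize the minus sign used in print
        for prefix, kind in POS_TO_ENTITY.items():
            if pos.startswith(prefix):
                entities.append((surface, kind))
                break
    return entities


print(extract_entities(SAMPLE))
```

Matching on a prefix of the part of speech means that subcategories such as 姓 (surname) or 国 (country) do not need to be listed one by one.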

ｂ、人手でルールを作る場合の説明
形態素解析を用いる場合の他に、人手でルールを作って固有表現を取り出すという方法もある。 b. Writing rules by hand. Besides using morphological analysis, named entities can also be extracted with manually written rules.

例えば、人手でルールを作っておくことで、抽出手段（装置）では、次のルールで固有表現（人名、地名等）を取り出すことができる。
名詞＋「さん」だと人名とする
名詞＋「首相」だと人名とする
名詞＋「町」だと地名とする
名詞＋「市」だと地名とする
For example, with rules prepared by hand in advance, the extraction means (device) can extract named entities (person names, place names, and so on) using rules such as the following.
Noun + “さん” (san, an honorific): treated as a person name
Noun + “首相” (prime minister): treated as a person name
Noun + “町” (town): treated as a place name
Noun + “市” (city): treated as a place name
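The four rules above can be sketched as a lookup on the token that follows a noun. This is a minimal sketch, assuming the text has already been split into (surface, part-of-speech) pairs by some morphological analyzer; the names `RULES` and `apply_rules` and the hand-written token lists are assumptions made for illustration.

```python
# Suffix token -> entity kind, following the four hand-written rules above.
RULES = {"さん": "PERSON", "首相": "PERSON", "町": "LOCATION", "市": "LOCATION"}


def apply_rules(tokens):
    """tokens: list of (surface, pos) pairs. Returns (noun, kind) pairs."""
    found = []
    # Look at each token together with the token that immediately follows it.
    for (surface, pos), (next_surface, _) in zip(tokens, tokens[1:]):
        if pos.startswith("名詞") and next_surface in RULES:
            found.append((surface, RULES[next_surface]))
    return found


# Hand-written token lists (assumed analyzer output).
said = [("村山", "名詞"), ("首相", "名詞"), ("が", "助詞"), ("言っ", "動詞"), ("た", "助動詞")]
mr = [("小泉", "名詞"), ("さん", "名詞"), ("です", "助動詞")]
print(apply_rules(said))
print(apply_rules(mr))
```

Rules of this kind are easy to read and extend, but they fire on any noun followed by the suffix, so ordinary compounds that happen to end in 市 or 町 would need extra conditions.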

ｃ、機械学習を用いる場合の説明
（ユーザ依存型固有表現抽出表示システムの説明）
一部のコーパス（言語資源、例えば、新聞の電子データ）で固有表現をユーザがタグづけし、他のデータでそれら固有表現を自動抽出する技術である。 c. Using machine learning (explanation of a user-dependent named entity extraction and display system)
In this technique, the user tags named entities in part of a corpus (a language resource such as electronic newspaper data), and those kinds of named entities are then extracted automatically from other data.

固有表現の抽出には、学習結果を利用して、入力データの所定の単位のデータについてその素性の場合になりやすい分類先を推定するものである。 To extract named entities, the learning result is used to estimate, for each predetermined unit of the input data, the classification destination that is most likely given that unit's features.

例えば、固有表現の抽出に、サポートベクトルマシン法を用いる場合には、機械学習手段では、教師データから解となりうる分類先を特定し、その分類先を正例と負例に分割し、所定のカーネル関数を用いたサポートベクトルマシン法を実行する関数にしたがって素性の集合を次元とする空間上で正例と負例の間隔を最大にして正例と負例を超平面で分割する超平面を求め、その超平面を学習結果とし、その超平面を学習結果記憶手段に記憶する。そして、この学習結果記憶手段に記憶されている学習結果の超平面を利用して、入力データの素性の集合がこの超平面で分割された空間において正例側か負例側のどちらにあるかを特定し、その特定された結果に基づいて定まる分類先を、入力データの素性の集合の場合になりやすい分類先と推定する。 For example, when the support vector machine method is used for named entity extraction, the machine learning means identifies the candidate classification destinations from the teacher data and divides them into positive and negative examples. Then, following a function that executes the support vector machine method with a predetermined kernel function, it finds, in the space whose dimensions are the set of features, the hyperplane that separates the positive and negative examples with the largest interval between them, and stores that hyperplane in the learning result storage means as the learning result. Using the stored hyperplane, it then determines whether the feature set of the input data lies on the positive side or the negative side of the space divided by the hyperplane, and takes the classification destination determined by that result as the one most likely for the feature set of the input data.

固有表現抽出処理とは、テキストデータから地名、人名、組織名、数値表現などの固有な表現を抽出する処理をいう。固有表現抽出処理において解析結果となる分類先は、例えば地名、人名、組織名、日付表現、時間表現、金額表現、割合表現などである。教師データには、これらの分類先それぞれに対応する分類ラベルが付与される。 Named entity extraction processing is processing that extracts named expressions such as place names, person names, organization names, and numerical expressions from text data. The classification destinations that form the analysis result are, for example, place name, person name, organization name, date expression, time expression, monetary expression, and percentage expression. In the teacher data, a classification label corresponding to each of these destinations is assigned.

教師データ作成のためのタグ登録手段は、ユーザが、入力装置を介して、以下のような固有表現抽出処理の分類先とそれに対応する分類タグを指定すると、ユーザが指定した分類先およびその分類タグ（開始タグと終了タグ）を入力してタグ記憶手段に記憶する。 When the user specifies, through the input device, classification destinations for named entity extraction and their corresponding classification tags such as those listed below, the tag registration means for creating teacher data receives the specified destinations and their tags (a start tag and an end tag) and stores them in the tag storage means.

＜PERSON＞＜/PERSON ＞：分類先＝人名、
＜LOCATION＞＜/LOCATION ＞：分類先＝地名、
＜ORGANIZATION＞＜/ORGANIZATION ＞：分類先＝組織名、
＜ARTIFACT＞＜/ARTIFACT ＞：分類先＝固有物名、
＜DATE＞＜/DATE ＞：分類先＝日付表現、
＜TIME＞＜/TIME ＞：分類先＝時間表現、
＜MONEY ＞＜/MONEY＞：分類先＝金額表現、
＜PERCENT ＞＜/PERCENT＞：分類先＝割合表現、…。 <PERSON></PERSON>: Classification destination = person name,
<LOCATION></LOCATION>: Classification destination = place name,
<ORGANIZATION></ORGANIZATION>: Classification destination = organization name,
<ARTIFACT></ARTIFACT>: Classification destination = artifact name,
<DATE></DATE>: Classification destination = date expression,
<TIME></TIME>: Classification destination = time expression,
<MONEY></MONEY>: Classification destination = monetary expression,
<PERCENT></PERCENT>: Classification destination = percentage expression, and so on.

本例では、付与する分類ラベルを文字単位に付与した教師データを作成する。例えば、＜PERSON＞＜/PERSON ＞分類タグが対応する分類先「人名」の分類ラベルは、先頭文字を示す「B-」または先頭以外の文字を示す「I-」を付けて、「B-PERSON」、「I-PERSON」とする。また、分類先に該当しない文字に付与するラベルとして、「OTHER 」を登録する。 In this example, teacher data is created in which a classification label is assigned to each character. For example, the classification labels for the destination “person name”, which corresponds to the <PERSON></PERSON> tag, are “B-PERSON” and “I-PERSON”, formed by prefixing “B-” for the first character of an entity and “I-” for any other character of it. In addition, “OTHER” is registered as the label for characters that belong to no classification destination.

また、固有表現抽出処理の分類先として字種を用いる場合には、以下のような分類先および分類タグをタグ記憶手段に格納する。 When character types are used as classification destinations in named entity extraction, classification destinations and tags such as the following are stored in the tag storage means.

＜KANJI ＞＜/KANJI＞：分類先＝漢字、
＜KATAKANA＞＜/KATAKANA ＞：分類先＝カタカナ、
＜ALPHABETIC＞＜/ALPHABETIC ＞：分類先＝英字、
＜NUMERIC ＞＜/NUMERIC＞：分類先＝数字。 <KANJI></KANJI>: Classification destination = Kanji,
<KATAKANA></KATAKANA>: Classification destination = Katakana
<ALPHABETIC></ALPHABETIC>: Classification destination = English characters
<NUMERIC></NUMERIC>: Classification destination = number.

そして、コーパス入力手段が、固有表現抽出処理の分類先が付与されていないテキストデータで構成されるコーパスを入力すると、タグ付与手段は、コーパスのテキストデータを表示しユーザにタグ付与操作を促すタグ付与画面を表示装置に表示する。 Then, when the corpus input means inputs a corpus composed of text data to which the classification destination for the specific expression extraction processing is not assigned, the tag assignment means displays the corpus text data and prompts the user for a tagging operation. The grant screen is displayed on the display device.

ユーザによって、分類先を付与したい箇所および付与する分類先が指定されたら、タグ付与手段は、タグ付与画面で指定された箇所に対応する文字列の前後に選択された分類タグを挿入する。 When the user specifies the location to be tagged and the classification destination to assign, the tag assigning means inserts the selected classification tag before and after the character string corresponding to the location specified on the tag assignment screen.

例えば、入力されたコーパスに、テキストデータ「…日本の首相は小泉さんです。小泉さんはいつも思いきったことをしています。…」が含まれていたとする。ユーザが、タグ付与画面の指定項目に表示されたテキストデータ上で、マウスドラッグ操作などにより、分類先を付与する単語「日本」を指定する。さらにマウスの右ボタンクリック操作を行って表示させた選択項目から、マウス左ボタンクリック操作などにより分類先「地名」を選択する。同様に、指定項目で単語「小泉」を指定し、選択項目から分類先「人名」を選択する。 For example, suppose the input corpus contains the text data “… The prime minister of Japan is Mr. Koizumi. Mr. Koizumi always acts boldly. …”. On the text displayed in the designation field of the tag assignment screen, the user selects the word “Japan” (日本), for example by dragging the mouse. The user then right-clicks to display the selection items and chooses the classification destination “place name”, for example with a left click. In the same way, the user selects the word “Koizumi” (小泉) in the designation field and chooses the classification destination “person name” from the selection items.

タグ付与手段は、タグ付与画面で指定された箇所に対応するテキストデータ中の文字列の前後に、選択された分類タグを挿入する。分類タグが付与されたテキストデータは以下のようになる。
「…＜LOCATION＞日本＜/LOCATION ＞の首相は＜PERSON＞小泉＜/PERSON ＞さんです。小泉さんはいつも思いきったことをしています。…」
さらに、ユーザによって、指定項目で分類先を付与する作業を行い教師データとして使用する範囲が指定されると、タグ付与手段は、タグ付与画面で指定された範囲に対応するテキストデータの文字列の前後に範囲指定タグの開始タグおよび終了タグを付加する。例えば、ユーザが、マウスドラッグにより文「日本の首相は小泉さんです。」を範囲として指定したとする。タグ付与手段は、指定された範囲に対応するテキストデータの文字列の前後に範囲指定タグを挿入する。範囲指定タグが付与されたテキストデータは以下のようになる。
「…＜UC＞＜LOCATION＞日本＜/LOCATION ＞の首相は＜PERSON＞小泉＜/PERSON ＞さんです。＜/UC ＞小泉さんはいつも思いきったことをしています。…」
一方、ユーザが、分類先を付与した後、教師データとして使用する範囲を指定しなかった場合には、タグ付与手段は、指定項目で分類先が付与された箇所を含む所定の箇所をユーザが選択した範囲とみなし、その範囲の前後に範囲指定タグを付加する。例えば、タグ付与手段は、テキストデータ中の分類タグが付与された文字列に単語の前後に連なる所定の文字数や単語数などの範囲を、ユーザが選択した範囲とみなし、みなした範囲の前後に範囲指定タグを付加する。 The tag assigning means inserts the selected classification tag before and after the character string in the text data corresponding to the location specified on the tag assignment screen. The text data to which the classification tag is assigned is as follows.
“… The prime minister of <LOCATION>Japan</LOCATION> is Mr. <PERSON>Koizumi</PERSON>. Mr. Koizumi always acts boldly. …”
Further, when the user, while assigning classification destinations in the designation field, specifies the range to be used as teacher data, the tag assigning means adds the start tag and end tag of the range designation tag before and after the character string of the text data corresponding to the range specified on the tag assignment screen. For example, suppose the user drags the mouse to specify the sentence “The prime minister of Japan is Mr. Koizumi.” as the range. The tag assigning means inserts the range designation tags before and after the corresponding character string. The text data with the range designation tags is as follows.
“… <UC>The prime minister of <LOCATION>Japan</LOCATION> is Mr. <PERSON>Koizumi</PERSON>.</UC> Mr. Koizumi always acts boldly. …”
On the other hand, when the user assigns classification destinations but does not specify a range to be used as teacher data, the tag assigning means regards a predetermined span that includes the tagged location as the range selected by the user and adds the range designation tags before and after it. For example, it treats a span of a predetermined number of characters or words on either side of the tagged character string as the user-selected range and adds the range designation tags before and after that span.

そして、タグ付与手段は、テキストデータに分類タグおよび範囲指定タグを付加したテキストデータ（タグ付きコーパス）をコーパス記憶手段に記憶する。 Then, the tag assigning means stores the text data (tagged corpus) obtained by adding the classification tag and the range designation tag to the text data in the corpus storage means.

その後、ユーザ範囲抽出手段は、コーパス記憶手段のタグ付きコーパスから、範囲指定タグの開始タグ＜UC＞と終了タグ＜/UC ＞とに囲まれた範囲のテキストデータ（ユーザ範囲データ）を抽出する。なお、ここではユーザがUCのタグを付ける説明をしたが、システム作成者がこのタグを付与することもでき、また、UCのタグを付けずに全データを教師データとして使用することも可能である。 Thereafter, the user range extraction means extracts, from the tagged corpus in the corpus storage means, the text data (user range data) enclosed by the start tag <UC> and the end tag </UC> of the range designation tag. Although the description here has the user attach the UC tags, the system creator may attach them instead, and it is also possible to use all of the data as teacher data without attaching UC tags.

そして、教師データ変換手段は、抽出されたテキストデータを所定の単位（ここでは文字単位とする）に分割し、抽出されたテキストデータから分類タグに囲まれた文字列を検出し、各単位（文字）のうち分類タグが付与されている文字に分類タグに対応する分類ラベルを付与し、分類タグが付与されていない文字に分類先がないことを示す分類ラベルを付与して、教師データとする。 The teacher data conversion means then divides the extracted text data into predetermined units (here, characters) and detects the character strings enclosed by classification tags. To each character inside a classification tag it assigns the classification label corresponding to that tag, and to each character outside any classification tag it assigns the label indicating that there is no classification destination. The result is used as the teacher data.

例えば、教師データとして、範囲指定タグに囲まれたテキストデータ「＜UC＞＜LOCATION＞日本＜/LOCATION ＞の首相は＜PERSON＞小泉＜/PERSON ＞さんです。＜/UC ＞」が抽出されたとする。教師データ変換手段は、例えば、テキストデータの分類タグ＜PERSON＞と＜/PERSON ＞に囲まれた文字列「小、泉」の先頭文字「小」に、分類先「人名」の先頭を示す分類ラベル「B-PERSON」を、同じく次の文字「泉」に分類先「人名」の先頭以外を示す分類ラベル「I-PERSON」を付与する。また、テキストデータのうち分類タグに囲まれていない部分「の、首、相、は、さ、ん、で、す、。」について、各文字にユーザが指定した分類先に該当しない旨を示す分類ラベル「０」を付与する。 For example, suppose the text data “<UC><LOCATION>日本</LOCATION>の首相は<PERSON>小泉</PERSON>さんです。</UC>” enclosed by the range designation tags is extracted as teacher data. For the character string “小泉” enclosed by <PERSON> and </PERSON>, the teacher data conversion means assigns the label “B-PERSON”, which marks the beginning of a person name, to the first character “小”, and the label “I-PERSON”, which marks a non-initial character of a person name, to the next character “泉”. For the part of the text not enclosed by any classification tag, namely the characters “の”, “首”, “相”, “は”, “さ”, “ん”, “で”, “す”, and “。”, it assigns to each character the label “0”, indicating that the character does not belong to any classification destination specified by the user.
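The conversion from tagged text to per-character labels described above can be sketched as follows. This is an illustrative sketch, not the device's implementation: it assumes the tags have been normalized to ASCII angle brackets without inner spaces, and the names `TAG_RE` and `to_char_labels` are hypothetical. The label for unclassified characters is a parameter because the description uses both “OTHER” and “0” for it.

```python
import re

# Matches an opening or closing tag such as <PERSON>, </PERSON>, <UC>, </UC>.
TAG_RE = re.compile(r"<(/?)([A-Z]+)>")


def to_char_labels(tagged: str, other: str = "OTHER"):
    """Return (character, label) pairs using B-/I- prefixes inside classification tags."""
    pairs = []
    current = None    # classification tag currently open, if any
    at_start = False  # True until the first character of the open tag is emitted
    pos = 0
    for m in TAG_RE.finditer(tagged):
        for ch in tagged[pos:m.start()]:
            if current is None:
                pairs.append((ch, other))
            else:
                pairs.append((ch, ("B-" if at_start else "I-") + current))
                at_start = False
        closing, name = m.group(1), m.group(2)
        if name != "UC":  # the range designation tag carries no label
            current = None if closing else name
            at_start = not closing
        pos = m.end()
    for ch in tagged[pos:]:  # text after the last tag
        pairs.append((ch, other))
    return pairs


text = "<UC><LOCATION>日本</LOCATION>の首相は<PERSON>小泉</PERSON>さんです。</UC>"
for ch, label in to_char_labels(text):
    print(ch, label)
```

Labeling at the character level avoids depending on word segmentation when the teacher data is created, which is why the first and non-first positions need separate prefixes.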

そして、素性抽出手段により、教師データに対して形態素解析処理を行い、所定の単位（例えば文字）ごとの素性を抽出し、素性の集合と分類ラベルとの組を生成する。 The feature extraction means then performs morphological analysis on the teacher data, extracts the features of each predetermined unit (for example, each character), and generates pairs consisting of a feature set and a classification label.

素性として、例えば、品詞情報（名詞、固有名詞、人名、姓、などの分類）、形態素における文字の位置情報（先頭、それ以外などの分類）、字種情報（漢字、カタカナ、英字、数字などの分類）、分類先などが抽出される。 Features include, for example, parts of speech information (classification of nouns, proper nouns, personal names, surnames, etc.), character position information in morphemes (classification of the first, other, etc.), character type information (kanji, katakana, English letters, numbers, etc.) Classification), classification destination, and the like are extracted.
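A minimal sketch of per-character feature extraction along these lines is shown below. It assumes the morphemes and their parts of speech are already available as (surface, part-of-speech) pairs; the function names `char_type` and `char_features`, the feature keys, and the Unicode ranges used to decide the character type are assumptions made for illustration.

```python
def char_type(ch: str) -> str:
    """Coarse character-type feature (kanji, katakana, alphabetic, numeric, other)."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "KANJI"
    if "\u30a0" <= ch <= "\u30ff":  # Katakana block
        return "KATAKANA"
    if ch.isascii() and ch.isalpha():
        return "ALPHABETIC"
    if ch.isdigit():
        return "NUMERIC"
    return "OTHER"


def char_features(morphemes):
    """morphemes: list of (surface, pos) pairs. Returns one feature dict per character."""
    features = []
    for surface, pos in morphemes:
        for i, ch in enumerate(surface):
            features.append({
                "char": ch,
                "pos": pos,                                   # part of speech of the morpheme
                "position": "first" if i == 0 else "other",   # position within the morpheme
                "char_type": char_type(ch),
            })
    return features


morphemes = [("小泉", "名詞-固有名詞-人名-姓"), ("さん", "名詞-接尾-人名")]
for f in char_features(morphemes):
    print(f)
```

Each dictionary, paired with the character's classification label, would form one training example for the learner.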

言語解析処理は、機械学習手段では、素性の集合と分類ラベルの組を利用して、各単位（文字）について、その素性の集合の場合にどのような分類先になりやすいかを学習し、学習結果を学習結果記憶手段に記憶する。 In the language analysis process, the machine learning means uses a set of feature sets and classification labels to learn for each unit (character) what kind of classification destination is likely to be in the case of that feature set, The learning result is stored in the learning result storage means.

機械学習手段は、例えば、各文字の素性と分類ラベルとの組において、文字「小」についての学習には、素性の集合を用いて行う。 For example, among the pairs of character features and classification labels, the machine learning means learns about the character “小” using that character's feature set.

ここで、機械学習法としては、多分類に対応できる拡張したサポートベクトルマシン法を用いる。 Here, as the machine learning method, an extended support vector machine method capable of dealing with multiple classifications is used.

サポートベクトルマシン法は、空間を超平面で分割することにより２つの分類からなるデータを分類する手法である。このとき、２つの分類が正例と負例からなるものとすると、学習データにおける正例と負例の間隔（マージン）が大きいものほど、オープンデータで誤った分類をする可能性が低いと考えられ、このマージンを最大にする超平面を求め、求めた超平面を用いて分類を行う。 The support vector machine method is a method of classifying data composed of two classifications by dividing a space by a hyperplane. At this time, if the two classifications consist of a positive example and a negative example, the larger the interval (margin) between the positive example and the negative example in the learning data, the lower the possibility of incorrect classification with open data. The hyperplane that maximizes the margin is obtained, and classification is performed using the obtained hyperplane.

サポートベクトルマシン法の最大マージンは、ある空間で求める分離超平面と、分類超平面に平行かつ等距離にある超平面の距離（マージン）が最大になるような分離超平面を求める。 Maximum-margin training in the support vector machine method finds, in a given space, the separating hyperplane for which the distance (margin) between it and the hyperplanes parallel to and equidistant from it is largest.

サポートベクトルマシン法では、通常、学習データにおいて、マージンの内部領域に小量の事例が含まれてもよいとする手法の拡張や、超平面の線形の部分を非線形にする拡張（カーネル関数の導入）がなされたものが用いられる。このような拡張された方法は、識別関数を用いて分類することと等価であり、その識別関数の出力値が正か負かによって、２つの分類を判別することができる。 In practice, the support vector machine method is normally used with two extensions: one that allows a small number of training examples to fall inside the margin, and one that makes the linear part of the hyperplane nonlinear (the introduction of a kernel function). The extended method is equivalent to classifying with a discriminant function, and the two classes can be distinguished by whether the output value of that function is positive or negative.
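For reference, the two extensions can be written in the standard textbook form of the soft-margin support vector machine. The patent text itself gives no formulas, so the symbols below ($w$, $b$, $\xi_i$, $C$, $\phi$, $K$, $\alpha_i$) are the conventional ones and are assumptions of this note. Given training examples $(x_i, y_i)$ with $y_i \in \{+1, -1\}$:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_i \\
\text{subject to}\quad & y_i\bigl(w\cdot\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0 .
\end{aligned}
\qquad
f(x) = \sum_{i}\alpha_i\,y_i\,K(x_i, x) + b,\quad K(x_i, x) = \phi(x_i)\cdot\phi(x)
```

Minimizing $\lVert w\rVert$ maximizes the margin $2/\lVert w\rVert$; the slack variables $\xi_i$ permit a few examples inside the margin; and the kernel $K$ makes the boundary nonlinear in the original feature space. An input $x$ is assigned to the positive class when $f(x) > 0$ and to the negative class otherwise.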

なお、サポートベクトルマシンは、正例・負例の二値分類であるため、ワン・バーサス・レスト（One v.s. Rest ）法、ペア・ワイズ(Pair Wise )法などの手法を用いて二値分類を多値分類に拡張する。 Because a support vector machine performs binary classification into positive and negative examples, the binary classification is extended to multi-class classification using a technique such as the one-versus-rest method or the pairwise method.

ワン・バーサス・レスト（One v.s. Rest ）法では、例えば３つの分類先ａ、ｂ、ｃがある場合に、「ａとその他」、「ｂとその他」、「ｃとその他」という３つの組の二値分類器（ある分類先か、それ以外の分類先か）を用意し、それぞれをサポートベクトルマシンで学習する。そして、解である分類先を推定する場合には、３つのサポートベクトルマシンの学習結果を利用する。推定するべき入力データが、これらの３つのサポートベクトルマシンでは、どのように推定されるかをみて、３つのサポートベクトルマシンのうち、その他でない側（正例）に分類されかつサポートベクトルマシンの分離平面から最も離れた場合のものの分類先を、求める解とする。 In the one-versus-rest method, when there are, for example, three classification destinations a, b, and c, three binary classifiers are prepared (“a versus the rest”, “b versus the rest”, and “c versus the rest”), each deciding between one destination and all the others, and each is trained with a support vector machine. To estimate the classification destination of an input, the learning results of all three support vector machines are used: the input is evaluated by each machine, and the destination of the machine that places the input on the non-rest (positive) side and farthest from its separating hyperplane is taken as the solution.

ペア・ワイズ(Pair Wise )法では、ｋ個の分類先から任意の２つの分類先についての二値分類器をₖＣ₂個用意して、分類先同士の総当たり戦を行い、このうち最も分類先として選ばれた回数が多い分類先を求める解とする。 In the pairwise method, a binary classifier is prepared for every pair of classification destinations chosen from the k destinations, giving k(k−1)/2 classifiers in total. The destinations are played off against each other in a round robin, and the destination chosen the greatest number of times is taken as the solution.
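The two decision procedures can be sketched as follows, with the trained binary classifiers replaced by stand-ins. This is only a sketch under stated assumptions: `one_vs_rest` receives, for each destination, a signed distance from that destination's separating hyperplane, and `pairwise` receives a hypothetical function `duel(a, b)` that returns the winner of the binary classifier for the pair. How to proceed when no classifier gives a positive score, or when votes are tied, is not specified in the description; the fallback behavior here is an assumption.

```python
from collections import Counter
from itertools import combinations


def one_vs_rest(scores):
    """scores: {destination: signed distance from its separating hyperplane}.

    Picks the destination with the largest signed distance, i.e. the positive
    side farthest from the hyperplane. If no score is positive, this falls back
    to the least negative one (an assumption).
    """
    return max(scores, key=scores.get)


def pairwise(destinations, duel):
    """Round robin over all pairs; the destination with the most wins is chosen."""
    wins = Counter(duel(a, b) for a, b in combinations(destinations, 2))
    return wins.most_common(1)[0][0]


# Stand-in scores for one input character (made-up values for illustration).
scores = {"PERSON": 0.8, "LOCATION": -0.3, "OTHER": -1.2}
print(one_vs_rest(scores))

# Stand-in pairwise classifier: the destination with the higher score wins the duel.
duel = lambda a, b: a if scores[a] >= scores[b] else b
print(pairwise(list(scores), duel))
```

With three destinations the round robin runs three duels, matching the k(k−1)/2 count for k = 3.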

機械学習の学習終了後、データ入力手段では、言語解析の対象のテキストデータを入力する。素性抽出手段では、教師データ作成処理と同様に、入力されたテキストデータ（入力データ）に対して形態素解析を行い、所定の単位（例えば文字）ごとの素性を抽出する。 After the completion of the machine learning, the data input unit inputs text data to be analyzed. The feature extraction means performs morpheme analysis on the input text data (input data) and extracts features for each predetermined unit (for example, character) as in the teacher data creation process.

そして、解推定手段では、学習結果記憶手段に記憶された学習結果を利用して、入力データの所定の単位（文字）について、その素性の場合になりやすい分類ラベルを推定する。 Then, the solution estimation means estimates a classification label that is likely to be the case for a predetermined unit (character) of the input data, using the learning result stored in the learning result storage means.

そして、タグ付与手段は、解と推定された分類ラベルに対応する分類タグを、入力データの該当する文字または文字列の前後に挿入する。 And a tag provision means inserts the classification tag corresponding to the classification label estimated to be the solution before and after the corresponding character or character string of the input data.

解析結果表示処理手段では、分類タグが付加された入力データを、所定の表示規則に従った表示態様で表示装置に表示する。ここで、分類タグ＜PERSON＞＜/PERSON ＞で囲まれた文字列及び＜LOCATION＞＜/LOCATION ＞で囲まれた文字列を、特定の固有表現として抽出することができる。 The analysis result display processing means displays the input data with the classification tags added on the display device, in a display mode that follows a predetermined display rule. Here, the character strings enclosed by the classification tags <PERSON></PERSON> and <LOCATION></LOCATION> can be extracted as the named entities of interest.
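The last two steps, inserting tags from the estimated labels and then pulling out the tagged strings, can be sketched as follows. The function names `labels_to_tagged` and `extract_tagged` are hypothetical, the tags are written with ASCII angle brackets, and the label sequence is typed in by hand rather than produced by a trained classifier.

```python
import re


def labels_to_tagged(chars, labels):
    """Insert classification tags around runs of B-/I- labels."""
    out = []
    open_tag = None
    for ch, label in zip(chars, labels):
        kind = label[2:] if label[:2] in ("B-", "I-") else None
        starts_new = label.startswith("B-") or kind != open_tag
        if open_tag is not None and starts_new:
            out.append(f"</{open_tag}>")  # close the run that just ended
            open_tag = None
        if kind is not None and open_tag is None:
            out.append(f"<{kind}>")       # open a new run
            open_tag = kind
        out.append(ch)
    if open_tag is not None:
        out.append(f"</{open_tag}>")
    return "".join(out)


def extract_tagged(tagged, kinds=("PERSON", "LOCATION")):
    """Return (kind, text) pairs for strings enclosed by the given tags."""
    pattern = r"<(" + "|".join(kinds) + r")>(.*?)</\1>"
    return re.findall(pattern, tagged)


chars = "日本の首相は小泉さんです。"
labels = ["B-LOCATION", "I-LOCATION", "OTHER", "OTHER", "OTHER", "OTHER",
          "B-PERSON", "I-PERSON", "OTHER", "OTHER", "OTHER", "OTHER", "OTHER"]
tagged = labels_to_tagged(chars, labels)
print(tagged)
print(extract_tagged(tagged))
```

The extracted strings are the candidates that the display means would highlight when the question asks for a person or a place.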

（９）：プログラムインストールの説明
表示装置（表示手段）１、入力装置（入力手段）２、抽出手段（抽出装置）３、疑問詞後接語抽出装置（疑問詞後接語抽出手段）４、主要語抽出装置（主要語抽出手段）５等は、プログラムで構成でき、主制御部（ＣＰＵ）が実行するものであり、主記憶に格納されているものである。このプログラムは、一般的な、コンピュータで処理されるものである。このコンピュータは、主制御部、主記憶、ファイル装置、表示装置、キーボード等の入力手段である入力装置などのハードウェアで構成されている。 (9): Explanation of program installation. The display device (display means) 1, the input device (input means) 2, the extraction means (extraction device) 3, the interrogative postfix extraction device (interrogative postfix extraction means) 4, the main word extraction device (main word extraction means) 5, and so on can be implemented as programs that are stored in main memory and executed by the main control unit (CPU). These programs run on an ordinary computer, which consists of hardware such as a main control unit, main memory, a file device, a display device, and an input device such as a keyboard.

このコンピュータに、本発明のプログラムをインストールする。このインストールは、フロッピィ、光磁気ディスク等の可搬型の記録（記憶）媒体に、これらのプログラムを記憶させておき、コンピュータが備えている記録媒体に対して、アクセスするためのドライブ装置を介して、或いは、ＬＡＮ等のネットワークを介して、コンピュータに設けられたファイル装置にインストールされる。そして、このファイル装置から処理に必要なプログラムステップを主記憶に読み出し、主制御部が実行するものである。 The program of the present invention is installed on this computer. In this installation, these programs are stored in a portable recording (storage) medium such as a floppy disk or a magneto-optical disk, and a drive device for accessing the recording medium provided in the computer is used. Alternatively, it is installed in a file device provided in the computer via a network such as a LAN. Then, the program steps necessary for processing are read from the file device into the main memory and executed by the main control unit.

本発明のキーワード強調装置の説明図である。 An explanatory diagram of the keyword emphasizing device of the present invention.
本発明の疑問詞の後ろに付く単語を強調表示するフローチャートである。 A flowchart of the present invention for highlighting the word that follows an interrogative.
本発明の疑問詞の後ろに付く単語を強調表示するフローチャートである。 A flowchart of the present invention for highlighting the word that follows an interrogative.
本発明の数量表現を指す疑問詞を利用して強調表示するフローチャートである。 A flowchart of the present invention for highlighting by using an interrogative that indicates a quantity expression.
本発明の疑問詞の意味を利用して強調表示するフローチャートである。 A flowchart of the present invention for highlighting by using the meaning of the interrogative.
本発明の疑問詞の意味を利用して強調表示するフローチャートである。 A flowchart of the present invention for highlighting by using the meaning of the interrogative.

Explanation of symbols

１表示装置（表示手段）
２入力装置（入力手段）
３抽出手段（抽出装置）
４疑問詞後接語抽出装置（疑問詞後接語抽出手段）
５主要語抽出装置（主要語抽出手段）
1 Display device (display means)
2 Input device (input means)
3 Extraction means (extraction device)
4 Interrogative postfix extraction device (interrogative postfix extraction means)
5 Main word extraction device (main word extraction means)

Claims

A keyword emphasizing device comprising:
input means for inputting a pair consisting of a question and an article answering it;
interrogative postfix extraction means for extracting, from the question sentence, a noun or a suffix that follows an interrogative; and
display means for highlighting, in the answer article, the extracted noun or suffix that followed the interrogative.

A keyword emphasizing device comprising:
input means for inputting a pair consisting of a question and an article answering it;
interrogative postfix extraction means for extracting, from the question sentence, a predetermined noun that follows an interrogative and can be combined with a number, or a predetermined suffix that follows an interrogative and can be combined with a number; and
display means for highlighting, in the answer article, at least one of a number and the extracted predetermined noun or predetermined suffix.

A keyword emphasizing device comprising:
input means for inputting a pair consisting of a question and an article answering it;
extraction means for confirming that the question sentence contains an interrogative indicating a predetermined quantity expression; and
display means for highlighting numbers in the answer article, on the basis that the question sentence confirmed by the extraction means contains an interrogative indicating the predetermined quantity expression and the answer to the question is therefore a number.

A keyword emphasizing device comprising:
input means for inputting a pair consisting of a question and an article answering it;
extraction means for identifying, from the question sentence, the type of a predesignated interrogative that indicates a person name, a place name, or a time; and
display means for extracting and highlighting, in the answer article, the named entity (person name, place name, or time) corresponding to the identified type of interrogative.

A program for causing a computer to function as:
input means for inputting a pair consisting of a question and an article answering it;
interrogative postfix extraction means for extracting, from the question sentence, a noun or a suffix that follows an interrogative; and
display means for highlighting, in the answer article, the extracted noun or suffix that followed the interrogative.