JPH0498461A

JPH0498461A - Keyword extracting device

Info

Publication number: JPH0498461A
Application number: JP2213491A
Authority: JP
Inventors: Fuirisu Anuiru; アンウィル・フィリス
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-08-10
Filing date: 1990-08-10
Publication date: 1992-03-31

Abstract

PURPOSE: To improve working efficiency in retrieval and the retrieval performance of a device by calculating the importance of each keyword candidate from the number of occurrences of the candidate and of its constituent words, and extracting keywords from among the candidates based on the result. CONSTITUTION: A keyword candidate creation means 3 creates keyword candidates from the words extracted by a word extraction means 2 and the contents of a word information storage means 1 corresponding to those words. An importance calculation means 4 calculates the importance of each keyword candidate as a keyword from the number of occurrences of the candidate and of its constituent words. Because a keyword extraction means 5 extracts keywords from among the candidates based on the calculated result, the extracted keywords are not limited to a predetermined set. Thus, working efficiency in retrieval and the retrieval performance of the device can be improved.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a keyword extraction device that, in electronic equipment such as database systems and word processors, extracts as keywords the words and phrases that are likely to be used when searching document information.

Prior Art

Conventionally, when desired information is to be retrieved from a large body of already created information in electronic equipment such as database systems and word processors, the usual approach is to assign keywords to each item of information in advance and store them. At search time, a conditional expression containing keywords is entered, and information whose keywords match this search condition is retrieved and output.

With such equipment, the user generally has to register the keywords when registering the information. Because this registration work is very laborious, it is not feasible in practice to assign to each item of information the number of keywords that is necessary and sufficient to make it easy to retrieve.

Furthermore, because at least one keyword must be registered for each item of information, a large amount (many kinds) of information cannot be registered in a short time.

On the other hand, as a way of extracting keywords without human intervention, there is a keyword extraction means in which the words to be used as keywords are set in a storage means in advance to form a word list, and the words in the target document that appear in the word list are automatically extracted as keywords.

With such a keyword extraction means, keywords are extracted only from the very limited set of words in the word list, so the result is restricted to what can be expressed with those words; that is, there are restrictions by field and restrictions on compound words. Moreover, when a new document is registered, it is an open question whether keywords corresponding to that document have been prepared.

As a solution to these problems, there are systems that extract keywords automatically without using a word list of the kind described above. A representative example is the retrieval system known as SMART.

This retrieval system has a keyword extraction means made up of: a morphological analysis means for recognizing words; a stop-word list; a means for removing high-frequency function words using the stop-word list; a means for stripping inflections, prefixes, suffixes, and the like so as to leave the base form of each word; a means for calculating importance from the number of occurrences of each base form in a document and creating a document vector; a means for creating compound-word keywords by combining base forms of high importance; and so on.

Problems to Be Solved by the Invention

With such a keyword extraction means, even a natural compound expression cannot be created as a keyword if it contains a stop word (e.g., research and development).

In addition, English has many irregularly inflected words, so even when a means for stripping suffixes and the like is used, the target words cannot always be reduced completely to their base forms.

There are also cases where it is better not to reduce words to their base forms. If only base forms are kept, part-of-speech information is lost, and the relationships of modification and word order can no longer be seen (e.g., "Japanese book" cf. "books on Japanese").

Furthermore, even if compound-word keywords are created by combining base forms, an operator performing a search is likely, in many cases, to specify keywords for the search condition in a more natural form rather than as compounds built from base forms.

Consequently, when a search is performed with keywords specified in such a natural form, the desired document may not be found.

Means for Solving the Problems

The device comprises: word information storage means that stores in advance word information including a large number of predetermined words and their respective parts of speech; word extraction means that extracts words from English text information; keyword candidate creation means that creates keyword candidates based on the words extracted by the word extraction means and the contents of the word information storage means corresponding to those words; importance calculation means that calculates the importance of each keyword candidate as a keyword based on the number of occurrences of the keyword candidate created by the keyword candidate creation means and of its constituent words; and keyword extraction means that extracts keywords from among the keyword candidates based on the result calculated by the importance calculation means.

Operation

The keyword candidate creation means creates keyword candidates based on the words extracted by the word extraction means and the contents of the word information storage means corresponding to those words. The importance calculation means calculates the importance of each keyword candidate as a keyword based on the number of occurrences of the candidate and of its constituent words, and the keyword extraction means extracts keywords from among the candidates based on this result. The extracted keywords are therefore not restricted to a predetermined set. In addition, keyword candidates can be created from the natural expressions that are expected to be used at search time, and those of high importance as keywords can then be extracted from among these candidates.

Embodiment

An embodiment of the present invention will be described with reference to FIGS. 1 to 3. As shown in FIG. 1, this keyword extraction device consists of: a word information storage means 1 that stores in advance a large number of predetermined words and the word information corresponding to each of them; a word extraction means 2 that extracts words from English document information; a keyword candidate creation means 3 that creates keyword candidates based on the words extracted by the word extraction means 2 and the contents of the word information storage means 1 corresponding to those words; an importance calculation means 4 that calculates the importance of each keyword candidate as a keyword based on the number of occurrences of the keyword candidate created by the keyword candidate creation means 3 and of its constituent words; and a keyword extraction means 5 that extracts keywords from among the keyword candidates created by the keyword candidate creation means 3 based on the result calculated by the importance calculation means 4.

The word information stored in the word information storage means 1 consists of part-of-speech information and constituent part-of-speech information. The part-of-speech information consists of words and their parts of speech; an example is shown in Table 1.

The constituent part-of-speech information consists of parts of speech and, for each part of speech, the positions at which it can appear as a constituent of a keyword candidate; an example is shown in Table 2.
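As a rough illustration of the two kinds of word information described above, the following Python sketch models the part-of-speech information as a word-to-part-of-speech mapping and the constituent part-of-speech information as a mapping from part of speech to permitted positions. The entries, the position names "head", "middle", and "tail", and the function name `allowed_positions` are assumptions made for this sketch; they are not taken from Table 1 or Table 2.

```python
# Hypothetical stand-in for the part-of-speech information (word -> part of speech).
# These entries are illustrative only and are not the contents of Table 1.
PART_OF_SPEECH = {
    "research": "noun",
    "development": "noun",
    "japanese": "adjective",
    "and": "conjunction",
    "of": "preposition",
    "the": "article",
}

# Hypothetical stand-in for the constituent part-of-speech information
# (part of speech -> positions it may occupy within a keyword candidate).
# These entries are illustrative only and are not the contents of Table 2.
ALLOWED_POSITIONS = {
    "noun": frozenset({"head", "middle", "tail"}),
    "adjective": frozenset({"head", "middle"}),
    "conjunction": frozenset({"middle"}),
    "preposition": frozenset({"middle"}),
    "article": frozenset(),
}


def allowed_positions(word):
    """Return the positions a word may occupy in a keyword candidate.

    Words that are missing from the dictionary get an empty set, so they
    can neither start, continue, nor end a candidate.
    """
    part_of_speech = PART_OF_SPEECH.get(word.lower())
    return ALLOWED_POSITIONS.get(part_of_speech, frozenset())
```

Keeping the two mappings separate mirrors the description: changing which parts of speech may start or end a candidate does not require touching the word dictionary.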

Table 1

Table 2

In this configuration, when keyword candidates are to be extracted, the keyword candidate creation means performs the keyword candidate creation process on a text file in which English document information is stored, as shown in FIG. 2. First, it is determined whether the end of the text file has been reached. If it has not, the word extraction means extracts a word from the text file, the part of speech corresponding to this word is read out by referring to the part-of-speech information stored in the word information storage means 1, and it is then determined whether a keyword candidate is currently being created.

If it is determined that no keyword candidate is being created, the constituent part-of-speech information stored in the word information storage means 1 is consulted (this reference is left implicit below) to determine whether the part of speech of the word can occupy the head position. If it can, creation of a keyword candidate is started, and it is then determined whether the part of speech of the word can also occupy the tail position.

If it is determined that the part of speech of the word can occupy the tail position, creation of one keyword candidate is completed, and creation of the next keyword candidate is then started as a continuation from the tail of this candidate.

When creation of the next keyword candidate has been started, or when the head check above determines that the part of speech cannot occupy the head position, or when the tail check above determines that the part of speech cannot occupy the tail position, processing returns to the end-of-file check and repeats from there.

If, on the other hand, the check described above determines that a keyword candidate is being created, it is determined whether the part of speech of the extracted word can occupy the tail position.

If it is determined that the part of speech can occupy the tail position, the word is appended to the end of the keyword candidate being created, and the processing from the candidate completion step described above onward is repeated. If it is determined that the part of speech cannot occupy the tail position, it is determined whether the word can occupy a middle position.

If it is determined that the word can occupy a middle position, the word is appended to the end of the keyword candidate being created, and processing returns to the end-of-file check and repeats from there. If it is determined that the word cannot occupy a middle position, the keyword candidate being created is discarded and candidate creation is ended, after which processing returns to the end-of-file check and repeats from there.

When the end-of-file check determines that the end of the text file has been reached, the keyword candidate creation process ends.
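The candidate creation flow described above can be sketched in Python as a small state machine. This is only one reading of the description, under stated assumptions: the two dictionaries are hypothetical stand-ins for the stored word information (not the contents of Table 1 or Table 2), punctuation marks are treated as separate tokens with no permitted position, and the function name `create_candidates` is invented for this sketch.

```python
import re

# Hypothetical word -> part-of-speech entries (not the contents of Table 1).
PART_OF_SPEECH = {
    "research": "noun", "development": "noun", "projects": "noun",
    "semiconductor": "noun", "makers": "noun",
    "japanese": "adjective", "and": "conjunction",
}

# Hypothetical part-of-speech -> permitted positions (not the contents of Table 2).
ALLOWED_POSITIONS = {
    "noun": frozenset({"head", "middle", "tail"}),
    "adjective": frozenset({"head", "middle"}),
    "conjunction": frozenset({"middle"}),
}


def create_candidates(text):
    """Create keyword candidates following the head/middle/tail checks."""
    # Assumption: words are alphabetic runs; each punctuation mark is its own token.
    tokens = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)
    candidates = []
    current = []  # words of the candidate being created; empty means "not creating"
    for word in tokens:
        part_of_speech = PART_OF_SPEECH.get(word.lower())
        positions = ALLOWED_POSITIONS.get(part_of_speech, frozenset())
        if not current:
            if "head" not in positions:
                continue  # cannot start a candidate with this word
            current = [word]
            if "tail" in positions:
                # One candidate is complete; the next one continues from its tail.
                candidates.append(" ".join(current))
        elif "tail" in positions:
            current.append(word)
            candidates.append(" ".join(current))
        elif "middle" in positions:
            current.append(word)
        else:
            current = []  # discard the candidate being created
    return candidates


print(create_candidates(
    "recent research and development projects, "
    "undertaken by Japanese semiconductor makers, have"
))
```

Because each completed candidate is kept as the start of the next one, a single run of words yields every prefix that ends on a tail-capable word, which is how a short candidate and its longer extensions can both appear in the result.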

Consider the following passage:

recently elected as the President of Czechoslovakia, Vaclav Havel, internationally recognized as a leading figure in the world of arts and letters, was ...

Table 3 shows, for the case where the keyword candidate creation process described above is applied to this relatively long passage, each extracted word, its part of speech, the positions at which that part of speech can appear as a constituent of a keyword candidate, and the step number indicating how far processing proceeded for that word. The keyword candidates created as a result of this processing are "President", "President of Czechoslovakia", "Vaclav", "Vaclav Havel", "leading figure", "world", "world of arts", and "world of arts and letters".

Table 3

Similarly, consider the following passage:

recent research and development projects, undertaken by Japanese semiconductor makers, have ...

When the keyword candidate creation process described above is applied to this passage, the candidates created are "research", "research and development", "research and development projects", "Japanese semiconductor", and "Japanese semiconductor makers". The processing in this case is as shown in Table 4.

Table 4

Next, as shown in FIG. 3, the importance calculation means calculates the importance as a keyword of every keyword candidate created in this way, one candidate at a time.

That is, from the words that make up each keyword candidate, those that are nouns, adjectives, or verbs are each extracted as a single word, and the number of occurrences wc of each is calculated.

The importance S of each keyword candidate is then calculated from the occurrence counts wc obtained in this way. The formula used to calculate S differs according to the number of constituent words n of the keyword candidate.

That is, when the number of constituent words n of a keyword candidate is greater than or equal to a predetermined value (limn), the importance S of that candidate is calculated by the formula

$$S = \sum_{i} wc_i$$

and depends only on the occurrence counts wc of the constituent words.

When the number of constituent words n of a keyword candidate is smaller than the predetermined value (limn), let pc be the number of occurrences of the keyword candidate itself and wt the weight applied to pc. The importance S of the candidate is then calculated by the formula

$$S = \sum_{i} wc_i + wt \times pc$$

and depends on the number of occurrences pc of the keyword candidate itself in addition to the occurrence counts wc of the constituent words.
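The two cases of the importance calculation can be sketched in Python as follows. This is a minimal illustration under assumptions: the values of limn and wt are not given in the description, so the defaults below are arbitrary; n is taken to be the total number of words in the candidate; the occurrence counts are passed in as ready-made mappings; and the function and parameter names are invented for this sketch.

```python
from collections import Counter

# Only nouns, adjectives, and verbs contribute an occurrence count wc.
CONTENT_PARTS_OF_SPEECH = {"noun", "adjective", "verb"}


def importance(candidate, word_counts, candidate_counts, part_of_speech,
               limn=3, wt=2.0):
    """Compute S for one keyword candidate.

    word_counts: occurrence count wc of each word (lowercase keys).
    candidate_counts: occurrence count pc of each whole candidate.
    part_of_speech: word -> part of speech (lowercase keys).
    limn and wt are hypothetical defaults, not values from the description.
    """
    words = candidate.split()
    n = len(words)  # assumption: n counts every word in the candidate
    score = sum(
        word_counts[w.lower()]
        for w in words
        if part_of_speech.get(w.lower()) in CONTENT_PARTS_OF_SPEECH
    )
    if n < limn:
        # Short candidates also get credit for their own occurrences.
        score += wt * candidate_counts[candidate]
    return score


# Hypothetical counts for illustration only.
word_counts = Counter({"research": 3, "development": 2})
candidate_counts = Counter({"research": 1, "research and development": 2})
part_of_speech = {"research": "noun", "and": "conjunction", "development": "noun"}

print(importance("research", word_counts, candidate_counts, part_of_speech))
print(importance("research and development", word_counts, candidate_counts,
                 part_of_speech))
```

With these inputs, the one-word candidate falls under the second formula and the three-word candidate under the first, so raising limn is the only change needed to let longer candidates benefit from their own occurrence count.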

Finally, based on the importance S calculated for each keyword candidate in this way, the keyword extraction means extracts the keywords from among the keyword candidates.
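The description does not state the selection rule used by the keyword extraction means. As one possible reading, the sketch below keeps every candidate whose importance reaches a threshold and returns them in descending order of importance. The threshold rule, the function name `extract_keywords`, and the sample scores are all assumptions made for illustration.

```python
def extract_keywords(scores, threshold):
    """Return candidates whose importance is at least `threshold`.

    scores: mapping from keyword candidate to its importance S.
    The threshold rule is an assumption; the description only says that
    extraction is based on the calculated importance.
    """
    # Sort by descending importance, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, score in ranked if score >= threshold]


# Hypothetical importance values for illustration only.
scores = {
    "research": 5.0,
    "research and development": 9.0,
    "Japanese semiconductor": 2.0,
}
print(extract_keywords(scores, threshold=4.0))
```

A fixed number of top-ranked candidates would be an equally plausible rule; the threshold form is used here only because it keeps the sketch short.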

Because keywords are extracted in this way, the extraction result is not restricted to a predetermined set. In addition, keyword candidates are created from the natural expressions that are expected to be used at search time, and only those candidates of high importance as keywords are extracted as keywords.

As a result, appropriate keywords can easily be specified as search conditions when searching for documents, and at the same time both working efficiency during retrieval and the retrieval capability of the device can be improved.

Effects of the Invention

As described above, in the present invention the keyword candidate creation means creates keyword candidates based on the words extracted by the word extraction means and the contents of the word information storage means corresponding to those words, the importance calculation means calculates the importance of each keyword candidate as a keyword based on the number of occurrences of the candidate and of its constituent words, and the keyword extraction means extracts keywords from among the candidates based on this result. The extracted keywords are therefore not restricted to a predetermined set.

In addition, keyword candidates can be created from the natural expressions that are expected to be used at search time, and those of high importance as keywords can be extracted from among these candidates. As a result, appropriate keywords can easily be specified as search conditions when searching for documents, and at the same time both working efficiency during retrieval and the retrieval capability of the device can be improved.

[Brief explanation of the drawing]

The drawings show an embodiment of the present invention. FIG. 1 is a claim correspondence diagram, FIG. 2 is a flowchart showing the keyword candidate creation process, and FIG. 3 is a flowchart showing the importance calculation process. 1: word information storage means; 2: word extraction means; 3: keyword candidate creation means; 4: importance calculation means; 5: keyword extraction means.

Claims

[Claims]

A keyword extraction device comprising: word information storage means that stores in advance word information including a large number of predetermined words and their respective parts of speech; word extraction means that extracts words from English text information; keyword candidate creation means that creates keyword candidates based on the words extracted by the word extraction means and the contents of the word information storage means corresponding to those words; importance calculation means that calculates the importance of each keyword candidate as a keyword based on the number of occurrences of the keyword candidate created by the keyword candidate creation means and of its constituent words; and keyword extraction means that extracts keywords from among the keyword candidates based on the result calculated by the importance calculation means.