JPH10334102A

JPH10334102A - Key word extraction device and medium where control program is recorded

Info

Publication number: JPH10334102A
Application number: JP9163257A
Authority: JP
Inventors: Mitsuo Shimohata; 光夫下畑
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1997-06-04
Filing date: 1997-06-04
Publication date: 1998-12-18

Abstract

PROBLEM TO BE SOLVED: To acquire key words without requiring information that depends on the data base. SOLUTION: Information on the characters that can form a key word is determined in advance and stored in a character kind information part 3. A primary key word segmenting process part 2 extracts character strings as primary key words from the data base according to the information in the character kind information part 3 and stores them in a primary key word storage part 4. When a primary key word stored in the primary key word storage part 4 is a composite of other primary key words, an unnecessary word removing process part 5 removes it as an unnecessary word. Consequently, the composite word is not registered in a key word information storage part 6; only the key words that constitute it are registered.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a keyword extraction device that extracts keywords from a database.

[0002]

[Description of the Related Art] Configurations for extracting keywords from text data have long been studied; one example is described in Japanese Patent Application Laid-Open No. H8-30627. Such conventional keyword extraction devices cut words out of text data on the basis of character type. The above publication, for example, cuts strings out of a text sentence by character type and then splits the extracted words by finer character types to increase the variety of keywords. It also holds a dictionary of so-called basic words and deletes any extracted string identical to a basic word.

[0003]

[Problems to Be Solved by the Invention] However, because the conventional keyword extraction device cuts out keywords by splitting on character type, it extracts a very large number of words. To reduce the number of keywords it registers basic words and deletes any extracted word that is a basic word, but this approach requires the basic word information to be prepared with the contents of the database in mind. Moreover, it cannot delete words formed by joining several basic words, or words formed by attaching a prefix or suffix to a basic word.

[0004] For these reasons, a keyword extraction device that can acquire keywords without requiring database-dependent information has been desired.

[0005]

[Means for Solving the Problems] To solve the above problems, the present invention adopts the following configurations.

<Structure of Claim 1> A keyword extraction device comprising: a character type information unit holding predetermined character type information that indicates which characters can form a keyword; a primary keyword extraction processing unit that extracts character strings serving as primary keywords from a database on the basis of the character type information; a primary keyword storage unit that stores the primary keywords extracted by the primary keyword extraction processing unit; and an unnecessary word removal processing unit that, on the basis of the plurality of primary keywords extracted by the primary keyword extraction processing unit, removes unnecessary primary keywords from those stored in the primary keyword storage unit and outputs the remainder as keyword information.

[0006] <Explanation of Claim 1> The character type information held in the character type information unit indicates which characters can form a keyword and which cannot. For example, the non-keyword characters may be all hiragana plus some symbols, although other settings are possible. The primary keyword extraction processing unit extracts from the database the strings made up of keyword characters according to the character type information, and stores them in the primary keyword storage unit as primary keywords. The unnecessary word removal processing unit removes unnecessary primary keywords using only the primary keyword information held in the primary keyword storage unit, and outputs the remainder as the final keywords. One example of this removal is to discard a primary keyword when it is a compound of several words that are themselves stored as primary keywords; other removal methods may also be used.

[0007] Because this configuration uses no database-dependent information when extracting keywords, it can be applied to text in any field.

[0008] <Structure of Claim 2> The keyword extraction device of claim 1, wherein the primary keyword extraction processing unit creates the primary keywords while excluding character strings that consist of a single keyword character.

[0009] <Explanation of Claim 2> Even when a character is a keyword character, a single kanji, for example, is almost always followed by hiragana to form an adjective or verb, and such words are seldom used as search keywords. Characters of types other than kanji rarely appear as one-character strings. For these reasons, the primary keyword extraction process creates the primary keywords while excluding strings that consist of a single keyword character, which removes words that are of little use for searching.

[0010] <Structure of Claim 3> The keyword extraction device of claim 1 or 2, wherein the unnecessary word removal processing unit removes, as an unnecessary word, any primary keyword stored in the primary keyword storage unit that is equal to a concatenation of a plurality of the primary keywords created by the primary keyword extraction processing unit.

[0011] <Explanation of Claim 3> In the invention of claim 3, a primary keyword is removed as an unnecessary word when it is a compound of several words that are stored as primary keywords. Words formed by concatenating basic words can thus be removed using only the primary keyword information, so that redundant keywords are not output.

[0012] <Structure of Claim 4> A medium on which a keyword extraction control program is recorded, the program causing a computer to perform a process of extracting keywords composed of predetermined characters from a database and, after that extraction, a process of determining unnecessary keywords on the basis of the plurality of extracted keywords to obtain the final keywords.

[0013] <Explanation of Claim 4> The invention of claim 4 relates to a medium on which a control program that implements the keyword extraction device of claim 1 on a computer is recorded.

[0014] <Structure of Claim 5> The keyword extraction device of claim 3, comprising: a prefix/suffix information unit that stores information on predetermined prefixes and suffixes; and an unnecessary word removal processing unit that performs the unnecessary word removal process on the character string obtained by stripping such a prefix or suffix from a primary keyword stored in the primary keyword storage unit.

[0015] <Explanation of Claim 5> In addition to the invention of claim 3, the invention of claim 5 holds prefix and suffix information; when the word left after stripping a prefix or suffix from a primary keyword is a compound, that primary keyword is removed. Compounds that contain a prefix or suffix can therefore be removed even when the prefix or suffix itself does not appear among the primary keywords.
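The affix handling of claim 5 might be sketched as below. This is a minimal illustration under stated assumptions: the helper names, the example prefix 「新」 and suffix 「化」, and the sample words are hypothetical and do not come from the patent. The sketch follows the wording above literally, so a keyword is flagged only when the affix-stripped remainder is itself a compound of two or more primary keywords.

```python
def can_segment(s: str, lexicon: frozenset) -> bool:
    """True if s is a concatenation of one or more lexicon entries."""
    if not s:
        return True
    return any(
        s[:k] in lexicon and can_segment(s[k:], lexicon)
        for k in range(1, len(s) + 1)
    )


def is_compound(word: str, lexicon: frozenset) -> bool:
    """True if word is a concatenation of two or more lexicon entries."""
    return any(
        word[:k] in lexicon and can_segment(word[k:], lexicon)
        for k in range(1, len(word))
    )


def is_removable(word, lexicon, prefixes=(), suffixes=()):
    """True if word, or word with a known prefix and/or suffix stripped,
    is a compound of primary keywords."""
    candidates = {word}
    for p in prefixes:
        if word.startswith(p) and len(word) > len(p):
            candidates.add(word[len(p):])
    for c in list(candidates):
        for s in suffixes:
            if c.endswith(s) and len(c) > len(s):
                candidates.add(c[: -len(s)])
    return any(is_compound(c, lexicon) for c in candidates)


# Hypothetical data: neither affix occurs as a primary keyword on its own.
lexicon = frozenset({"開発", "コスト", "新開発コスト", "開発コスト化"})
flag_prefixed = is_removable("新開発コスト", lexicon, prefixes=("新",))
flag_suffixed = is_removable("開発コスト化", lexicon, suffixes=("化",))
flag_basic = is_removable("開発", lexicon, prefixes=("新",), suffixes=("化",))
```

Without the affix lists, neither 「新開発コスト」 nor 「開発コスト化」 would be recognized as a compound, because 「新」 and 「化」 are not primary keywords.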

[0016] <Structure of Claim 6> The keyword extraction device of claim 3, comprising: a partial character string information unit that stores information on predetermined basic word strings and on modifier strings, which modify a basic word string and are composed of a different character type; and an unnecessary word removal processing unit that removes, from the primary keywords stored in the primary keyword storage unit, those keywords whose modifier string exists among the primary keywords, and outputs the final keywords.

[0017] <Explanation of Claim 6> As for character types, the basic word string is, for example, alphanumeric or katakana and the modifier string is kanji; however, the types are not limited to these, as long as the modifier string's character type is one that can modify the basic word string.

[0018] With this configuration, a keyword can be removed even when not all of its constituent substrings exist among the primary keywords, so keywords that are more useful for searching can be output.

[0019] <Structure of Claim 7> The keyword extraction device of claim 6, wherein the unnecessary word removal processing unit does not remove, as an unnecessary word, a primary keyword whose leading string is a modifier string.

[0020] <Explanation of Claim 7> The invention of claim 7 is based on the observation that, when a modifier string comes first, the string as a whole is often a useful keyword; keywords with this ordering of character types are therefore not removed.

[0021] <Structure of Claim 8> The keyword extraction device of claim 6 or 7, wherein the character types of the basic word string are Latin letters, digits, and katakana, and the character type of the modifier string is kanji.

[0022] <Explanation of Claim 8> The invention of claim 8 specifies the character types of the basic word string and the modifier string. A keyword with this combination of character types can thus be removed even when not all of its constituent substrings exist among the primary keywords.

[0023] <Structure of Claim 9> The keyword extraction device of claim 3, comprising: an unnecessary word information unit that stores information on predetermined unnecessary words; a primary keyword extraction processing unit that, when extracting primary keywords, excludes any string that is an unnecessary word stored in the unnecessary word information unit; a compound word information unit that stores information on predetermined compound words; and an unnecessary word removal processing unit that does not apply the unnecessary word removal process to a primary keyword stored in the primary keyword storage unit when it is equal to one of those compound words.

[0024] <Explanation of Claim 9> The invention of claim 9 holds information on unnecessary words that are not to be extracted as primary keywords, and compound word information listing words that are not to be removed even though the unnecessary word removal process would judge them to be compounds. Unnecessary primary keywords can thus be removed at the primary keyword extraction stage, and words that are compounds but should be registered as keywords can be identified and retained.

[0025] <Structure of Claim 10> The keyword extraction device of claim 9, comprising: an exempt character type information unit that stores information on predetermined exempt character types, which are not to be treated as compounds; and an unnecessary word removal processing unit that, for the primary keywords stored in the primary keyword storage unit, does not treat as a compound portion any part of a string that is composed of a single exempt character type.

[0026] <Explanation of Claim 10> The invention of claim 10 holds information on character types such that a word is not removed as a compound, even if the unnecessary word removal process judges it to be one, when it consists of a single such character type. Examples of these character types are katakana and Latin letters. Thus a word such as 「キーワード」 ("keyword") can be output as a keyword even when 「キー」 ("key") and 「ワード」 ("word") exist among the primary keywords.
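The exemption of claim 10 can be illustrated roughly as follows. The sketch assumes the simpler reading given in the explanation, in which a word made up entirely of one exempt character type is kept even though it is a compound. The function names and Unicode ranges are choices made for this illustration, and the treatment of partial strings in the claim itself is not modeled.

```python
def char_type(ch: str) -> str:
    """Coarse character type; only the types needed for this sketch."""
    if "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc":  # katakana and long mark
        return "katakana"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if "\uff21" <= ch <= "\uff3a" or "\uff41" <= ch <= "\uff5a":  # full-width
        return "latin"
    return "other"


EXEMPT_TYPES = frozenset({"katakana", "latin"})


def is_single_exempt_type(word: str) -> bool:
    """True if every character of word has the same exempt character type."""
    types = {char_type(c) for c in word}
    return len(types) == 1 and types <= EXEMPT_TYPES


def can_segment(s: str, lexicon: frozenset) -> bool:
    """True if s is a concatenation of one or more lexicon entries."""
    if not s:
        return True
    return any(
        s[:k] in lexicon and can_segment(s[k:], lexicon)
        for k in range(1, len(s) + 1)
    )


def is_compound(word: str, lexicon: frozenset) -> bool:
    """True if word is a concatenation of two or more lexicon entries."""
    return any(
        word[:k] in lexicon and can_segment(word[k:], lexicon)
        for k in range(1, len(word))
    )


def remove_compounds_keeping_exempt(primary_keywords):
    """Drop compounds unless they consist of a single exempt character type."""
    lexicon = frozenset(primary_keywords)
    return [
        w for w in primary_keywords
        if not is_compound(w, lexicon) or is_single_exempt_type(w)
    ]


sample = ["キー", "ワード", "キーワード", "開発", "コスト", "開発コスト"]
kept = remove_compounds_keeping_exempt(sample)
```

Here 「キーワード」 survives because it is all katakana, while the mixed kanji and katakana compound 「開発コスト」 is still removed.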

[0027] <Structure of Claim 11> The keyword extraction device of claim 9 or 10, wherein the unnecessary word removal processing unit, when performing the unnecessary word removal process on the primary keywords stored in the primary keyword storage unit, uses the unnecessary words stored in the unnecessary word information unit together with the primary keywords stored in the primary keyword storage unit.

[0028] <Explanation of Claim 11> In the invention of claim 11, the unnecessary word removal process uses the unnecessary word information in the same way as the primary keywords. Compounds built from words that do not occur in the database can thus be detected and removed.

[0029] <Structure of Claim 12> The keyword extraction device of claim 3, comprising: a statistical information unit that indicates the number of occurrences of each primary keyword in the database; and an unnecessary word removal processing unit that does not remove, as an unnecessary word, a primary keyword stored in the primary keyword storage unit and judged to be a compound when its number of occurrences recorded in the statistical information unit is higher than a predetermined threshold.

[0030] <Explanation of Claim 12> In the invention of claim 12, a word judged to be a compound in the unnecessary word removal process is nevertheless output as a keyword if its number of occurrences in the database is at or above a certain value. There is then no need to hold, as information, every compound that is worth keeping as a keyword; such compounds are identified and retained automatically.

[0031] <Structure of Claim 13> The keyword extraction device of claim 3, comprising: a statistical information unit that indicates, for each primary keyword, the number of data items stored in the database that contain that primary keyword; and an unnecessary word removal processing unit that does not remove, as an unnecessary word, a primary keyword stored in the primary keyword storage unit and judged to be a compound when its number of data items recorded in the statistical information unit is higher than a predetermined threshold.

[0032] <Explanation of Claim 13> In the invention of claim 13, a word judged to be a compound in the unnecessary word removal process is output as a keyword if the number of data items in the database that contain it is at or above a certain value. That is, this invention uses the number of data items in which the word appears, regardless of how many times the word appears within any one data item. With this configuration, unbiased keyword extraction is possible even for a database in which a particular word appears many times in particular data items.
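The difference between the statistic of claim 12 (total occurrences) and that of claim 13 (number of data items containing the word) can be shown with a small sketch. The input layout (one list of extracted primary keywords per data item), the function names, the sample data, and the threshold value are all assumptions made for illustration. The comparison is strict, following the "higher than" wording of the claims.

```python
from collections import Counter


def occurrence_counts(keywords_per_item):
    """Claim 12 style: total number of occurrences across all data items."""
    return Counter(kw for item in keywords_per_item for kw in item)


def item_counts(keywords_per_item):
    """Claim 13 style: number of data items in which each keyword appears."""
    return Counter(kw for item in keywords_per_item for kw in set(item))


def keep_despite_compound(word, counts, threshold):
    """True if a compound should be kept because its count exceeds threshold."""
    return counts[word] > threshold


# Hypothetical extraction result: the first item repeats one word many times.
items = [
    ["携帯電話", "携帯電話", "携帯電話", "電話"],
    ["携帯"],
    ["電話", "携帯電話"],
]
by_occurrence = occurrence_counts(items)
by_item = item_counts(items)
kept_by_occurrence = keep_despite_compound("携帯電話", by_occurrence, 3)
kept_by_item = keep_despite_compound("携帯電話", by_item, 3)
```

With the same threshold, the occurrence count is inflated by the repetitions in the first item, whereas the per-item count is not, which is the bias claim 13 is meant to avoid.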

[0033] <Structure of Claim 14> The keyword extraction device of claim 12 or 13, wherein "higher than the threshold" means that the value for the word judged to be a compound is higher than the values for the basic words that constitute it.

[0034] <Explanation of Claim 14> The invention of claim 14 defines "higher than the threshold" as, for example, the case where the compound 「携帯電話」 ("mobile phone") occurs more often than its basic words 「携帯」 ("mobile") and 「電話」 ("telephone"). Words that are useful as keywords can thus be output.
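One possible reading of the relative threshold in claim 14 is sketched below. The function name and the counts are hypothetical, and the sketch assumes that the compound must outnumber every one of its constituent basic words individually; the text does not say whether the comparison is per constituent or against some aggregate.

```python
def outranks_constituents(compound, parts, counts):
    """True if the compound's count is higher than that of each constituent."""
    return all(counts.get(compound, 0) > counts.get(p, 0) for p in parts)


# Hypothetical counts (occurrences or data item counts, as in claims 12 and 13).
frequent = {"携帯電話": 12, "携帯": 3, "電話": 7}
infrequent = {"携帯電話": 5, "携帯": 3, "電話": 7}

keep_frequent = outranks_constituents("携帯電話", ["携帯", "電話"], frequent)
keep_infrequent = outranks_constituents("携帯電話", ["携帯", "電話"], infrequent)
```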

[0035]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

<< Specific Example 1 >> In Specific Example 1, primary keywords are first created from the data in the database, and compound words are then deleted from the primary keywords obtained.

[0036] <Structure> FIG. 1 is a block diagram of Specific Example 1 of the keyword extraction device of the present invention. The illustrated device is built on, for example, a microcomputer, and consists of a database 1, a primary keyword extraction processing unit 2, a character type information unit 3, a primary keyword storage unit 4, an unnecessary word removal processing unit 5, and a keyword information storage unit 6.

[0037] The database 1 is a collection of data items stored on a hard disk or the like, and each data item contains a text part.

[0038] The primary keyword extraction processing unit 2 reads the individual text parts from the database 1 and, using the character type information in the character type information unit 3, extracts character strings that are likely to be keywords (primary keywords). The character type information unit 3 stores information on the types of characters to be cut out of the text.

[0039] FIG. 2 illustrates the contents of the character type information unit 3, which indicates which characters can form a keyword and which cannot. Characters that can form a keyword are called keyword characters, and all others are called non-keyword characters. Every character belongs to exactly one of the two groups, and, as illustrated, the character type information unit 3 lists all of the non-keyword characters. Specifically, the non-keyword characters are defined as all hiragana and some symbols (「、」, 「。」, 「・」, and so on).
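The two-way split between keyword and non-keyword characters might be expressed in Python as follows. This is only an illustration: the function names, the symbol set (limited to the three symbols the text names), and the decision to treat whitespace as non-keyword are assumptions, not details taken from FIG. 2.

```python
# Symbols treated as non-keyword characters. Only the three symbols named in
# the text are listed; the full set shown in FIG. 2 is not reproduced here.
NON_KEYWORD_SYMBOLS = frozenset("、。・")


def is_hiragana(ch: str) -> bool:
    """True if ch lies in the Unicode hiragana range U+3041..U+309F."""
    return "\u3041" <= ch <= "\u309f"


def is_keyword_char(ch: str) -> bool:
    """True if ch may form part of a keyword."""
    # Excluding whitespace is an assumption of this sketch.
    return not (is_hiragana(ch) or ch in NON_KEYWORD_SYMBOLS or ch.isspace())
```

Listing only the non-keyword characters, as the text describes, keeps the table small: every character not listed (kanji, katakana, Latin letters, digits) is a keyword character by default.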

[0040] The primary keyword storage unit 4 resides on a hard disk or in memory and stores the primary keywords cut out by the primary keyword extraction processing unit 2. The unnecessary word removal processing unit 5, on receiving a signal that the primary keyword extraction processing unit 2 has finished the primary keyword extraction process, deletes words judged unnecessary from the primary keywords stored in the primary keyword storage unit 4 and generates the final keywords. The keyword information storage unit 6 resides on a hard disk or in memory and stores the keywords output by the unnecessary word removal processing unit 5.

[0041] <Operation> First, assume that the target database 1 is organized as shown in FIG. 3, which illustrates the contents of the database 1. The data number is the number assigned to each data item, and the text part column shows the text part of each data item.

[0042] The primary keyword extraction processing unit 2 reads the database 1 and performs the primary keyword extraction process on the text part of each data item. This process extracts, on the basis of the character type information in the character type information unit 3, the character strings in the text that are composed of keyword characters; that is, strings composed of characters other than the non-keyword characters listed in the character type information unit 3. Strings of only one character are discarded even if that character is a keyword character. This is because a single kanji is almost always followed by hiragana to form an adjective or verb, and such words are seldom used as search keywords; characters of types other than kanji rarely appear as one-character strings. This step removes words that are of little use for searching. In the example of FIG. 3, 「調」, 「表」, and 「違」 in data number 3 consist of keyword characters, but because each is a single character they are not output to the primary keyword storage unit 4.
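A minimal sketch of this extraction step is given below. It assumes the non-keyword set consists of hiragana, the symbols 「、」「。」「・」, and whitespace (the last being an assumption of this sketch), and the sample sentence is invented for illustration rather than taken from FIG. 3.

```python
import re

# A run of one or more characters that are not hiragana (U+3041..U+309F),
# not one of the listed symbols, and not whitespace.
_KEYWORD_RUN = re.compile(r"[^\u3041-\u309f、。・\s]+")


def extract_primary_keywords(text: str, min_len: int = 2) -> list[str]:
    """Return runs of keyword characters, dropping runs shorter than min_len."""
    return [run for run in _KEYWORD_RUN.findall(text) if len(run) >= min_len]


# Invented example sentence: "(We) examined trends in development cost."
primary = extract_primary_keywords("開発コストの動向を調べた。")
```

The run 「調」 is cut out by the pattern but then discarded by the length filter, which mirrors the handling of one-character strings described above.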

[0043] FIG. 4 shows the primary keywords extracted by the primary keyword extraction processing unit 2. As illustrated, for each data number, the strings that remain after excluding the non-keyword characters listed in the character type information unit 3 and the one-character strings are extracted as primary keywords.

[0044] The unnecessary word removal processing unit 5 removes unnecessary words from the primary keywords obtained in this way. The process removes any primary keyword that is equal to a string formed by concatenating a plurality of primary keywords. A string formed by concatenating a plurality of primary keywords is called a compound word; that is, a compound word is "a word formed by concatenating a plurality of meaningful, independent, minimal-unit words (= basic words)".

[0045] FIG. 5 is a flowchart of the unnecessary word removal process. In the figure, check(w[1, n]) is the process that determines whether the n-character primary keyword w[1, n] under examination is a string to be removed (a compound word to be removed). Here w[x, y] denotes the substring of the target string from its x-th character to its y-th character. The description below takes as the target string 「開発コスト」 ("development cost"), which is stored in the primary keyword storage unit 4. In this case the call is check(w[1, 5]).

[0046] In this unnecessary word removal process, i is first set to 1 (step S1); that is, processing starts from the first character. Because a target string has at least two characters, i is incremented by 1 so that the string up to the second character is considered (step S2), and step S3 then determines whether i = n, that is, whether the target string has only two characters. If it has only two characters, the result is false and the word is kept as it is. This is because, as noted above, single characters are excluded as keywords, so no compound of two one-character words can exist.

[0047] If, on the other hand, i = n does not hold in step S3, that is, if the target string has three or more characters as 「開発コスト」 does, the process proceeds to step S4 and determines whether w[1, i] exists among the primary keywords. Here 「開発」 ("development") is tested, and because the primary keyword 「開発」 exists in data number 3, the process proceeds to step S5 and determines whether i = n. Since i = n does not hold in step S5, the process proceeds to step S6. If w[1, i] does not exist among the primary keywords in step S4, the process returns to step S2.

[0048] Step S6 performs check(w[i+1, n]). In this example it determines whether 「コスト」 ("cost"), the third to fifth characters, exists among the primary keywords. As in the processing described above, this determines whether the three-character string exists among the primary keywords. Because 「コスト」 exists in data number 2, the process proceeds from step S4 to step S5, where i = n holds; the result is therefore true, and the word 「開発コスト」 is deleted from the primary keywords.

【００４９】このような処理を全ての一次キーワードに
施すことで、合成語の除去が行われる。例えば、上記の
例として挙げたように、データ番号１の一次キーワード
「開発コスト」は、データ番号３の「開発」とデータ番
号２の「コスト」の合成語である。また、データ番号４
の「売上動向」は、データ番号２の「売上」とデータ番
号１の「動向」の合成語である。従って、これら二つの
語はキーワードとして除去される。図６は、最終的に得
られたキーワードの説明図である。Applying this processing to all primary keywords removes the compound words. As in the example above, the primary keyword "開発コスト" (development cost) of data number 1 is a compound of "開発" (development) of data number 3 and "コスト" (cost) of data number 2. Likewise, "売上動向" (sales trend) of data number 4 is a compound of "売上" (sales) of data number 2 and "動向" (trend) of data number 1. These two words are therefore removed from the keywords. FIG. 6 is an explanatory diagram of the keywords finally obtained.
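The compound word check of specific example 1 can be sketched in Python as follows. This is a minimal illustration, not the flowchart of FIG. 5 itself: the function names `can_cover`, `is_compound`, and `remove_compounds` are hypothetical, the sample keyword set is assembled from the words quoted in the text rather than taken from FIG. 4, and the sketch assumes that every piece of a compound has at least two characters (primary keywords are strings of two or more characters) and that the search tries a longer first piece when a shorter one does not lead to a full split.

```python
def can_cover(s, keywords):
    """True if s is a primary keyword or splits entirely into primary keywords."""
    if s in keywords:
        return True
    # Pieces shorter than 2 characters are never primary keywords,
    # so only prefix lengths 2 .. len(s) - 2 are tried.
    return any(
        s[:i] in keywords and can_cover(s[i:], keywords)
        for i in range(2, len(s) - 1)
    )


def is_compound(word, keywords):
    """Counterpart of check(w[1, n]): word is made of two or more primary keywords."""
    return any(
        word[:i] in keywords and can_cover(word[i:], keywords)
        for i in range(2, len(word) - 1)
    )


def remove_compounds(keywords):
    """Apply the check to every primary keyword and keep the non-compounds."""
    keywords = set(keywords)
    return {w for w in keywords if not is_compound(w, keywords)}


# Illustrative data only (assumed, not a copy of FIG. 4).
primary = {"開発コスト", "開発", "コスト", "売上動向", "売上", "動向", "メーカー"}
final = remove_compounds(primary)
```

With this sample set, "開発コスト" and "売上動向" are dropped because each splits into two other primary keywords, while two-character words can never be split and are always kept.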

【００５０】〈効果〉以上のように、具体例１では、テ
キスト部を含むデータからキーワードを自動的に付与す
ることが可能となる。しかも、基本語が連結してできる
語を除去することができるという効果を有している。更
に、このようなキーワードの抽出を行う場合に、データ
ベースに依存する情報を利用しないため、あらゆる分野
のテキストに対して本具体例を適用することが可能であ
る。<Effect> As described above, in the first embodiment, it is possible to automatically assign a keyword from data including a text portion. In addition, there is an effect that words formed by connecting basic words can be removed. Further, in extracting such a keyword, since the information depending on the database is not used, the present specific example can be applied to texts in all fields.

【００５１】《具体例２》具体例２は、具体例１の構成
に加えて、接頭語・接尾語情報を備え、この接頭語・接
尾語情報に基づき不要語除去処理を行うようにしたもの
である。<< Specific Example 2 >> In addition to the configuration of specific example 1, specific example 2 is provided with prefix/suffix information, and unnecessary word removal processing is performed based on this prefix/suffix information.

【００５２】〈構成〉図７は、本発明のキーワード抽出
装置の具体例２の構成図である。図の装置は、データベ
ース１、一次キーワード切出処理部２、字種情報部３、
一次キーワード格納部４、不要語除去処理部５ａ、キー
ワード情報格納部６、接頭語・接尾語情報部７からな
る。ここで、データベース１〜キーワード情報格納部６
のうち、不要語除去処理部５ａを除く構成は、具体例１
と同様であるため、これらの説明は省略する。<Structure> FIG. 7 is a diagram showing the structure of a second embodiment of the keyword extracting apparatus according to the present invention. The apparatus shown in the figure includes a database 1, a primary keyword extraction processing unit 2, a character type information unit 3,
a primary keyword storage unit 4, an unnecessary word removal processing unit 5a, a keyword information storage unit 6, and a prefix/suffix information unit 7. Of the components from database 1 through keyword information storage unit 6, all except the unnecessary word removal processing unit 5a are the same as in specific example 1, so their description is omitted.

【００５３】接頭語・接尾語情報部７は、予め決められ
た接頭語と接尾語の情報格納部であり、例えば次のよう
に構成されている。図８は、接頭語・接尾語情報部７の
内容説明図である。The prefix / suffix information section 7 is an information storage section for a predetermined prefix and suffix, and is configured as follows, for example. FIG. 8 is an explanatory diagram of the contents of the prefix / suffix information section 7.

【００５４】また、不要語除去処理部５ａは、具体例１
の機能に加えて、一次キーワード格納部４に格納された
一次キーワードのうち、接頭語・接尾語情報部７に格納
された接頭語または接尾語のうち少なくとも一方の接辞
を除いた文字列に対して不要語除去処理を行う機能を有
している。即ち、一次キーワードのうち、接頭語または
接尾語を除いた部分文字列が既に一次キーワード格納部
４に一次キーワードとして存在する一次キーワードを除
去する機能を有している。In addition to the functions of specific example 1, the unnecessary word removal processing section 5a has a function of performing unnecessary word removal on the character string obtained by stripping, from a primary keyword stored in the primary keyword storage unit 4, at least one affix (a prefix or a suffix) stored in the prefix/suffix information unit 7. That is, it removes any primary keyword whose remaining substring, after the prefix or suffix is stripped, already exists as a primary keyword in the primary keyword storage unit 4.

【００５５】〈動作〉ここでは、具体例１と共通する動
作の説明は省略し、具体例２の特徴的な動作のみを説明
する。<Operation> The description of the operation common to the first embodiment will be omitted, and only the characteristic operation of the second embodiment will be described.

【００５６】不要語除去処理部５ａによる不要語の除去
処理は、以下の手順で行う。１．対象となる一次キーワードの先頭の文字列が接頭語
と一致した場合は、その接頭語部分を削除する。２．対象となる一次キーワードの末尾の文字列が接尾語
と一致した場合は、その接尾語部分を削除する。３．接頭語、接尾語を削除して残った文字列が一次キー
ワードとして存在するかを判定する。存在すれば対象と
なった一次キーワードを削除する。The unnecessary word removal processing by the unnecessary word removal processing section 5a is performed in the following procedure.
1. If the leading character string of the target primary keyword matches a prefix, that prefix portion is deleted.
2. If the trailing character string of the target primary keyword matches a suffix, that suffix portion is deleted.
3. It is determined whether the character string remaining after the prefix and suffix are deleted exists as a primary keyword. If it does, the target primary keyword is deleted.
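The three steps above can be sketched as follows. The names `strip_affixes` and `remove_affixed` are hypothetical, and the affix lists hold only the two affixes quoted in the text ("各" and "別"), since the full contents of FIG. 8 are not reproduced here. The sketch also assumes that at most one prefix and one suffix are stripped per word and that an affix is never stripped if doing so would leave an empty string.

```python
def strip_affixes(word, prefixes, suffixes):
    """Steps 1 and 2: drop one matching prefix and one matching suffix, if any."""
    for p in prefixes:
        if word.startswith(p) and len(word) > len(p):
            word = word[len(p):]
            break
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s):
            word = word[:-len(s)]
            break
    return word


def remove_affixed(keywords, prefixes, suffixes):
    """Step 3: drop a keyword when its stripped form is itself a primary keyword."""
    keywords = set(keywords)
    kept = set()
    for w in keywords:
        stem = strip_affixes(w, prefixes, suffixes)
        if stem != w and stem in keywords:
            continue  # an affixed variant of an existing keyword: remove it
        kept.add(w)
    return kept


# Illustrative data only (assumed, not a copy of FIG. 4 or FIG. 8).
primary = {"各メーカー", "メーカー別", "メーカー", "売上"}
final = remove_affixed(primary, prefixes=["各"], suffixes=["別"])
```

Note that the affixes themselves do not need to appear among the primary keywords; only the stripped remainder is looked up.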

【００５７】上記の処理を全ての一次キーワードに対し
て行うことで、不要語の除去処理を行うことができる。
例えば、具体例１の例を用いて説明すると、先ず、対象
とするデータベース１が図３に示すように与えられてい
るとすると、一次キーワードの切出処理によって生成さ
れる一次キーワードは図４のようになる。ここで、デー
タ番号２の「各メーカー」は接頭語・接尾語情報部７に
定義されている接頭語「各」を持ち、この接頭語「各」
を除いた文字列「メーカー」は一次キーワードに存在し
ている。従って、除去条件に適合するので、「各メーカ
ー」は、一次キーワードから除去する。また、データ番
号３の「メーカー別」も接尾語「別」を持ち、かつ接尾
語「別」を除いた文字列「メーカー」が、一次キーワー
ドに存在する。従って、「メーカー別」も除去される。
同様にして、全一次キーワードについてこの処理を施
し、最終的なキーワードを獲得する。By performing the above processing for all the primary keywords, it is possible to remove unnecessary words.
For example, using the example of specific example 1: assuming the target database 1 is given as shown in FIG. 3, the primary keywords generated by the primary keyword extraction processing are as shown in FIG. 4. Here, "各メーカー" (each manufacturer) in data number 2 has the prefix "各" (each) defined in the prefix/suffix information section 7, and the character string "メーカー" (manufacturer) that remains after removing this prefix exists among the primary keywords. Since the removal condition is satisfied, "各メーカー" is removed from the primary keywords. Likewise, "メーカー別" (by manufacturer) in data number 3 has the suffix "別" (by), and the character string "メーカー" that remains after removing this suffix exists among the primary keywords. Therefore, "メーカー別" is also removed.
In the same way, this processing is applied to all primary keywords to obtain the final keywords.

【００５８】図９に、このようにして抽出されたキーワ
ードを示す。上記の接頭語・接尾語に基づく不要語除去
処理を行うため、例えば、図６で示した具体例１のキー
ワードに比べて、データ番号２の「各メーカー」とデー
タ番号３の「メーカー別」が除去されている。FIG. 9 shows the keywords extracted in this way. Because unnecessary word removal based on prefixes and suffixes is performed, "各メーカー" (each manufacturer) of data number 2 and "メーカー別" (by manufacturer) of data number 3 have been removed, compared with the keywords of specific example 1 shown in FIG. 6.

【００５９】〈効果〉以上のように、具体例２によれ
ば、具体例１に加えて次のような効果がある。即ち、具
体例１では、あるキーワードを除去するためには、構成
する部分文字列が全て一次キーワード中に存在しなけれ
ばならなかった。これに対し、具体例２では、基本語に
接頭語や接尾語が付属して一次キーワードとして切り出
された場合に、接頭語や接尾語が一次キーワード中に存
在しなくても除去できるという効果がある。<Effects> As described above, specific example 2 provides the following effect in addition to those of specific example 1. In specific example 1, removing a keyword required that all of its constituent substrings exist among the primary keywords. In specific example 2, by contrast, when a basic word is cut out as a primary keyword with a prefix or suffix attached, it can be removed even if that prefix or suffix does not itself exist among the primary keywords.

【００６０】《具体例３》具体例３は、具体例１の構成
に加えて、予め決められた基本語文字列の字種と、この
基本語文字列を修飾し、かつ、異なる字種で構成された
修飾文字列の情報を格納する部分文字列情報を備え、こ
の部分文字列情報に基づいて不要語除去処理を行うよう
にしたものである。尚、基本語文字列とは、キーワード
として有効な文字列を指している。<< Specific Example 3 >> In addition to the configuration of specific example 1, specific example 3 is provided with partial character string information that stores the character type of a predetermined basic-word character string and information on a modifier character string that modifies the basic-word character string and is composed of a different character type. Unnecessary word removal is performed based on this partial character string information. Here, a basic-word character string means a character string that is valid as a keyword.

【００６１】〈構成〉図１０は、本発明のキーワード抽
出装置の具体例３の構成図である。図の装置は、データ
ベース１、一次キーワード切出処理部２、字種情報部
３、一次キーワード格納部４、不要語除去処理部５ｂ、
キーワード情報格納部６、部分文字列情報部８からな
る。ここで、データベース１〜キーワード情報格納部６
のうち、不要語除去処理部５ｂを除く構成は、具体例１
と同様であるため、これらの説明は省略する。<Structure> FIG. 10 is a diagram showing the structure of a third embodiment of the keyword extracting apparatus according to the present invention. The apparatus shown in the figure includes a database 1, a primary keyword extraction processing unit 2, a character type information unit 3, a primary keyword storage unit 4, an unnecessary word removal processing unit 5b,
a keyword information storage unit 6, and a partial character string information unit 8. Of the components from database 1 through keyword information storage unit 6, all except the unnecessary word removal processing unit 5b are the same as in specific example 1, so their description is omitted.

【００６２】本具体例では、キーワードとして有効な文
字列（＝基本語文字列）に修飾語的な文字列が連結して
できた文字列を除去することを目的としている。ここ
で、キーワードとして有効な文字列の文字種を基本語文
字種、修飾語的な文字列の文字種を修飾語文字種と呼
ぶ。このため、部分文字列情報部８には、英文字、数
字、カタカナを基本語文字種に、漢字を修飾語文字種と
したルールが記述されている。The purpose of this specific example is to remove a character string formed by connecting a character string effective as a keyword (= basic word character string) with a modifier character string. Here, the character type of a character string valid as a keyword is called a basic word character type, and the character type of a modifier-like character string is called a modifier character type. For this reason, the partial character string information section 8 describes rules in which English characters, numbers, and katakana are used as basic word character types, and kanji are used as modifier word character types.

【００６３】不要語除去処理部５ｂは、このような部分
文字列情報部８のルールに基づき、一次キーワード格納
部４の一次キーワードのうち、字種が「英文字ｏｒ数字
ｏｒカタカナ」＋「漢字」で構成されている文字列は、
「英文字ｏｒ数字ｏｒカタカナ」と「漢字」とに分割
し、「漢字」の部分が一次キーワードとして存在してい
た場合は、その一次キーワードを削除する。また、先頭
に修飾語文字列が存在する場合は、文字列全体で有用な
キーワードであることが多いことから、このような字種
の並びの場合は除去しない。Based on the rules in the partial character string information section 8, the unnecessary word removal processing unit 5b takes each primary keyword in the primary keyword storage unit 4 whose character types form the pattern "(English letters, digits, or katakana) + (kanji)" and divides it into the "(English letters, digits, or katakana)" part and the "kanji" part. If the "kanji" part exists as a primary keyword, the original primary keyword is deleted. When a modifier character string comes first, the whole string is often a useful keyword, so strings with that ordering of character types are not removed.

【００６４】〈動作〉図１１は、本具体例の不要語除去
処理のフローチャートである。また、図１２は、本具体
例における一次キーワードの説明図である。<Operation> FIG. 11 is a flowchart of the unnecessary word removing process of this embodiment. FIG. 12 is an explanatory diagram of a primary keyword in this specific example.

【００６５】不要語除去処理部５ｂは、先ず、部分文字
列情報部８の情報に基づき、一次キーワードを、基本語
文字列と修飾語文字列に分割する（ステップＳ１）。例
えば、図１２に示すデータ番号１の「コンピュータ産
業」や「新型メモリ」およびデータ番号２の「ＥＣ首
脳」は、字種が「カタカナ＋漢字」であるため分割対象
となり、「コンピュータ」と「産業」、「新型」と「メ
モリ」および「ＥＣ」と「首脳」に分割される。The unnecessary word removal processing section 5b first divides each primary keyword into basic-word character strings and modifier character strings based on the information in the partial character string information section 8 (step S1). For example, "コンピュータ産業" (computer industry) and "新型メモリ" (new-type memory) of data number 1 and "ＥＣ首脳" (EC leaders) of data number 2 shown in FIG. 12 are subject to division because their character types are "katakana + kanji", and are divided into "コンピュータ" and "産業", "新型" and "メモリ", and "ＥＣ" and "首脳".

【００６６】次に、ステップＳ２では、最初の部分文字
列が基本語文字列であるかを判定する。ここで、「コン
ピュータ」「産業」と「ＥＣ」「首脳」は最初の部分文
字列が基本語文字列であるため、次のステップＳ３に進
むが、「新型」「メモリ」は、最初の部分文字列が修飾
語文字列であるため、除去条件には適合せず、ステップ
Ｓ５に移行して「Ｒｅｔｕｒｎ」となる。Next, step S2 determines whether the first substring is a basic-word character string. For "コンピュータ" + "産業" and for "ＥＣ" + "首脳", the first substring is a basic-word character string, so the process proceeds to step S3. For "新型" + "メモリ", the first substring is a modifier character string, so the removal condition is not met and the process moves to step S5 and ends with "Return".

【００６７】ステップＳ３では、全ての修飾語文字列
が、一次キーワードとして存在するかを判定する。例え
ば、「コンピュータ」「産業」は、修飾語文字列「産
業」が一次キーワードに存在するため、ステップＳ４に
進み、「コンピュータ産業」を除去し、「Ｒｅｔｕｒ
ｎ」となる（ステップＳ）。また、「ＥＣ」「首脳」
も、修飾語文字列「首脳」が一次キーワードに存在する
ため、「ＥＣ首脳」も除去される。このような処理を全
一次キーワードについて施すことにより、最終的なキー
ワードを獲得する。図１３は、最終的に得られたキーワ
ードの説明図である。In step S3, it is determined whether every modifier character string exists as a primary keyword. For example, for "コンピュータ" (computer) + "産業" (industry), the modifier character string "産業" exists among the primary keywords, so the process proceeds to step S4, "コンピュータ産業" (computer industry) is removed, and the process ends with "Return" (step S). Likewise, for "ＥＣ" + "首脳" (leaders), the modifier character string "首脳" exists among the primary keywords, so "ＥＣ首脳" (EC leaders) is also removed. Applying this processing to all primary keywords yields the final keywords. FIG. 13 is an explanatory diagram of the keywords finally obtained.
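Steps S1 to S4 of specific example 3 can be sketched as follows. The names `role`, `split_runs`, and `is_removable` are hypothetical, and the Unicode ranges used to recognise katakana and kanji are an assumption of this sketch (the patent only names the character types). Full-width letters and digits are folded to ASCII with NFKC normalization before classification.

```python
import unicodedata
from itertools import groupby


def role(ch):
    """Classify one character as 'base', 'modifier', or 'other'."""
    c = unicodedata.normalize("NFKC", ch)[0]  # fold full-width letters and digits
    if c.isascii() and c.isalnum():
        return "base"      # English letters and digits
    if "\u30a0" <= c <= "\u30ff":
        return "base"      # katakana block, including the long-vowel mark
    if "\u4e00" <= c <= "\u9fff":
        return "modifier"  # kanji (CJK unified ideographs)
    return "other"


def split_runs(word):
    """Step S1: split a word into maximal runs of the same role."""
    return [(r, "".join(g)) for r, g in groupby(word, key=role)]


def is_removable(word, keywords):
    runs = split_runs(word)
    if not runs or any(r == "other" for r, _ in runs):
        return False
    modifiers = [s for r, s in runs if r == "modifier"]
    # Step S2: the first run must be a basic-word string, and there must be
    # at least one modifier run for the rule to apply at all.
    if runs[0][0] != "base" or not modifiers:
        return False
    # Step S3: every modifier run must itself be a primary keyword.
    return all(m in keywords for m in modifiers)


# Illustrative data only (assumed, not a copy of FIG. 12).
primary = {"コンピュータ産業", "新型メモリ", "ＥＣ首脳", "産業", "首脳", "新型"}
final = {w for w in primary if not is_removable(w, primary)}
```

"新型メモリ" survives because its first run is kanji, which matches the text's rule that strings starting with a modifier are kept whole.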

【００６８】〈効果〉以上のように具体例３によれば、
具体例１の効果に加えて次のような効果がある。即ち、
具体例１では、あるキーワードを除去するためには、構
成する部分文字列が全て一次キーワード中に存在しなけ
ればならなかったが、本具体例では、字種が異なる基本
語が結合して切り出された場合に、漢字の文字列が一次
キーワードとして存在すれば、このような語を除去する
ことができるという効果を備えている。<Effects> As described above, according to the third embodiment,
the following effect is obtained in addition to the effects of specific example 1.
In specific example 1, removing a keyword required that all of its constituent substrings exist among the primary keywords. In this specific example, when basic words of different character types are joined and cut out as one string, the string can be removed as long as its kanji substring exists as a primary keyword.

【００６９】しかも、具体例２でも、基本語に修飾語的
な語が結合した語の除去を行っているが、具体例２で
は、修飾語的な語を具体的に記述する必要があるのに対
し、本具体例ではその必要がなく、効率的な不要語除去
を行うことができる。Moreover, even in the specific example 2, the word in which the modifier word is combined with the basic word is removed, but in the specific example 2, the modifier word needs to be specifically described. On the other hand, in this specific example, this is not necessary, and efficient unnecessary word removal can be performed.

【００７０】《具体例４》具体例４は、一次キーワード
を切り出す場合に、一次キーワードとして除去するため
の不要語を示す不要語情報と、不要語除去処理で残す合
成語を示す合成語情報と、不要語除去処理で除去しない
文字列の字種の情報を示す対象外文字種情報と備えたも
のである。<< Example 4 >> Specific example 4 is provided with unnecessary word information indicating words to be excluded when primary keywords are cut out, compound word information indicating compound words to be retained by the unnecessary word removal processing, and non-target character type information indicating the character types of strings that the unnecessary word removal processing does not remove.

【００７１】〈構成〉図１４は、本発明のキーワード抽
出装置の具体例４の構成図である。図の装置は、データ
ベース１、一次キーワード切出処理部２、字種情報部
３、一次キーワード格納部４、不要語除去処理部５ｃ、
キーワード情報格納部６、不要語情報部９、合成語情報
部１０、対象外文字種情報部１１からなる。ここで、デ
ータベース１〜キーワード情報格納部６のうち、一次キ
ーワード切出処理部２ａと不要語除去処理部５ｃを除く
構成は、具体例１と同様であるため、これらの説明は省
略する。<Structure> FIG. 14 is a diagram showing the structure of a fourth embodiment of the keyword extracting apparatus according to the present invention. The apparatus shown in the figure includes a database 1, a primary keyword extraction processing unit 2, a character type information unit 3, a primary keyword storage unit 4, an unnecessary word removal processing unit 5c,
a keyword information storage section 6, an unnecessary word information section 9, a composite word information section 10, and a non-target character type information section 11. Of the components from database 1 through keyword information storage unit 6, all except the primary keyword extraction processing unit 2a and the unnecessary word removal processing unit 5c are the same as in specific example 1, so their description is omitted.

【００７２】不要語情報部９は、キーワード文字の１文
字以上から構成される文字列ではあるが、キーワードと
して不要な語を示す情報の格納部である。図１５は不要
語情報部９の内容説明図である。The unnecessary word information section 9 is a storage section for information indicating a word which is a character string composed of one or more keyword characters but is unnecessary as a keyword. FIG. 15 is an explanatory diagram of the contents of the unnecessary word information section 9.

【００７３】一次キーワード切出処理部２ａは、データ
ベース１より切り出した文字列が不要語情報部９にある
語と一致した場合は、その語を一次キーワード格納部４
に出力しないよう構成されている。また、不要語情報部
９に示された不要語は、不要語除去処理部５ｃにおい
て、具体例１で述べた合成語の除去処理を行う際、一次
キーワードと同等の語として扱う。When a character string cut out from the database 1 matches a word in the unnecessary word information section 9, the primary keyword extraction processing section 2a is configured not to output that word to the primary keyword storage section 4. In addition, when the unnecessary word removal processing section 5c performs the compound word removal described in specific example 1, the unnecessary words listed in the unnecessary word information section 9 are treated as equivalent to primary keywords.

【００７４】合成語情報部１０は、複数の一次キーワー
ドまたは不要語情報部９に示されている不要語が結合し
てできる可能性のある文字列の中で、独立したキーワー
ドとして抽出する文字列を格納している。The composite word information section 10 stores those character strings that, among the strings that could be formed by joining several primary keywords or unnecessary words listed in the unnecessary word information section 9, are to be extracted as independent keywords.

【００７５】図１６は、合成語情報部１０の内容説明図
である。図示のように、合成語情報部１０に記述されて
いる語は、「携帯電話」のように二つの語が組み合わさ
れて固有の意味を持つ語である。FIG. 16 is an explanatory diagram of the contents of the composite word information section 10. As shown in the figure, the words described in the composite word information section 10 are words having a unique meaning in which two words are combined, such as “mobile phone”.

【００７６】対象外文字種情報部１１には、不要語除去
処理部５ｃにおける不要語除去処理において、複数の一
次キーワードが連結されている場合でも、それが全て同
一文字種であれば、合成語とはしない文字種の情報が格
納されている。ここでは、その文字種として、英文字、
数字、カタカナが指定されている。The non-target character type information section 11 stores the character types for which, in the unnecessary word removal processing of the unnecessary word removal processing unit 5c, a string is not treated as a compound word even if it is a concatenation of several primary keywords, provided that all of them are of the same character type. Here, English letters, digits, and katakana are specified as those character types.

【００７７】不要語除去処理部５ｃは、合成語の除去処
理を行う際、上述したように、不要語情報部９に示され
た語を一次キーワードと同等の語として扱うが、合成語
情報部１０に示されている語および対象外文字種情報部
１１で指定されている文字種の語は、合成語としては対
象外として処理するよう構成されている。When performing compound word removal, the unnecessary word removal processing unit 5c treats the words listed in the unnecessary word information section 9 as equivalent to primary keywords, as described above, but is configured to exclude from compound word removal the words listed in the composite word information section 10 and words of the character types specified in the non-target character type information section 11.

【００７８】〈動作〉本具体例では、例えば、一次キー
ワード切出処理部２ａが切り出す対象テキストを「携帯
電話の急激な普及増加は電話のあり方を一変させた」と
する。<Operation> In this specific example, for example, it is assumed that the text to be extracted by the primary keyword extraction processing unit 2a is "a sudden increase in the use of mobile phones has completely changed the way of telephones".

【００７９】先ず、一次キーワード切出処理部２ａで
は、対象テキストを、具体例１〜３の場合と同様に、キ
ーワード文字で分割し、その中から２文字以上で構成さ
れる文字列を抽出する。その結果、（携帯電話、急激、
普及増加、電話、一変）という語が抽出される。このう
ち、「急激」は、不要語情報部９に示されているので
（図１５参照）、一次キーワードとしては除去される。
従って、一次キーワード格納部４に格納された一次キー
ワードは、（携帯電話、普及増加、電話、一変）とな
る。First, as in specific examples 1 to 3, the primary keyword extraction processing section 2a divides the target text on the basis of keyword characters and extracts the character strings composed of two or more characters. As a result, the words (携帯電話 [mobile phone], 急激 [rapid], 普及増加 [increase in spread], 電話 [telephone], 一変 [complete change]) are extracted. Of these, "急激" is listed in the unnecessary word information section 9 (see FIG. 15) and is therefore removed from the primary keywords.
Accordingly, the primary keywords stored in the primary keyword storage unit 4 are (携帯電話, 普及増加, 電話, 一変).

【００８０】不要語除去処理部５では、先ず、一次キー
ワードそれぞれについて、合成語情報部１０に記載され
ていないかをチェックする。もし、合成語情報部１０に
記載されていれば、その語をキーワード情報格納部６に
出力する。例えば、「携帯電話」は合成語情報部１０に
記載されているので、合成語判定をスキップし、キーワ
ード情報格納部６に出力される。一方、一次キーワード
で合成語情報部１０に記載されていないものについては
次のような不要語除去処理を行う。The unnecessary word removal processing section 5 first checks whether or not each primary keyword is described in the composite word information section 10. If the word is described in the composite word information section 10, the word is output to the keyword information storage section 6. For example, since “mobile phone” is described in the composite word information section 10, the composite word determination is skipped and output to the keyword information storage section 6. On the other hand, for primary keywords that are not described in the composite word information section 10, the following unnecessary word removal processing is performed.

【００８１】図１７は、不要語除去処理のフローチャー
トである。図中、ｃｈｅｃｋ（ｗ［１，ｎ］）とは、図
５に示した具体例１と同様に、対象となるｎ文字の一次
キーワードｗ［１，ｎ］が除去文字列（除去される合成
語）であるかを判定する処理である。FIG. 17 is a flowchart of the unnecessary word removal process. In the figure, check(w[1, n]) is, as in specific example 1 shown in FIG. 5, the process of determining whether the target n-character primary keyword w[1, n] is a removal string (a compound word to be removed).

【００８２】不要語除去処理では、具体例１の動作と同
様に、ステップＳ１でｉ＝１およびｃｔ＝０とし、ステ
ップＳ２において、ｉの値を＋１する。尚、ステップＳ
１におけるｃｔとは文字種を示し、ｃｔ＝０とは文字種
の情報をリセットすることを表している。次に、ステッ
プＳ３でｉ＝ｎであるかを判定する。即ち、対象文字列
が２文字のみであるかを判定する。このステップＳ３に
おいて、対象文字列が２文字のみであった場合は、ｆａ
ｌｓｅとなり、その語をそのまま残す。これは、具体例
１と同様に、１文字２個の合成語は存在しないことによ
るからである。In the unnecessary word removal process, as in the operation of specific example 1, i = 1 and ct = 0 are set in step S1, and the value of i is incremented by one in step S2. Here, ct in step S1 denotes a character type, and ct = 0 means that the character type information is reset. Next, step S3 determines whether i = n, that is, whether the target character string consists of only two characters. If it does, the result is false and the word is left as it is. As in specific example 1, this is because a compound word made of two single-character words does not exist.

【００８３】一方、ステップＳ３において、ｉ＝ｎでな
かった場合、例えば「普及増加」のように、対象文字列
が３文字以上であった場合は、ステップＳ４に進み、ｗ
［１，ｉ］が一次キーワードまたは不要語情報部９に存
在するかを判定する。この場合、「普及」が判定対象と
なり、この語は不要語情報部９に存在するため、次のス
テップＳ５に進む。On the other hand, if i = n does not hold in step S3, that is, if the target character string has three or more characters as in "普及増加" (increase in spread), the process proceeds to step S4, where it is determined whether w[1, i] exists among the primary keywords or in the unnecessary word information section 9. In this case "普及" (spread) is the substring under test, and since this word exists in the unnecessary word information section 9, the process proceeds to step S5.

【００８４】ステップＳ５では、対象となる語が対象外
文字種情報部１１に記載されている文字種であるかを判
定する。この場合は、対象外文字種ではないためステッ
プＳ８に進み、ｉ＝ｎかを判定する。ステップＳ８で
は、ｉ＝ｎではないためステップＳ９に進み、ｃｈｅｃ
ｋ（ｗ［ｉ＋１，ｎ］）を行う。「増加」も不要語情報
部９に記載されているため、結果はｔｒｕｅとなり、合
成語と判定される。従って、キーワード情報格納部６に
は出力されない。In step S5, it is determined whether the target word is of a character type listed in the non-target character type information section 11. In this case it is not, so the process proceeds to step S8, which tests whether i = n. Since i is not equal to n in step S8, the process proceeds to step S9, where check(w[i+1, n]) is performed. Since "増加" (increase) is also listed in the unnecessary word information section 9, the result is true and the word is judged to be a compound word. It is therefore not output to the keyword information storage unit 6.

【００８５】また、ステップＳ５〜ステップＳ７のは、
次のような処理を行うために設けられている。即ち、合
成語除去処理の際、特定の文字種の語は対象外とする
と、キーワード抽出の精度が向上することがある。例え
ば、英数字、カタカナ等から構成される語は偶然に合成
語であったり、合成語であっても重要な語であることが
多い。例えば、「ｉｎ」＋「ｐｕｔ」＝「ｉｎｐｕｔ」
や、「キー」＋「ワード」＝「キーワード」等である。
従って、対象外文字種情報部１１に、このような文字種
を指定することで、これらの文字種による合成語は、除
去対象の合成語としないことができる。Steps S5 to S7 are provided for the following purpose. In compound word removal, excluding words of certain character types can improve the accuracy of keyword extraction. For example, words composed of alphanumeric characters, katakana, and the like are often compound words only by coincidence, or are important words even when they are compounds, such as "in" + "put" = "input" or "キー" (key) + "ワード" (word) = "キーワード" (keyword).
Therefore, by specifying such character types in the non-target character type information section 11, compounds made up of these character types can be excluded from the compound words to be removed.

【００８６】例えば、一次キーワードに「キー」「ワー
ド」「キーワード」が存在していたとする。不要語除去
処理において、「キーワード」という語が対象になった
とすると、先ず、「キー」という部分文字列で基本語が
発見される（ステップＳ１〜ステップＳ４）。そして、
ステップＳ５において、「キーワード」は一種類の対象
外文字種であるから、ステップＳ６に進む。ステップＳ
６では、ｃｔとｗ［１，ｉ］の文字種が同じであるかを
判定するが、ｃｔはステップＳ１でリセットされている
ため文字種が異なると判定され、ステップＳ７に進み、
ｃｔにカタカナという情報を格納する。For example, suppose that "キー" (key), "ワード" (word), and "キーワード" (keyword) exist as primary keywords. When "キーワード" becomes the target of unnecessary word removal, a basic word is first found in the substring "キー" (steps S1 to S4). Then, in step S5, since "キーワード" consists of a single non-target character type, the process proceeds to step S6. Step S6 determines whether ct and the character type of w[1, i] are the same. Because ct was reset in step S1, the types are judged to differ, and the process proceeds to step S7, where the value katakana is stored in ct.

【００８７】残った「ワード」という文字列について
も、一次キーワードに存在するが、「ワード」はカタカ
ナであり、ｃｔと同じ文字種であると判定されるため、
ステップＳ６からステップＳ２に戻り、文字列はもう残
っていないため、最終的にステップＳ３からｆａｌｓｅ
となる。The remaining character string "ワード" also exists among the primary keywords, but since "ワード" is katakana it is judged to be of the same character type as ct. The process therefore returns from step S6 to step S2, and because no characters remain, the result is finally false at step S3.

【００８８】これ以外で、例えば「ＮＧシーン」「Ｎ
Ｇ」「シーン」という語が一次キーワードとして存在し
た場合を考える。この場合は、「ＮＧ」も「シーン」も
対象外文字種であるが、「ＮＧ」は英文字であり、「シ
ーン」はカタカナであるため、ステップＳ６において、
ｃｔとｗ［３，５］の字種が一致せず、ステップＳ８に
進む。ここで、文字列はもう残っていないため、最終的
にｔｒｕｅとなり、「ＮＧシーン」は合成語と判定さ
れ、キーワード情報格納部６には出力されない。As another case, suppose that the words "ＮＧシーン" (NG scene), "ＮＧ", and "シーン" (scene) exist as primary keywords. Both "ＮＧ" and "シーン" are of non-target character types, but "ＮＧ" consists of English letters while "シーン" is katakana, so in step S6 the character types of ct and w[3, 5] do not match and the process proceeds to step S8. Since no characters remain, the result is finally true, "ＮＧシーン" is judged to be a compound word, and it is not output to the keyword information storage unit 6.

【００８９】このような処理により、上記テキスト「携
帯電話の急激な普及増加は電話のあり方を一変させた」
の最終的なキーワードは、（携帯電話、電話、一変）と
なる。Through this processing, the final keywords for the text "携帯電話の急激な普及増加は電話のあり方を一変させた" (the rapid increase in the spread of mobile phones has completely changed what the telephone is) are (携帯電話 [mobile phone], 電話 [telephone], 一変 [complete change]).
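The overall behaviour of specific example 4 on this sample text can be sketched as follows. This is a simplified illustration rather than a transcription of FIG. 17: instead of tracking ct step by step, it treats any word made up entirely of one non-target character type as exempt, which gives the same outcome for the cases discussed in the text. All function names are hypothetical, the Unicode ranges are assumptions, and the unnecessary word list contains only the three words the text says are in FIG. 15.

```python
import unicodedata


def char_type(ch):
    c = unicodedata.normalize("NFKC", ch)[0]  # fold full-width letters and digits
    if c.isascii() and c.isalpha():
        return "latin"
    if c.isascii() and c.isdigit():
        return "digit"
    if "\u30a0" <= c <= "\u30ff":
        return "katakana"
    if "\u4e00" <= c <= "\u9fff":
        return "kanji"
    return "other"


def uniform_type(word):
    """Return the single character type of word, or None if it mixes types."""
    types = {char_type(ch) for ch in word}
    return types.pop() if len(types) == 1 else None


def splits_into(s, vocab, whole_ok=False):
    """True if s splits into vocab entries of 2+ characters (2+ pieces at top level)."""
    if whole_ok and s in vocab:
        return True
    return any(
        s[:i] in vocab and splits_into(s[i:], vocab, True)
        for i in range(2, len(s) - 1)
    )


def final_keywords(candidates, stopwords, keep, excluded_types):
    # Extraction stage: unnecessary words never become primary keywords.
    primary = [w for w in candidates if w not in stopwords]
    # For compound detection, unnecessary words count like primary keywords.
    vocab = set(primary) | set(stopwords)
    result = []
    for w in primary:
        if w in keep:                               # listed compound: keep
            result.append(w)
        elif uniform_type(w) in excluded_types:     # e.g. all-katakana: keep
            result.append(w)
        elif not splits_into(w, vocab):             # not a compound: keep
            result.append(w)
    return result


EXCLUDED = {"latin", "digit", "katakana"}
candidates = ["携帯電話", "急激", "普及増加", "電話", "一変"]
final = final_keywords(candidates, {"急激", "普及", "増加"}, {"携帯電話"}, EXCLUDED)
```

Running the same function on the "キーワード" and "ＮＧシーン" cases keeps the former (a single katakana type) and drops the latter (letters followed by katakana).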

【００９０】〈効果〉以上のように具体例４によれば、
具体例１の効果に加えて次のような効果がある。即ち、
不要語情報部９を備えることにより、一次キーワード切
出処理の段階で不要語を除去することが可能である。ま
た、不要語情報部９を利用して合成語の分解を行うこと
から、データベース１中に存在しない語による合成語を
発見し、除去することができる。更に、合成語情報部１
０により、合成語であるが、キーワードとして登録すべ
き語を発見し、保存することができるという効果を備え
ている。<Effects> As described above, according to the fourth embodiment,
the following effects are obtained in addition to the effects of specific example 1.
Providing the unnecessary word information section 9 makes it possible to remove unnecessary words at the primary keyword extraction stage. Moreover, because the unnecessary word information section 9 is also used when decomposing compound words, compound words built from words that do not themselves appear in the database 1 can be found and removed. Furthermore, the composite word information section 10 makes it possible to find and retain words that are compounds but should nevertheless be registered as keywords.

【００９１】《具体例５》具体例５は、一次キーワード
の統計値を求め、この統計値を利用して不要語除去処理
を行うようにしたものである。<< Example 5 >> In Example 5, a statistical value of a primary keyword is obtained, and unnecessary word removal processing is performed using the statistical value.

【００９２】〈構成〉図１８は、本発明のキーワード抽
出装置の具体例５の構成図である。図の装置は、データ
ベース１、一次キーワード切出処理部２ｂ、字種情報部
３、一次キーワード格納部４、不要語除去処理部５ｄ、
キーワード情報格納部６、統計情報部１２からなる。こ
こで、データベース１〜キーワード情報格納部６のう
ち、一次キーワード切出処理部２ｂと不要語除去処理部
５ｄを除く構成は、具体例１と同様であるため、これら
の説明は省略する。<Structure> FIG. 18 is a diagram showing the structure of a fifth embodiment of the keyword extracting apparatus according to the present invention. The apparatus shown in the figure includes a database 1, a primary keyword extraction processing unit 2b, a character type information unit 3, a primary keyword storage unit 4, an unnecessary word removal processing unit 5d,
a keyword information storage section 6, and a statistical information section 12. Of the components from database 1 through keyword information storage unit 6, all except the primary keyword extraction processing unit 2b and the unnecessary word removal processing unit 5d are the same as in specific example 1, so their description is omitted.

【００９３】一次キーワード切出処理部２ｂは、上記各
具体例と同様に、データベース１から一次キーワードを
抽出し、これを一次キーワード格納部４に格納すると共
に、統計情報部１２に出力するよう構成されている。ま
た、統計情報部１２は、一次キーワード切出処理部２ｂ
からの情報に基づき、データベース１中の一次キーワー
ドに関する統計情報（統計的指標）を記録する格納部で
ある。この統計情報としては、一次キーワードのデータ
ベース１中の出現回数とする。As in the specific examples above, the primary keyword extraction processing unit 2b is configured to extract primary keywords from the database 1, store them in the primary keyword storage unit 4, and also output them to the statistical information unit 12. The statistical information unit 12 is a storage unit that records statistical information (a statistical index) about the primary keywords in the database 1, based on the information from the primary keyword extraction processing unit 2b. Here the statistical information is the number of times each primary keyword occurs in the database 1.

【００９４】図１９は、統計情報の説明図である。この
統計情報は、一次キーワードを見出しとし、各見出しに
対応した整数変数を、データベース１中の出現回数とし
ている。FIG. 19 is an explanatory diagram of statistical information. In this statistical information, a primary keyword is used as a heading, and an integer variable corresponding to each heading is used as the number of appearances in the database 1.

【００９５】不要語除去処理部５ｄは、合成語の除去処
理を行う際、このような統計情報部１２に示された統計
情報を使用し、一次キーワード格納部４に格納された一
次キーワードのうち、合成語として判定された語が、予
め決められたしきい値よりも高い出現回数であった場合
は、不要語として除去しないよう構成されている。When performing compound word removal, the unnecessary word removal processing section 5d uses the statistical information held in the statistical information section 12: among the primary keywords stored in the primary keyword storage section 4, a word judged to be a compound word is not removed as an unnecessary word if its occurrence count is higher than a predetermined threshold.

【００９６】〈動作〉一次キーワード切出処理部２ｂで
は、データベース１から一次キーワードを抽出すると、
この一次キーワードを一次キーワード格納部４に格納す
ると共に、統計情報部１２に出力する。統計情報部１２
では、入力された語をインデックスとする整数変数を１
増加させる。入力された語をインデックスとする整数変
数が存在しなかった場合は、新たにその語をインデック
スとする整数変数を作成し、初期値として１を代入す
る。従って、一次キーワード抽出処理が終了した時点で
は、統計情報部１２には、一次キーワードと、各一次キ
ーワードのデータベース１中の出現回数が組になって格
納されている。<Operation> When the primary keyword extraction processing unit 2b extracts a primary keyword from the database 1, it stores the keyword in the primary keyword storage unit 4 and also outputs it to the statistical information unit 12. The statistical information unit 12 increments by one the integer variable indexed by the input word. If no integer variable indexed by the input word exists, a new one is created with that word as its index and initialized to 1. Thus, when primary keyword extraction has finished, the statistical information unit 12 holds each primary keyword paired with its occurrence count in the database 1.

【００９７】尚、統計情報部１２では、上記の出現回数
の代わりに出現データ数を用いてもよい。出現データ数
とは、データベース１中に複数のデータがあった場合
に、語が出現したデータの個数を指し、一つのデータ中
に何回その語が出現するかは問わない。特定のデータに
特定の語が多く出現するようなデータベースの場合、デ
ータ中の出現回数では不都合が生じることが多い。この
ような性質を持つデータベースに対しては出現データ数
を用いる方がよい。In the statistical information section 12, the number of appearance data may be used instead of the number of appearances. The number of appearing data refers to the number of data in which a word appears when there are a plurality of data in the database 1, and it does not matter how many times the word appears in one data. In the case of a database in which specific words frequently appear in specific data, inconvenience often occurs in the number of appearances in the data. For a database having such properties, it is better to use the number of occurrence data.
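The two statistics described here, the occurrence count and the number of data items in which a word appears, can be sketched with Python's `collections.Counter`. The function names and the sample records are hypothetical; each record stands for the list of primary keywords extracted from one data item in the database.

```python
from collections import Counter


def occurrence_counts(records):
    """Total number of times each keyword occurs across all records."""
    counts = Counter()
    for keywords in records:
        counts.update(keywords)       # repeated words in one record all count
    return counts


def record_counts(records):
    """Number of records in which each keyword appears at least once."""
    counts = Counter()
    for keywords in records:
        counts.update(set(keywords))  # each word counts once per record
    return counts


# Illustrative data only (assumed, not taken from FIG. 19).
records = [
    ["電話", "携帯電話", "電話", "電話"],
    ["電話", "普及"],
    ["携帯電話"],
]
occ = occurrence_counts(records)
rec = record_counts(records)
```

In the first record "電話" appears three times, so its occurrence count is inflated relative to its record count, which is the situation the paragraph above warns about.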

[0098] Next, the unnecessary word removal processing unit 5d removes unnecessary compound words based on the statistical information in the statistical information unit 12. In this removal processing, the unnecessary word determination processing of specific example 1 shown in FIG. 5 is first applied to each primary keyword to check whether it is a compound word. Next, the statistical information for each primary keyword determined to be a compound word, and for the basic words that constitute it, is retrieved from the statistical information unit 12. The statistical information of the compound word and of its basic words is then compared, and if the compound word is judged to be useful, that primary keyword is output to the keyword information storage unit 6. One possible criterion for judging a compound word useful is that its number of appearances is larger than that of every basic word constituting it.

[0099] For example, suppose that the statistical information unit 12 is in the state shown in FIG. 19 after the primary keywords have been extracted from the database 1. "Mobile phone" (携帯電話) is a compound of the basic words "mobile" (携帯) and "phone" (電話), but because it appears more often than either of these basic words, it is judged to be a useful word and is output as a final keyword.
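The usefulness check can be sketched as below. Two points are assumptions rather than statements of the specification: the compound word test is approximated by checking whether a word can be split entirely into other primary keywords (the actual determination is the FIG. 5 processing of specific example 1), and the counts are invented for illustration and are not the values of FIG. 19. The function names are hypothetical.

```python
def split_into_basic_words(word, vocabulary):
    """Return other keywords whose concatenation equals `word`, else None."""
    def helper(rest):
        if not rest:
            return []
        for i in range(1, len(rest) + 1):
            head = rest[:i]
            # The word itself does not count as one of its own parts.
            if head != word and head in vocabulary:
                tail = helper(rest[i:])
                if tail is not None:
                    return [head] + tail
        return None

    parts = helper(word)
    return parts if parts else None


def select_keywords(counts):
    """Keep non-compounds, and compounds that outnumber every basic word."""
    vocabulary = set(counts)
    final = []
    for word, n in counts.items():
        parts = split_into_basic_words(word, vocabulary)
        if parts is None:
            final.append(word)  # not a compound word: kept as is
        elif all(n > counts[p] for p in parts):
            final.append(word)  # useful compound word: kept
        # otherwise the compound word is removed as an unnecessary word
    return final


# Invented counts for illustration only.
counts = {"携帯電話": 12, "携帯": 3, "電話": 7, "電話回線": 2, "回線": 5}
keywords = select_keywords(counts)
```

With these counts, "携帯電話" is retained because it appears more often than both "携帯" and "電話", whereas "電話回線" is removed because it appears less often than its basic words.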

[0100] <Effects> As described above, specific example 5 provides the following effect in addition to the effects of specific example 1. In specific example 4, every compound word worth keeping as a keyword must be held in advance as information, whereas in this specific example such compound words are identified and retained automatically.

[0101] In each of the specific examples above, the final keywords are stored in the keyword information storage unit 6. Alternatively, the obtained keywords may be integrated into the database 1 by appending them as additional information.

[0102] All of the operations of the specific examples above can be realized under the control of a program running on a computer that serves as the keyword extraction device. The keyword extraction device of the present invention can therefore be realized by recording the program on a recording medium such as a floppy disk or CD-ROM and installing it on a computer, by downloading it over a network and installing it, or by installing it in advance on a hard disk or the like.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of specific example 1 of the keyword extraction device of the present invention.

FIG. 2 is an explanatory diagram of the contents of the character type information unit in the keyword extraction device of the present invention.

FIG. 3 is an explanatory diagram of the contents of the database in specific example 1 of the keyword extraction device of the present invention.

FIG. 4 is an explanatory diagram of the extracted primary keywords in specific example 1 of the keyword extraction device of the present invention.

FIG. 5 is a flowchart of the unnecessary word removal processing in specific example 1 of the keyword extraction device of the present invention.

FIG. 6 is an explanatory diagram of the keywords obtained in specific example 1 of the keyword extraction device of the present invention.

FIG. 7 is a configuration diagram of specific example 2 of the keyword extraction device of the present invention.

FIG. 8 is an explanatory diagram of the contents of the prefix/suffix information unit in specific example 2 of the keyword extraction device of the present invention.

FIG. 9 is an explanatory diagram of the keywords obtained in specific example 2 of the keyword extraction device of the present invention.

FIG. 10 is a configuration diagram of specific example 3 of the keyword extraction device of the present invention.

FIG. 11 is a flowchart of the unnecessary word removal processing in specific example 3 of the keyword extraction device of the present invention.

FIG. 12 is an explanatory diagram of the primary keywords in specific example 3 of the keyword extraction device of the present invention.

FIG. 13 is an explanatory diagram of the keywords obtained in specific example 3 of the keyword extraction device of the present invention.

FIG. 14 is a configuration diagram of specific example 4 of the keyword extraction device of the present invention.

FIG. 15 is an explanatory diagram of the contents of the unnecessary word information unit in specific example 4 of the keyword extraction device of the present invention.

FIG. 16 is an explanatory diagram of the contents of the compound word information unit in specific example 4 of the keyword extraction device of the present invention.

FIG. 17 is a flowchart of the unnecessary word removal processing in specific example 4 of the keyword extraction device of the present invention.

FIG. 18 is a configuration diagram of specific example 5 of the keyword extraction device of the present invention.

FIG. 19 is an explanatory diagram of the statistical information in specific example 5 of the keyword extraction device of the present invention.

[Explanation of symbols]

1 Database
2, 2a, 2b Primary keyword extraction processing unit
3 Character type information unit
4 Primary keyword storage unit
5, 5a, 5b, 5c, 5d Unnecessary word removal processing unit
6 Keyword information storage unit
7 Prefix/suffix information unit
8 Partial character string information unit
9 Unnecessary word information unit
10 Compound word information unit
11 Non-target character type information unit
12 Statistical information unit

Claims

1. A keyword extraction device comprising: a character type information unit holding character type information that indicates the characters to be treated as predetermined keywords; a primary keyword extraction processing unit that extracts character strings serving as primary keywords from a database based on the character type information; a primary keyword storage unit that stores the primary keywords extracted by the primary keyword extraction processing unit; and an unnecessary word removal processing unit that, based on the plurality of primary keywords extracted by the primary keyword extraction processing unit, removes unnecessary primary keywords from the primary keywords stored in the primary keyword storage unit and outputs the remainder as keyword information.

2. The keyword extraction device according to claim 1, wherein the primary keyword extraction processing unit creates primary keywords while excluding character strings that consist of a single keyword character.

3. The keyword extraction device according to claim 1, wherein the unnecessary word removal processing unit removes, as an unnecessary word, any primary keyword stored in the primary keyword storage unit that is equal to a keyword obtained by combining a plurality of primary keywords created by the primary keyword extraction processing unit.

4. A medium on which a keyword extraction control program is recorded, the program causing a computer to execute: a process of extracting keywords consisting of predetermined characters from a database; and, after the keyword extraction process, a process of identifying unnecessary keywords based on the plurality of extracted keywords and obtaining the final keywords.

5. A keyword extraction device comprising: a prefix/suffix information unit that stores information on predetermined prefixes and suffixes; and an unnecessary word removal processing unit that performs unnecessary word removal processing on the character string that remains after excluding the prefix or suffix from each primary keyword stored in the primary keyword storage unit.

6. The keyword extraction device according to claim 3, comprising: a partial character string information unit that stores predetermined basic word character strings and modifier character strings, composed of a different character type, that modify the basic word character strings; and an unnecessary word removal processing unit that outputs the final keywords by removing, from the primary keywords stored in the primary keyword storage unit, the keywords that exist within those primary keywords.

7. The keyword extraction device according to claim 6, wherein the unnecessary word removal processing unit does not remove, as an unnecessary word, a primary keyword whose leading character string is a modifier character string.

8. The keyword extracting device according to claim 6, wherein the character types of the basic character string are English characters, numbers, and katakana characters, and the character type of the modifier character string is a Chinese character.

9. A keyword extraction device comprising: an unnecessary word information unit that stores information on predetermined unnecessary words; a primary keyword extraction processing unit that, when extracting primary keywords, excludes from the primary keywords any word stored in the unnecessary word information unit; a compound word information unit that stores information on predetermined compound words; and an unnecessary word removal processing unit that does not perform unnecessary word removal processing on character strings, among the primary keywords stored in the primary keyword storage unit, that are equal to one of those compound words.

10. The keyword extraction device according to claim 9, comprising: a non-target character type information unit that stores information on predetermined non-target character types that are not treated as forming compound words; and an unnecessary word removal processing unit that, for the primary keywords stored in the primary keyword storage unit, does not treat a character string portion composed of a non-target character type as a combined portion.

11. A keyword extraction device comprising an unnecessary word removal processing unit that, when performing unnecessary word removal processing on the primary keywords stored in the primary keyword storage unit, uses the unnecessary words stored in the unnecessary word information unit together with the primary keywords stored in the primary keyword storage unit.

12. The keyword extraction device according to claim 3, comprising: a statistical information unit that indicates the number of appearances of each primary keyword in the database; and an unnecessary word removal processing unit that does not remove, as an unnecessary word, a word determined to be a compound word among the primary keywords stored in the primary keyword storage unit when the number of appearances recorded for it in the statistical information unit is higher than a predetermined threshold value.

13. The keyword extraction device according to claim 3, comprising: a statistical information unit that stores, for each primary keyword, the number of data items, among the plurality of data items stored in the database, that contain that primary keyword; and an unnecessary word removal processing unit that does not remove, as an unnecessary word, a word determined to be a compound word among the primary keywords stored in the primary keyword storage unit when the number of data items recorded for it in the statistical information unit is higher than a predetermined threshold value.

14. The keyword extraction device according to claim 12, wherein the case where the value is higher than the threshold value is the case where the value for a word determined to be a compound word is higher than the value for the basic words constituting that word.