JPH07262217A

JPH07262217A - Text retrieval device

Info

Publication number: JPH07262217A
Application number: JP6053580A
Authority: JP
Inventors: Takehiro Koyama; 剛弘小山
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1994-03-24
Filing date: 1994-03-24
Publication date: 1995-10-13

Abstract

PURPOSE: To provide a text retrieval device that prioritizes retrieval results accurately by ranking them according to both the keyword and its co-occurrence words. CONSTITUTION: A keyword designation unit 1 receives a keyword input by the user and passes it to both a text retrieval unit 2 and a co-occurrence word retrieval unit 4. The text retrieval unit 2 searches a text database 3 with the keyword. Meanwhile, the co-occurrence word retrieval unit 4 searches a co-occurrence word database 5 with the keyword and obtains information on the keyword's co-occurrence words. A priority calculation unit 6 calculates the priority of each item retrieved by the text retrieval unit 2 from the keyword and the co-occurrence word information obtained by the co-occurrence word retrieval unit 4, and a text display unit 7 sorts and displays the retrieval results in order of priority.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a text search device that searches text by keyword, and more particularly to a text search device having a function of prioritizing retrieved items using keyword co-occurrence information (information on words strongly related to the keyword).

[0002]

[Prior Art] In keyword-based text search, the number of retrieved (hit) items grows as the text grows, which makes checking them laborious. The relationship between the keyword and a hit item also varies, from items in which the keyword is central to items that mention it only in passing. Prioritizing the retrieved items and checking them from the most important down to the required level therefore makes it possible to obtain the necessary information efficiently.

[0003] To judge whether a retrieved item is important, what matters is the amount of description concerning the keyword in that item. The keyword may be described directly (the keyword itself appears) or indirectly, and the number of keyword occurrences can be taken to reflect, to some extent, the amount of direct description. For this reason, some conventional devices prioritize items by the number of keywords contained in the text: the more often the keyword appears, the higher the priority.
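The conventional keyword-count ranking described in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the code of any prior-art device: items are assumed to be a hypothetical mapping from item name to body text, and keyword occurrences are counted as case-insensitive whole-word matches, a simplification that suits English text (unsegmented Japanese text would need morphological analysis instead of word boundaries).

```python
import re


def keyword_count(text: str, keyword: str) -> int:
    """Count case-insensitive whole-word occurrences of `keyword` in `text`."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def rank_by_keyword_count(items: dict[str, str], keyword: str) -> list[str]:
    """Conventional ranking: the more often the keyword appears, the higher the item.

    Items with equal counts keep their input order, because Python's sort is stable.
    """
    return sorted(items, key=lambda name: keyword_count(items[name], keyword), reverse=True)


if __name__ == "__main__":
    items = {  # hypothetical dictionary-type entries
        "Robotics": "Robots may use AI planners.",
        "AI": "AI research studies machine intelligence; AI systems reason.",
        "Weather": "Rain is expected tomorrow.",
    }
    print(rank_by_keyword_count(items, "AI"))  # ['AI', 'Robotics', 'Weather']
```

In a short dictionary entry the keyword typically appears only once or twice, so many items tie at a count of 1; this is the weakness the following paragraph describes.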

[0004] However, although importance can be judged to some extent from the amount of direct description alone when a text contains enough keyword occurrences, when there are few occurrences the importance either cannot be judged or is judged in a way far removed from what the user wants. It is therefore necessary to consider not only the amount of direct description of the keyword but also the amount of indirect description. In other words, because the conventional method prioritizes search results by keyword count, it reflects only the amount of direct description and cannot prioritize accurately when the keyword count is small. This is a particular problem for dictionary-type text, which consists of repeated headings and their explanations: each item contains little text and therefore few keyword occurrences.

[0005] Another conventional apparatus is the document search device described in Japanese Patent Laid-Open No. Hei 4-281565. This device divides a text into parts such as the preface and the body, weights each keyword occurrence according to the part in which it appears, and calculates a priority. With this device, texts in which the keyword appears in an important part can be presented to the user first. Thus, even for dictionary-type text, a text whose heading contains the keyword can be presented ahead of a text whose body merely contains many occurrences of the keyword. However, this device does not take the amount of indirect description into account either. For example, even if a heading contains an indirect expression of the keyword, the item receives a low priority unless the keyword is described directly, so the desired result may not be obtained.

[0006]

[Problem to Be Solved by the Invention] The present invention was made in view of the above circumstances. Its object is to provide a text search device that reflects the amount of indirect description through the number of co-occurrence words and prioritizes search results by both the keyword and its co-occurrence words, so that prioritization is more accurate even when the keyword occurs only a few times.

[0007]

[Means for Solving the Problem] The present invention is a text search device for searching text by keyword, characterized by comprising: a keyword designation unit that accepts input of a keyword; a text database storing text that can be searched by keyword; a text search unit that searches the text database with the keyword accepted by the keyword designation unit; a co-occurrence word database storing information on co-occurrence words, that is, words strongly related to a given word; a co-occurrence word search unit that searches the co-occurrence word database with the keyword accepted by the keyword designation unit; and a priority calculation unit that calculates the priority of the search results of the text search unit from the keyword accepted by the keyword designation unit and the information on the co-occurrence words retrieved for that keyword by the co-occurrence word search unit.
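The arrangement of units in the preceding paragraph can be sketched as a small Python class. This is a hypothetical illustration under stated assumptions, not the patent's implementation: the text database is modelled as an in-memory list of (item name, body) records, the co-occurrence word database as a dictionary from headword to related words, matching is plain case-sensitive substring search, and the priority calculation is passed in as a function so that any of the scoring schemes of the embodiment could be plugged in. All class and method names are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    name: str  # item name (heading) of a dictionary-type entry
    body: str  # explanatory text of the entry


# Priority calculation (unit 6): (item, keyword, co-occurrence words) -> priority
PriorityFn = Callable[[Item, str, list[str]], float]


class TextSearchDevice:
    """Hypothetical sketch of units 1-7; names are illustrative only."""

    def __init__(self, text_db: list[Item], cooc_db: dict[str, list[str]],
                 priority: PriorityFn) -> None:
        self.text_db = text_db    # text database (3)
        self.cooc_db = cooc_db    # co-occurrence word database (5)
        self.priority = priority  # priority calculation unit (6)

    def search_text(self, keyword: str) -> list[Item]:
        """Text search unit (2): items whose name or body contains the keyword."""
        return [it for it in self.text_db if keyword in it.name or keyword in it.body]

    def search_cooccurrence(self, keyword: str) -> list[str]:
        """Co-occurrence word search unit (4): related words, or [] if unknown."""
        return self.cooc_db.get(keyword, [])

    def query(self, keyword: str) -> list[Item]:
        """The keyword (unit 1) feeds both searches; the returned list, highest
        priority first, is what the text display unit (7) would show."""
        hits = self.search_text(keyword)
        cooc = self.search_cooccurrence(keyword)
        return sorted(hits, key=lambda it: self.priority(it, keyword, cooc), reverse=True)


if __name__ == "__main__":
    db = [Item("X", "AI only"), Item("Y", "AI, robot and learning"), Item("Z", "unrelated")]
    device = TextSearchDevice(
        db, {"AI": ["robot", "learning"]},
        # toy priority: number of distinct co-occurrence words present in the body
        priority=lambda it, kw, cooc: sum(w in it.body for w in cooc),
    )
    print([it.name for it in device.query("AI")])  # ['Y', 'X']
```

Keeping the priority function separate mirrors the separation between the text search unit and the priority calculation unit: retrieval decides which items are hits, and the pluggable function only decides their order.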

[0008]

[Operation] According to the present invention, the text search unit searches the text database with the keyword the user designated through the keyword designation unit, while the co-occurrence word search unit consults the co-occurrence word database to obtain information on the keyword's co-occurrence words. The priority calculation unit then prioritizes the results of the text search unit using the keyword and the co-occurrence word information. Conventional prioritization by keyword alone reflects only the direct description of the keyword; the text search device of the present invention prioritizes by the keyword and its co-occurrence words, so it reflects the amounts of both direct and indirect description and achieves more accurate prioritization.

[0009]

[Embodiment] FIG. 1 is a block diagram showing an embodiment of the text search device of the present invention. In the figure, 1 is a keyword designation unit, 2 is a text search unit, 3 is a text database, 4 is a co-occurrence word search unit, 5 is a co-occurrence word database, 6 is a priority calculation unit, and 7 is a text display unit.

[0010] The keyword designation unit 1 receives a search keyword input by the user and passes it to the text search unit 2 and the co-occurrence word search unit 4. On receiving the keyword, the text search unit 2 searches the text database 3 and passes the search results to the priority calculation unit 6. The text database 3 stores text; searching it for any word yields the descriptions concerning that word.

[0011] On receiving the keyword, the co-occurrence word search unit 4 searches the co-occurrence word database 5 and passes the co-occurrence word information to the priority calculation unit 6. The co-occurrence word database 5 stores, for each headword, the words strongly related to that headword (its co-occurrence words). For example, with the headwords "AI" and "OA", it stores information such as:

Headword   Co-occurrence words
AI         artificial intelligence, computer, expert system
OA         word processor, personal computer, electronic file
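A co-occurrence word database like the one above can be sketched as a dictionary keyed by headword. This is an assumption-laden illustration, not the patent's storage format: it holds the two example entries from this paragraph and applies Unicode NFKC normalization to lookup keys, on the assumption that queries may arrive in full-width form (as "ＡＩ" does in the original Japanese text) and should still match the half-width headword "AI". The function names are hypothetical.

```python
import unicodedata

# Example entries from the paragraph above: headword -> co-occurrence words.
COOCCURRENCE_DB: dict[str, list[str]] = {
    "AI": ["artificial intelligence", "computer", "expert system"],
    "OA": ["word processor", "personal computer", "electronic file"],
}


def normalize(word: str) -> str:
    """Assumed normalization step: NFKC maps full-width letters such as 'ＡＩ' to 'AI'."""
    return unicodedata.normalize("NFKC", word).strip()


def lookup_cooccurrence(keyword: str,
                        db: dict[str, list[str]] = COOCCURRENCE_DB) -> list[str]:
    """Return a copy of the co-occurrence words for `keyword`, or [] if it is not a headword."""
    return list(db.get(normalize(keyword), []))


if __name__ == "__main__":
    print(lookup_cooccurrence("ＡＩ"))  # full-width query finds the 'AI' entry
```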

[0012] When the priority calculation unit 6 has received the search results from the text search unit 2 and the co-occurrence word information from the co-occurrence word search unit 4, it calculates the priority of each retrieved item and passes this information to the text display unit 7. On receiving the priority information, the text display unit 7 sorts the retrieved items in order of priority and displays them to the user. Besides being displayed by the text display unit 7, the results can also be stored in a file or used as input to another device.

[0013] Next, the operation of the embodiment of the text search device will be described with a concrete example. FIG. 2 illustrates an example of the operation of the embodiment, and FIG. 3 illustrates an example of the text data stored in the text database. The following describes the input and output of each component when the user specifies "AI" as the keyword.

[0014] On receiving the keyword "AI" input by the user, the keyword designation unit 1 passes it to both the text search unit 2 and the co-occurrence word search unit 4. On receiving "AI", the text search unit 2 searches the text database 3, which is assumed to contain the text data shown in FIG. 3. The search for "AI" matches the words marked with double lines in FIG. 3 and returns the texts "Legal expert system ...", "Program trading ...", "AI ...", and "Pivot method ...". These search results are passed to the priority calculation unit 6.

[0015] Meanwhile, on receiving the keyword "AI", the co-occurrence word search unit 4 searches the co-occurrence word database 5 for the co-occurrence words of "AI". Suppose it obtains "artificial intelligence, robot, expert system, machine translation, inference, learning". This co-occurrence word information is passed to the priority calculation unit 6.

[0016] The priority calculation unit 6 receives the results of searching for "AI" in the text search unit 2 ("Legal expert system ...", "Program trading ...", "AI ...", "Pivot method ...") and the co-occurrence word information obtained for "AI" by the co-occurrence word search unit 4 ("artificial intelligence, robot, expert system, machine translation, inference, learning"). It then calculates the priority of each retrieved item and passes the result to the text display unit 7. On receiving the prioritized results, the text display unit 7 sorts the items in order of priority and displays them to the user.

[0017] The priority calculation unit 6 can calculate priorities by, for example, the following procedure. First, the retrieved items are divided into three groups:
1. Items whose item name contains the keyword.
2. Items whose item name contains a co-occurrence word.
3. All other items.
The groups are ordered in this sequence. Within each group, items are ranked by the numbers of keywords and co-occurrence words they contain, with the keyword count taking precedence over the co-occurrence word count.
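The grouping procedure above can be sketched as a sort key. The per-item facts (whether the item name contains the keyword or a co-occurrence word, and the two counts) are assumed to be computed beforehand, and an item whose name contains both is assumed to belong to group 1. The example data are the four "AI" hits of this embodiment, with the counts stated for FIG. 4 in the scoring example: one keyword each, and five, two, one, and zero co-occurrence words for "AI ...", "Legal expert system ...", "Pivot method ...", and "Program trading ..." respectively.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    name_has_keyword: bool  # group 1 test
    name_has_cooc: bool     # group 2 test
    n_keywords: int         # keywords counted in the item
    n_cooc: int             # co-occurrence words counted in the item


def group_sort_key(hit: Hit) -> tuple[int, int, int]:
    """Smaller key = higher priority.

    Group 1 (keyword in item name) precedes group 2 (co-occurrence word in item
    name), which precedes group 3 (everything else). Within a group, more
    keywords win, and the co-occurrence count only breaks ties.
    """
    if hit.name_has_keyword:
        group = 1
    elif hit.name_has_cooc:
        group = 2
    else:
        group = 3
    return (group, -hit.n_keywords, -hit.n_cooc)


def rank_by_groups(hits: list[Hit]) -> list[str]:
    return [h.name for h in sorted(hits, key=group_sort_key)]


if __name__ == "__main__":
    hits = [
        Hit("Legal expert system", False, True, 1, 2),
        Hit("Program trading", False, False, 1, 0),
        Hit("AI", True, False, 1, 5),
        Hit("Pivot method", False, False, 1, 1),
    ]
    print(rank_by_groups(hits))
    # ['AI', 'Legal expert system', 'Pivot method', 'Program trading']
```

Encoding the group number and negated counts in one tuple lets a single ascending sort apply all three ordering rules at once, and the printed order agrees with the order derived in the worked example of this embodiment.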

[0018] In this example, each retrieved text contains co-occurrence words as well as the keyword; in FIG. 3 the co-occurrence words are underlined. First, the text "AI ..." has the keyword in its item name. The text "Legal expert system ..." has a co-occurrence word in its item name. The remaining texts are "Program trading ..." and "Pivot method ...". "Program trading ..." contains one occurrence of the keyword "AI", whereas "Pivot method ..." contains one occurrence of "AI" and one co-occurrence word, "machine translation". The resulting priority order is therefore "AI ...", "Legal expert system ...", "Pivot method ...", "Program trading ...".

[0019] As another example, the priority can be calculated with the following scoring:
1. If the item name contains the keyword, add 100 to the score.
2. If the item name contains a co-occurrence word, add 10 to the score.
3. Add (number of keywords) + (number of co-occurrence words) × 0.7 to the score.
The score obtained for each text in this way is used as its priority. The text display unit 7 sorts the retrieved items by this score and displays them to the user.
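For reference, the three scoring rules can be written as one formula. The notation is introduced here for illustration only: for an item d, n_k(d) is its number of keywords, n_c(d) its number of co-occurrence words, and a bracketed condition [.] equals 1 when the condition holds and 0 otherwise.

```latex
\mathrm{score}(d) = 100\,[\text{keyword} \in \mathrm{name}(d)]
  + 10\,[\text{co-occurrence word} \in \mathrm{name}(d)]
  + n_k(d) + 0.7\,n_c(d)
```

Unlike the three-group procedure, this sum is not strictly lexicographic: an item with one keyword and two co-occurrence words (1 + 2 × 0.7 = 2.4) outscores an item with two keywords and none (2), and a large enough count can lift an item above one that received a name bonus.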

[0020] In this example, the numbers of keywords and co-occurrence words, the scores, and the priority ranks of the retrieved texts are as shown in FIG. 4. For the text "Legal expert system ...", the item name contains a co-occurrence word (+10), and the text contains one keyword and two co-occurrence words (1 + 2 × 0.7 = 2.4), for a total of 12.4. For "Program trading ...", the item name contains neither the keyword nor a co-occurrence word, so the score is 1, from its single keyword. For "AI ...", the item name contains the keyword (+100), and the text contains one keyword and five co-occurrence words (1 + 5 × 0.7 = 4.5), for a total of 104.5. For "Pivot method ...", there is one keyword and one co-occurrence word, giving 1 + 1 × 0.7 = 1.7.
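The arithmetic above can be checked with a direct transcription of the scoring rules. The function name is illustrative; the weights 100, 10, and 0.7 are those of the scoring example, and the counts are the FIG. 4 values quoted in this paragraph. Printed scores are rounded to one decimal place because the weight 0.7 has no exact binary floating-point representation.

```python
def item_score(name_has_keyword: bool, name_has_cooc: bool,
               n_keywords: int, n_cooc: int) -> float:
    """Score per the scoring example: name bonuses plus weighted counts."""
    score = 0.0
    if name_has_keyword:
        score += 100  # rule 1: keyword in the item name
    if name_has_cooc:
        score += 10   # rule 2: co-occurrence word in the item name
    score += n_keywords + 0.7 * n_cooc  # rule 3
    return score


# FIG. 4 facts quoted in the text:
# (keyword in name, co-occurrence word in name, keywords, co-occurrence words)
FIG4 = {
    "Legal expert system": (False, True, 1, 2),
    "Program trading": (False, False, 1, 0),
    "AI": (True, False, 1, 5),
    "Pivot method": (False, False, 1, 1),
}

if __name__ == "__main__":
    for name, facts in FIG4.items():
        # prints 12.4, 1.0, 104.5 and 1.7 in this order
        print(name, round(item_score(*facts), 1))
```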

[0021] The score of each item is calculated in this way, and priority ranks are assigned in descending order of score. The result is the prioritized (rank, item) information ((2 Legal expert system ...) (4 Program trading ...) (1 AI ...) (3 Pivot method ...)). This information is passed to the text display unit 7, which sorts the items in order of priority and displays them to the user.
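Turning scores into the (rank, item) pairs above is a descending sort followed by numbering. This sketch takes the FIG. 4 scores as given; the text does not say how tied scores should be ranked, so here tied items simply keep their input order (Python's sort is stable), which is an assumption. The function name is hypothetical.

```python
def assign_ranks(scores: dict[str, float]) -> dict[str, int]:
    """Map each item to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.__getitem__, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


if __name__ == "__main__":
    fig4_scores = {"Legal expert system": 12.4, "Program trading": 1.0,
                   "AI": 104.5, "Pivot method": 1.7}
    ranks = assign_ranks(fig4_scores)
    # Pair each item with its rank, keeping the original item order as in the text.
    print([(ranks[name], name) for name in fig4_scores])
    # [(2, 'Legal expert system'), (4, 'Program trading'), (1, 'AI'), (3, 'Pivot method')]
```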

[0022] Yet another example of priority calculation is as follows. For each text retrieved by the text search unit 2, every sentence that contains the keyword or a co-occurrence word is first designated a descriptive sentence. A sentence sandwiched between descriptive sentences is also treated as descriptive; when several sentences are sandwiched between two descriptive sentences, they are treated as descriptive only if their number does not exceed an arbitrarily preset limit. In addition, when a title contains the keyword, all sentences in that paragraph are treated as descriptive. The sentences judged descriptive in this way are counted, and a text with more descriptive sentences is given a higher priority. The retrieved texts are thus prioritized, sorted in order of priority, and displayed on the text display unit 7.
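The descriptive-sentence count can be sketched as follows. Several details are assumptions made for illustration: an item is treated as a single paragraph already split into sentences; terms are matched as case-insensitive whole words, which suits English but not unsegmented Japanese; and the preset limit on sandwiched sentences is a hypothetical parameter `max_gap`, consulted, as the text describes, only when more than one sentence lies between two descriptive sentences.

```python
import re


def contains_term(sentence: str, terms: list[str]) -> bool:
    """True if any term occurs in the sentence as a case-insensitive whole word."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", sentence, re.IGNORECASE)
               for t in terms)


def count_descriptive(title: str, sentences: list[str], keyword: str,
                      cooc_words: list[str], max_gap: int = 1) -> int:
    """Count descriptive sentences in a single-paragraph item."""
    # A keyword in the title makes every sentence of the paragraph descriptive.
    if contains_term(title, [keyword]):
        return len(sentences)
    terms = [keyword, *cooc_words]
    hits = [i for i, s in enumerate(sentences) if contains_term(s, terms)]
    descriptive = set(hits)
    for left, right in zip(hits, hits[1:]):
        gap = right - left - 1  # sentences sandwiched between two descriptive ones
        # One sandwiched sentence always counts; several count only up to max_gap.
        if gap == 1 or 1 < gap <= max_gap:
            descriptive.update(range(left + 1, right))
    return len(descriptive)
```

Applied to the four-sentence "Fuzzy ..." entry described in the next paragraph, with "ambiguity" and "AI" as co-occurrence words, this sketch returns 4 both through the title rule and through the sandwiched-sentence rule, matching the count stated there.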

[0023] Suppose the text "Fuzzy ..." stored in the text database of FIG. 3 is retrieved with the keyword "fuzzy", and that the co-occurrence words for "fuzzy" are "ambiguity" and "AI". The three sentences containing the keyword or a co-occurrence word, namely "Fuzzy means uncertainty.", "The engineering field that seeks to apply this uncertainty is fuzzy engineering.", and "It studies experience and intuition that involve ambiguity not expressible numerically.", are judged to be descriptive sentences. If a single sentence sandwiched between descriptive sentences is also treated as descriptive, the third sentence, "Professor Zadeh of the University of California is its founder.", which contains neither the keyword nor a co-occurrence word, also becomes descriptive. In fact, because the title in this example contains the keyword, all sentences are descriptive in any case. This example therefore has four descriptive sentences. The descriptive sentences of the other retrieved texts are counted in the same way, and the texts are sorted in descending order of descriptive sentences and displayed on the text display unit 7.

[0024] Next, the case where a keyword has several meanings will be described with a concrete example. Assume that "CD" is input as the keyword and that "compact disc, digital, laser, music" are stored in the co-occurrence word database 5 as the co-occurrence words of "CD". "CD" can mean "compact disc", "negotiable certificate of deposit (Certificate of Deposit)", "cash dispenser (Cash Dispenser)", "conventional disarmament (Conventional Disarmament)", "cyclodextrin", and so on, so a text search for it hits many items.

[0025] FIG. 5 shows the numbers of keywords and co-occurrence words, the scores, and the ranks for the items retrieved with the keyword "CD". The texts stored in the text database are not shown, but assume that the search for "CD" retrieves the items listed in the leftmost column of FIG. 5. Of these, "Semiconductor laser ...", "PCM music broadcasting ...", "CD-ROM ...", and "LD ..." relate to "compact disc"; "Free interest rate products ..." relates to "negotiable certificate of deposit"; "Banks ..." relates to "cash dispenser"; "Geneva Conference on Disarmament ..." relates to "conventional disarmament"; and "Molecular capsules ..." relates to "cyclodextrin".

[0026] Assume the keyword and co-occurrence word counts shown in FIG. 5, and use the scoring method described above to calculate each item's score; priorities are then assigned according to the calculated scores. If prioritization is restricted to the items in which co-occurrence words occur, only the four items related to "compact disc" are prioritized. This yields search results in which the meaning of "CD" is restricted to "compact disc". In this way, co-occurrence words make it possible to restrict the search for a word with several meanings to one of those meanings.
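The restriction to items that contain co-occurrence words amounts to a filter applied before scoring. FIG. 5's texts and counts are not reproduced in the text, so the one-sentence items below are invented for illustration and are not the FIG. 5 data; the co-occurrence words are the ones given for "CD", and the score reuses the weights of the scoring example with case-insensitive whole-word matching (an assumption). The function names are hypothetical.

```python
import re


def count_terms(text: str, terms: list[str]) -> int:
    """Total case-insensitive whole-word occurrences of all terms in the text."""
    return sum(len(re.findall(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE))
               for t in terms)


def rank_with_sense_filter(items: dict[str, str], keyword: str,
                           cooc_words: list[str]) -> list[str]:
    """Keep only items whose body contains a co-occurrence word, then sort by score."""
    def score(name: str) -> float:
        body = items[name]
        s = 100.0 if count_terms(name, [keyword]) else 0.0  # keyword in item name
        s += 10.0 if count_terms(name, cooc_words) else 0.0  # co-occurrence word in name
        return s + count_terms(body, [keyword]) + 0.7 * count_terms(body, cooc_words)

    kept = [name for name in items if count_terms(items[name], cooc_words) > 0]
    return sorted(kept, key=score, reverse=True)


if __name__ == "__main__":
    cd_cooc = ["compact disc", "digital", "laser", "music"]
    items = {  # hypothetical items, not the FIG. 5 data
        "CD player": "A CD player reads music from a compact disc with a laser.",
        "Deposit rates": "The CD rate offered on time deposits rose this quarter.",
    }
    print(rank_with_sense_filter(items, "CD", cd_cooc))  # ['CD player']
```

The "Deposit rates" item matches the keyword but none of the "compact disc" co-occurrence words, so it is filtered out before ranking, which is the sense restriction the paragraph describes.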

[0027] The above description assumes that the text to be searched is dictionary-type text, consisting of repeated item names and their explanations as in FIG. 3. The invention is not limited to this, however: the text may also be a single large text, which can be handled by treating chapters, sections, or paragraphs as the item units. When paragraphs are the item units, there may of course be no item name or title. The search can also be performed on a collection of documents, treating each document as one item and its title as the item name; this case, too, can be handled even when a document has no title.
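Applying the device to a single large text or to a document collection only requires deciding what counts as an item. The helpers below are a hypothetical sketch: one treats blank-line-separated paragraphs as items with no item name, and the other treats each (title, text) document as one item named by its possibly empty title. Splitting by chapter or section would need format-specific rules and is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Item:
    name: str  # item name; empty when the unit has no title
    body: str


def paragraphs_as_items(text: str) -> list[Item]:
    """Split one large text into paragraph items separated by blank lines."""
    parts = re.split(r"\n\s*\n", text)
    return [Item(name="", body=p.strip()) for p in parts if p.strip()]


def documents_as_items(docs: list[tuple[str, str]]) -> list[Item]:
    """Treat each (title, text) document as one item whose name is its title."""
    return [Item(name=title.strip(), body=text) for title, text in docs]
```

Items produced this way can be fed to the same keyword, co-occurrence, and priority steps; items without a name simply never receive the item-name bonuses.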

[0028]

[Effects of the Invention] As is clear from the above description, the present invention prioritizes search results by the keyword and its co-occurrence words, taking into account not only direct descriptions of the keyword but also indirect ones. It can therefore prioritize more accurately than conventional keyword-only prioritization, which considers only direct descriptions of the keyword. It is particularly effective for texts with little text per item, which contain few keyword occurrences. Furthermore, even when a keyword has several meanings, co-occurrence words make it possible to restrict the search to one meaning of the keyword.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the text search device of the present invention.

FIG. 2 is an explanatory diagram of an example of the operation of the embodiment of the text search device of the present invention.

FIG. 3 is an explanatory diagram of an example of the text data stored in the text database.

FIG. 4 is an explanatory diagram of the numbers of keywords and co-occurrence words, the scores, and the priority ranks for the items retrieved with the keyword "AI".

FIG. 5 is an explanatory diagram of the numbers of keywords and co-occurrence words, the scores, and the priority ranks for the items retrieved with the keyword "CD".

[Explanation of symbols]

1 ... keyword designation unit, 2 ... text search unit, 3 ... text database, 4 ... co-occurrence word search unit, 5 ... co-occurrence word database, 6 ... priority calculation unit, 7 ... text display unit.

Claims


1. A text search device for searching text by keyword, comprising: a keyword designation unit that accepts input of a keyword; a text database storing text that can be searched by keyword; a text search unit that searches the text database with the keyword accepted by the keyword designation unit; a co-occurrence word database storing information on co-occurrence words, that is, words strongly related to a given word; a co-occurrence word search unit that searches the co-occurrence word database with the keyword accepted by the keyword designation unit; and a priority calculation unit that calculates the priority of the search results of the text search unit from the keyword accepted by the keyword designation unit and the information on the co-occurrence words retrieved for that keyword by the co-occurrence word search unit.