JP2002123544A

JP2002123544A - Retrieval preprocessing apparatus, document retrieving apparatus, and retrieval preprocessing method and document retrieving method

Info

Publication number: JP2002123544A
Application number: JP2000314013A
Authority: JP
Inventors: Masanori Nakamura; 正規中村; Keizo Uchiyama; 恵三内山; Itsuko Tezuka; 伊津子手塚; Akio Yasuda; 明夫保田; Misa Onuma; 美佐大沼; Naoko Makita; 尚子牧田
Original assignee: HEIWA INFORMATION CENTER CO Ltd; Tokyo Electric Power Co Inc
Current assignee: HEIWA INFORMATION CENTER CO Ltd; Tokyo Electric Power Company Holdings Inc
Priority date: 2000-10-13
Filing date: 2000-10-13
Publication date: 2002-04-26

Abstract

PROBLEM TO BE SOLVED: To provide a retrieval preprocessing apparatus, a document retrieving apparatus, a retrieval preprocessing method, and a document retrieving method that make it easy to create an appropriate retrieval sentence, and that enable retrieval properly reflecting the retriever's intention, by generating a natural retrieval sentence from candidate words and retrieving with it. SOLUTION: Candidate words for retrieval conditions are presented, and the candidate word that the retriever selects from them is adopted as the first word of a retrieval sentence. Candidate words that can follow the adopted word are then presented, and the one the retriever selects is appended to the retrieval sentence. This presentation and appending is repeated as many times as needed to generate the retrieval sentence. The target documents are then retrieved on the condition that they contain the retrieval sentence, and the retrieval result is presented.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a search preprocessing apparatus, a document search apparatus, a search preprocessing method, and a document search method that perform searches using natural wording, so that the searcher's intention is reflected easily and accurately.

[0002]

[Prior Art] Natural-language search must correctly determine the search intention from free-form text that the searcher writes without restriction. In addition to sentence analysis, it therefore requires complex processing such as synonym handling and the unification of variant spellings.

[0003] Current natural-language search generally extracts the words considered useful for the search by morphological analysis of the search sentence, and then performs a match search on them as character strings.

[0004]

[Problems to Be Solved by the Invention] With this method, however, the searcher does not know the content of the documents to be searched or the words used in them, so it is difficult to compose an appropriate search sentence that expresses the search intention.

[0005] To compensate for this, synonyms and related words are sometimes presented together with the search results. These words, however, are extracted from the limited set of words used in the search and do not necessarily fit the context of the search sentence.

[0006] In addition, the causal relationship between the search conditions and the results is unclear. When the desired result is not obtained, the searcher cannot tell whether the search sentence should be revised or whether the target document is simply absent from the search target.

[0007] The object of the present invention is therefore to provide a search preprocessing apparatus, a document search apparatus, a search preprocessing method, and a document search method that make it easy to compose an appropriate search sentence, by building a natural search sentence from candidate words and searching with it, and that thereby enable searches in which the search intention is properly reflected.

[0008]

[Means for Solving the Problems] To achieve the above object, the present invention employs the following means.

[0009] The document search apparatus and document search method of the present invention perform a search with a natural search sentence built by combining candidate words, and thereby enable searches in which the search intention is properly reflected.

[0010] The search preprocessing apparatus and search preprocessing method of the present invention extract candidate words from the search target documents, so that an appropriate search sentence can be composed from the natural wording contained in the search target documents themselves.

[0011] Consequently, the searcher less often has to repeat trial and error because the search results fail to match the searcher's intention.

[0012] The search preprocessing apparatus of the present invention comprises candidate word creating means for dividing a search target document into words of a predetermined unit and creating candidate words that serve as candidates for search conditions, and candidate word storage means for storing each candidate word in association with its appearance pattern.

[0013] In this way, the search target document is divided into words of a predetermined unit, for example by word, by phrase, or by part of speech, to create candidate words. Which candidate word appeared after which candidate word, that is, the appearance pattern, is stored in association with each candidate word. When a search sentence is composed, the appearance pattern can therefore be reproduced, which makes it possible to compose a natural and appropriate search sentence.
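The storage of appearance patterns described above can be sketched as follows. This is a minimal illustration, assuming the documents have already been split into words by some tokenizer. The function name `build_candidate_table` and the use of `None` to mark the start of a unit of text are assumptions made for this example and do not come from the specification.

```python
from collections import defaultdict


def build_candidate_table(token_lists):
    """Map each preceding word to the words that followed it in the documents.

    token_lists: an iterable of word lists, one list per unit of text.
    The key None (an assumption of this sketch) stands for "start of text".
    """
    table = defaultdict(list)
    for tokens in token_lists:
        prev = None
        for word in tokens:
            # Keep each follower once, in order of first appearance.
            if word not in table[prev]:
                table[prev].append(word)
            prev = word
    return dict(table)


# Hypothetical pre-tokenized input.
table = build_candidate_table([
    ["xDSL", "の", "情報"],
    ["xDSL", "の", "技術"],
])
print(table["の"])
```

Looking up `table[word]` then gives the words that can be offered after `word` when a search sentence is composed.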

[0014] In this case, the apparatus may further comprise counting means for totaling the number of times the same candidate word appears among the candidate words created by the candidate word creating means, and count storage means for storing the number of appearances for each candidate word.

[0015] Candidate words that appear frequently also tend to be used frequently when search sentences are composed. Presenting the candidate words in descending order of appearance count when a search sentence is composed therefore improves convenience.
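As a rough sketch of counting and ordering by appearance count, assuming pre-tokenized documents, one possibility is shown below. The names `count_candidates` and `order_by_frequency` are hypothetical, and ties are simply left in their original order, which the specification does not address.

```python
from collections import Counter


def count_candidates(token_lists):
    """Total how many times each candidate word appears."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def order_by_frequency(candidates, counts):
    """Return candidates with the most frequent first (stable for ties)."""
    return sorted(candidates, key=lambda w: counts[w], reverse=True)


# Hypothetical pre-tokenized input.
counts = count_candidates([
    ["xDSL", "の", "情報"],
    ["xDSL", "の", "技術"],
    ["情報", "の", "検索"],
])
print(order_by_frequency(["技術", "情報", "検索"], counts))
```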

[0016] The apparatus may also comprise candidate word weighting means for weighting the candidate words according to their importance, and weight storage means for storing the weight information for each candidate word.

[0017] In this case, the importance of a candidate word is preferably determined from at least one of the number of times it appears and the location where it appears.

[0018] Weighting the candidate words in this way allows them to be presented in descending order of importance, which improves convenience.
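The specification does not give a weighting formula. Purely as an illustration, the sketch below assumes that each appearance adds 1 to a word's weight and that an appearance in a title adds more. The `title_bonus` value, the function names, and the choice of the title as the "location" are all assumptions of this example.

```python
def candidate_weights(occurrences, title_bonus=3.0):
    """Compute a weight per candidate word.

    occurrences: iterable of (word, in_title) pairs, one per appearance.
    Assumed rule: a body appearance counts 1.0, a title appearance title_bonus.
    """
    weights = {}
    for word, in_title in occurrences:
        weights[word] = weights.get(word, 0.0) + (title_bonus if in_title else 1.0)
    return weights


def present_by_importance(candidates, weights):
    """Return candidates with the highest weight first."""
    return sorted(candidates, key=lambda w: weights.get(w, 0.0), reverse=True)


weights = candidate_weights([
    ("xDSL", True), ("ADSL", False), ("ADSL", False), ("光", False),
])
print(present_by_importance(["光", "ADSL", "xDSL"], weights))
```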

[0019] The above search preprocessing apparatus may also comprise keyword extracting means for extracting keywords from the candidate words, and keyword storage means for storing the keywords.

[0020] In this case, the apparatus preferably comprises combination creating means for creating combinations of the keywords present in the search target documents, and combination storage means for storing each keyword combination in association with the documents in which that combination is present.

[0021] The document search apparatus of the present invention takes a plurality of documents as the search target and searches for documents that match the search conditions. It comprises candidate word presenting means for presenting candidate words that serve as candidates for search conditions, search sentence creating means for creating a search sentence by combining the candidate words presented by the candidate word presenting means, and search result presenting means for presenting the result of searching the search target on the condition that the search sentence is contained.

[0022] This makes it possible to compose an appropriate search sentence by combining candidate words, and enables searches in which the search intention is properly reflected.

[0023] In the document search apparatus, the search sentence creating means preferably adopts into the search sentence the candidate word that the searcher selects from among the candidate words presented by the candidate word presenting means. For example, the searcher uses a mouse or the like to select a candidate word that the candidate word presenting means has shown on the display, and the search sentence creating means adds (adopts) this candidate word to the search sentence. This selection and addition is repeated to build a search sentence in which several candidate words are strung together, and the search is then performed. A search sentence that matches the searcher's intention can thus be composed easily.

[0024] Preferably, the candidate word presenting means presents the candidate words that follow the candidate word adopted by the search sentence creating means, based on the candidate words created by dividing the search target documents into words of a predetermined unit and on the appearance patterns of those candidate words in the search target documents. The search sentence creating means then adopts the candidate word that the searcher selects from among the following candidate words and adds it to the search sentence.

[0025] The candidate words contained in the search target documents can thus be presented based on their appearance patterns, so that a search sentence can be composed easily.

[0026] Furthermore, the candidate word presenting means may determine and present the next candidate words according to the order of the plurality of candidate words already adopted by the search sentence creating means, based on the appearance patterns of the candidate words in the search target documents.

[0027] Candidate words that fit the context can thus be presented based on the order of the plurality of candidate words.
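One way to picture the use of several preceding words is a table keyed by the last few adopted words rather than by a single word. The sketch below assumes a context length of two and pre-tokenized documents. The names `build_context_table` and `next_candidates` and the parameter `n` are hypothetical and not taken from the specification.

```python
from collections import defaultdict


def build_context_table(token_lists, n=2):
    """Map each run of up to n preceding words to the words that followed it."""
    table = defaultdict(list)
    for tokens in token_lists:
        for i, word in enumerate(tokens):
            context = tuple(tokens[max(0, i - n):i])
            if word not in table[context]:
                table[context].append(word)
    return dict(table)


def next_candidates(table, adopted, n=2):
    """Look up followers for the last n words adopted so far."""
    return table.get(tuple(adopted[-n:]), [])


# Hypothetical pre-tokenized input.
docs = [["xDSL", "の", "情報"], ["光", "の", "速度"]]
table = build_context_table(docs)
print(next_candidates(table, ["xDSL", "の"]))
```

With a single preceding word, "の" would offer both "情報" and "速度". With two words of context, only the follower that actually occurred after "xDSL の" is offered.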

[0028] The candidate word presenting means may also present the candidate words in an order that depends on the number of times they appear in the search target documents.

[0029] Candidate words that appear frequently also tend to be used frequently when search sentences are composed. Presenting the candidate words in descending order of appearance count when a search sentence is composed therefore improves convenience.

[0030] The candidate word presenting means may also present the candidate words in an order that depends on their importance.

[0031] Weighting the candidate words in this way allows them to be presented in descending order of importance, which improves convenience.

[0032] Of the plurality of candidate words adopted by the search sentence creating means, the nouns or noun phrases may be combined to form the search sentence.

[0033] Of the plurality of candidate words adopted by the search sentence creating means, the nouns or noun phrases may be combined to form the search sentence. The search result presenting means may then refer to the combinations of keywords, a keyword being a noun or noun phrase among the candidate words created by dividing the search target documents into words of a predetermined unit. It searches for keyword combinations that match the search sentence and presents, as the search result, the documents in which a matching keyword combination is present. When the search sentence yields more than one combination, text containing all of these combinations may be extracted as the search result, or text containing any one of them may be extracted.
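The "all combinations" versus "any one combination" choice described above might look like the sketch below. It assumes that preprocessing has produced, for each document, the set of keyword combinations found in it. The index layout, the function name `search_by_combinations`, and the `require_all` flag are assumptions made for illustration.

```python
def search_by_combinations(index, combos, require_all=True):
    """Return ids of documents whose stored keyword combinations match.

    index: dict mapping a document id to a set of frozensets, each frozenset
           being one keyword combination found in that document.
    combos: the keyword combinations derived from the search sentence.
    require_all: True keeps documents containing every combination,
                 False keeps documents containing at least one.
    """
    wanted = [frozenset(c) for c in combos]
    match = all if require_all else any
    return sorted(
        doc_id for doc_id, found in index.items()
        if match(c in found for c in wanted)
    )


# Hypothetical index built during preprocessing.
index = {
    "doc1": {frozenset({"xDSL", "情報"}), frozenset({"光", "速度"})},
    "doc2": {frozenset({"xDSL", "情報"})},
}
query = [("xDSL", "情報"), ("光", "速度")]
print(search_by_combinations(index, query))
print(search_by_combinations(index, query, require_all=False))
```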

[0034] The search preprocessing method of the present invention divides a search target document into words of a predetermined unit, creates candidate words that serve as candidates for search conditions, and stores each candidate word in association with its appearance pattern.

[0035] Because the candidate words are stored in association with their appearance patterns in this way, the appearance patterns can be reproduced when a search sentence is composed, which makes it possible to compose a natural and appropriate search sentence.

[0036] This search preprocessing method may be provided, for example, as a recording medium storing a program that causes a computer to execute processing that divides a search target document into words of a predetermined unit, creates candidate words that serve as candidates for search conditions, and stores each candidate word in association with its appearance pattern.

[0037] The document search method of the present invention presents candidate words that serve as candidates for search conditions, and adopts the candidate word that the searcher selects from among them as the first word of the search sentence. It then presents the candidate words that follow the word adopted into the search sentence, and adopts and adds to the search sentence the candidate word that the searcher selects from among them. The presentation of candidate words following the adopted word and the addition of the selected candidate word to the search sentence are repeated as many times as required to create the search sentence. The search target documents are then searched on the condition that the search sentence is contained, and the search result is presented.
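The present-select-append loop of this method can be sketched as below. The sketch assumes a follower table keyed by the preceding word (with `None` for the start) and replaces the on-screen selection with a callback. The names `build_search_sentence` and `choose`, the `max_words` limit, and the joining of words without spaces are assumptions of this example.

```python
def build_search_sentence(table, choose, max_words=10):
    """Compose a search sentence by repeated presentation and selection.

    table: dict mapping a preceding word (None at the start) to its followers.
    choose: callback given (sentence_so_far, candidates); returns the selected
            candidate, or None when the searcher decides to run the search.
    """
    sentence = []
    prev = None
    for _ in range(max_words):
        candidates = table.get(prev, [])
        if not candidates:
            break  # nothing can follow the last adopted word
        picked = choose(sentence, candidates)
        if picked is None:
            break  # searcher is satisfied with the sentence
        sentence.append(picked)
        prev = picked
    return "".join(sentence)  # assumes Japanese text joined without spaces


# Scripted selections standing in for mouse clicks.
table = {None: ["xDSL"], "xDSL": ["の"], "の": ["情報", "技術"], "情報": ["を"]}
picks = iter(["xDSL", "の", "情報", None])
print(build_search_sentence(table, lambda sentence, candidates: next(picks)))
```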

[0038] This makes it possible to compose an appropriate search sentence by combining candidate words, and enables searches in which the search intention is properly reflected.

[0039] In this case, it is preferable to divide the search target documents into words of a predetermined unit to create candidate words, to store the candidate words together with their appearance patterns in the search target documents, and to determine the candidate words that follow the word adopted into the search sentence based on those appearance patterns.

[0040] This document search apparatus may be provided, for example, as a recording medium storing a program that causes a computer to execute the following processing: presenting candidate words that serve as candidates for search conditions; adopting the candidate word that the searcher selects from among them as the first word of the search sentence; presenting the candidate words that follow the word adopted into the search sentence; adopting and adding to the search sentence the candidate word that the searcher selects from among them; repeating this presentation and addition as many times as required to create the search sentence; searching the search target documents on the condition that the search sentence is contained; and presenting the search result.

[0041] In the present invention, the constituent elements described above can be combined with one another wherever possible.

[0042]

[Embodiments of the Invention] <First Embodiment> Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a conceptual diagram of a search preprocessing apparatus and a document search apparatus according to one embodiment of the present invention. The present invention presents candidate words that serve as candidates for search conditions, and performs a search with a natural search sentence built by combining these candidate words.

[0043] The search preprocessing apparatus 1 comprises candidate word creating means 1a, which divides a search target document into words of a predetermined unit and creates candidate words that serve as candidates for search conditions; counting means 1b, which totals the number of times the same candidate word appears among the candidate words created by the candidate word creating means 1a; and keyword extracting means 1c, which extracts keywords from the candidate words.

[0044] It also comprises candidate word storage means (candidate word table) 1f, which stores each candidate word in association with its appearance pattern; count storage means (count table) 1g, which stores the number of appearances for each candidate word; and keyword storage means (keyword table) 1h, which stores the keywords.

[0045] The storage means 1f to 1h may each be provided independently, but in this embodiment each is stored as a table in the database 10. These storage means (the database 10) may instead be provided on the document search apparatus side. The search target documents are stored in the database 10 or in a separate database (including the case where they are stored on servers on the Internet or the like).

[0046] The document search apparatus 2, for its part, comprises candidate word presenting means 2a, which presents candidate words that serve as candidates for search conditions; search sentence creating means 2b, which creates a search sentence by combining the candidate words presented by the candidate word presenting means 2a; and search result presenting means 2c, which presents the result of searching the search target on the condition that the search sentence is contained.

[0047] FIG. 2 is a flowchart of the search procedure performed by the document search apparatus. When a search is started, a prefix search for keywords is first performed in order to determine the first candidate word (S1). In this example, the searcher wants to look up xDSL. As shown in FIG. 3, "x" is entered in the prefix search field and the prefix search button is selected, that is, clicked with a pointing device.

[0048] The candidate word presenting means 2a refers to the database 10 and displays a list of the keywords that begin with this character (S2). See FIG. 4.
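The keyword lookup in steps S1 and S2 amounts to a prefix match against the stored keywords. The sketch below is one possible reading. It assumes that full-width and half-width letters and upper and lower case should be treated as equal, which the specification does not state, and the function names are hypothetical.

```python
import unicodedata


def _fold(text):
    """Normalize width (NFKC) and case so that 'ｘ', 'x' and 'X' compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


def keywords_starting_with(keywords, prefix):
    """Return the stored keywords that begin with the entered characters."""
    folded_prefix = _fold(prefix)
    return sorted(k for k in keywords if _fold(k).startswith(folded_prefix))


# Hypothetical keyword table contents.
print(keywords_starting_with(["xDSL", "XML", "ADSL"], "ｘ"))
```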

[0049] The searcher then selects one of the listed keywords (S3). In this example, because the search is about xDSL, the searcher selects the natural-language entry in the "xDSL" row.

[0050] FIG. 5 shows an example of the natural-language search screen. The search sentence creating means 2b adopts the candidate word "xDSL" selected in the previous step as the first word of the search sentence, and the candidate word presenting means 2a refers to the candidate word table 1f and displays the candidate words considered likely to follow this candidate word (S4).

[0051] In this example, the candidate words are displayed five at a time. When there are five or more candidates, the ▽ and △ buttons display the next five and the previous five candidates, respectively.

[0052] When the searcher selects the next candidate word from among these, the search sentence creating means 2b adds the selected candidate word to the search sentence. The candidate word presenting means 2a again refers to the candidate word table 1f and displays the candidate words judged to follow next, based on the appearance pattern of the selected candidate word.

[0053] The searcher repeats the selection of candidate words until the desired search sentence is complete (S5). FIG. 6 shows the case where "の" has been selected as the second candidate word.

[0054] In this example, the searcher selects "情報" (information) as the third candidate word and, with the search sentence reading "xDSLの情報" (information on xDSL), selects the search button.

[0055] The search result presenting means 2c searches the search target documents for documents that contain the search sentence, and displays the search results as a list, as shown in FIG. 7 (S6).
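In its simplest reading, step S6 is a containment test of the search sentence against each document. The sketch below assumes that the documents are held in memory as plain strings and that an exact substring match is intended. The function name `find_documents` is hypothetical, and a real system would more likely use an index.

```python
def find_documents(documents, sentence):
    """Return the ids of documents whose text contains the search sentence."""
    return [doc_id for doc_id, text in documents.items() if sentence in text]


# Hypothetical document store.
documents = {
    "a": "最新のxDSLの情報を提供する",
    "b": "xDSLの技術について",
}
print(find_documents(documents, "xDSLの情報"))
```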

[0056] In this embodiment, the search is performed with a natural search sentence, so the search the searcher intends is easy to carry out. In addition, the search sentence is composed while referring to candidate words obtained by processing the search target in advance, so an appropriate search sentence is easy to compose.

[0057] <Search Preprocessing> Next, the procedure for creating candidate words, which is the processing performed before a search, is described in detail. The candidate word creating means 1a first performs word segmentation on the text shown in FIG. 8. This word segmentation is substantially the same as the conventional process: it divides the target document into individual words to create the candidate words. In this example, when a document has a title, the title is also segmented (see FIG. 9).

[0058] Each candidate word is then stored in association with its appearance pattern. That is, each candidate word is stored in the candidate word table together with the candidate word immediately preceding it, so that the following candidate words can be presented when a search sentence is composed. FIG. 10 shows an example in which the candidate words of FIG. 9 are stored in the candidate word table 1f. Parentheses, symbols, blanks, and punctuation marks, which are unsuitable as candidate words, are ignored at this point.
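Skipping parentheses, symbols, blanks, and punctuation could be approximated by keeping only tokens that contain at least one letter or digit. The sketch below relies on Unicode general categories for this. The function name `is_candidate` is hypothetical, and the test is a stand-in for whatever rule the actual apparatus applies.

```python
import unicodedata


def is_candidate(token):
    """True if the token contains at least one letter (L*) or number (N*).

    Tokens made only of punctuation, symbols, or whitespace are rejected.
    """
    return any(unicodedata.category(ch)[0] in "LN" for ch in token)


tokens = ["xDSL", "（", "の", "、", " ", "情報", "）"]
print([t for t in tokens if is_candidate(t)])
```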

[0059] When the word before "（" is a noun, as in "専用のワークステーション（高性能パソコン）" ("dedicated workstation (high-performance PC)"), the noun immediately after "（", here "高性能パソコン" (high-performance PC), is considered to stand in parallel with "ワークステーション" (workstation). It is therefore appropriate to register "の" as the preceding candidate word for "高性能パソコン". For the word that follows "）", both "ワークステーション", which precedes "（", and the word immediately before "）" are registered as preceding candidate words.
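The parenthesis rule above can be sketched as follows. This is one interpretation under stated assumptions: tokens arrive already segmented, the set of nouns is supplied by the caller, and parentheses that do not follow a noun are simply skipped. The function name `predecessor_pairs` and its variables are hypothetical.

```python
def predecessor_pairs(tokens, nouns):
    """Return (preceding word, word) pairs, applying the parenthesis rule."""
    pairs = []
    prevs = [None]        # predecessors to attach to the next word
    before_last = [None]  # predecessors of the latest word outside parentheses
    head = None           # noun standing just before "（", if any
    last = None           # most recent word seen
    for tok in tokens:
        if tok == "（":
            if last in nouns:
                head = last
                # The first word inside is parallel to the head noun,
                # so it inherits the head noun's predecessors.
                prevs = list(before_last)
            continue
        if tok == "）":
            if head is not None:
                # The next word follows both the head noun and the last
                # word inside the parentheses.
                prevs = [head, last]
                head = None
            continue
        for p in prevs:
            pairs.append((p, tok))
        if head is None:
            before_last = list(prevs)
        prevs = [tok]
        last = tok
    return pairs


tokens = ["専用", "の", "ワークステーション", "（", "高性能パソコン", "）", "を"]
for pair in predecessor_pairs(tokens, {"ワークステーション", "高性能パソコン"}):
    print(pair)
```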

[0060] For the candidate words of FIG. 9, the counting means 1b totals the number of times the same candidate word appears within one unit of text, such as a single sentence (from the start of the sentence to the full stop), text enclosed in parentheses, or a heading set off on its own line, and stores the number of appearances for each candidate word in the count table 1g (see FIG. 11).

[0061] Candidate words with a high count, that is, a high frequency of appearance, are considered more likely to be used in a search sentence. When the candidate word presenting means 2a presents candidate words, it therefore refers to the count table 1g and presents the next candidate words extracted from the candidate word table 1f in descending order of frequency of appearance.

[0062] The keyword extracting means 1c extracts the nouns or nominative-case words from the candidate words of FIG. 9 and registers them as keywords in the keyword table 1h of the database 10 (see FIG. 12).

[0063] When performing the prefix search for keywords in step S1, the candidate word presenting means 2a refers to this keyword table 1h to present the keywords.

[0064] <Second Embodiment> This embodiment differs from the preceding embodiment in that, instead of finding the first candidate word by prefix search, a technical field is selected and the first candidate word is presented from among the related candidate words. The rest of the configuration is substantially the same. Elements identical to those of the preceding embodiment are given the same reference signs and are not described again.

[0065] FIG. 13 is a schematic configuration diagram of this embodiment. The keyword extracting means 1c' of this embodiment classifies the keywords into related technical fields, as described later, and stores them in the keyword table 1h' in association with strongly related words.

[0066] The weighting means 1d weights the keywords according to their importance and stores this weight information in the weight table 1j of the database 10, so that when the candidate word presenting means 2a presents keywords, the more important keywords can be presented first.

[0067] FIG. 14 is a flowchart of the search procedure of this embodiment. When a search is started, the field to be searched is first narrowed down: the candidate word presenting means 2a displays the searchable fields on a display means such as a monitor and presents them to the searcher (S11). FIG. 15 shows an example of how the search fields are displayed at this point.

[0068] The searcher selects the field to be searched from among these fields (S12).

[0069] The candidate word presenting means 2a refers to the database 10 and displays a list of the keywords belonging to the selected field (S13). FIG. 16 shows an example of the display when "次世代ハードウェア・アーキテクチャ" (Next-Generation Hardware Architecture) is selected.

[0070] The searcher then selects one of the listed keywords (S14). For example, to search for xDSL, the searcher selects the natural-language entry in the "xDSL" row.

【００７１】図１７は、自然言語による検索の画面の例
を示しており、候補語の欄には「ｘＤＳＬ」と関連性が
高い５つのキーワードが表示されている（Ｓ１５）。な
お、候補が５つ以上ある場合は、▽ボタンと△ボタンに
より次の５つの候補、前の５つの候補を表示させる。FIG. 17 shows an example of a search screen in a natural language, in which five keywords having high relevance to "xDSL" are displayed in the candidate word column (S15). If there are five or more candidates, the next five candidates and the previous five candidates are displayed using the ▽ and △ buttons.

[0072] The searcher selects a candidate word from among these (S16). The search sentence creating means 2b adopts the selected candidate word as the first word of the search sentence, and the candidate word presenting means 2a refers to the candidate word table 1f and displays the candidate words likely to follow it (S17).

[0073] The searcher repeats the selection of candidate words until the desired search sentence is complete (S17, S18). FIG. 6 shows the case where "の" ("no", the possessive particle) is selected as the second candidate word. The search sentence creating means 2b adds the selected candidate word to the search sentence, and the candidate word presenting means 2a displays the next candidate words.

[0074] Similarly, the searcher selects "情報" ("information") as the third candidate word and, with the search sentence now reading "xDSLの情報" ("information on xDSL"), selects the search button.

[0075] The search result presenting means 2c searches the search target documents for documents containing the search sentence and displays them as a list, as shown in FIG. 7 (S19).

[0076] As described above, in this embodiment the search is performed with a natural search sentence, so the searcher can easily perform the intended search. In addition, because the search sentence is created while referring to candidate words obtained by processing the search targets in advance, an appropriate search sentence is easy to create.

[0077] <Search Preprocessing> Next, the procedure for extracting in advance, for each keyword, the candidate words highly related to it is described. This is what allows candidate words highly related to the selected keyword to be presented in steps S13 to S15.

[0078] The candidate words may be extracted by having a person read the search target documents or general documents in advance, pick out the words considered closely related to each keyword, and compile a related-word dictionary recording the related words for each keyword. In this example, however, the distance between keywords extracted from the search target documents is computed using an affinity measure, and highly related candidate words are extracted automatically according to this distance.

[0079] First, the search target documents are divided into text units of a predetermined size (a coherent piece of text such as a sentence from its beginning to its period, a passage enclosed in parentheses, a single paragraph, or a heading set off by a line break), and the keywords in each unit document are extracted.

[0080] Next, the number of unit documents containing a given keyword x is counted, and this count is taken as the occurrence frequency N(x) of keyword x. Similarly, the number of unit documents containing another keyword y is counted and taken as the occurrence frequency N(y) of keyword y.

[0081] The number of unit documents containing both keyword x and keyword y is taken as the co-occurrence frequency N(x, y) of the keyword pair (x, y), and the keyword distance L(x, y) is obtained from the following equation.

[0082]
L(x, y) = {N(x) + N(y) - N(x, y)} / N(x, y)

The smaller this distance L(x, y), the more closely related keyword x and keyword y are judged to be.

[0083] This distance L(x, y) is computed for every combination of keywords, and each keyword is stored in the keyword table 1h' in association with either the keywords lying within a predetermined distance of it or the keywords up to a predetermined rank in ascending order of distance.

[0084] This enables the candidate word presenting means 2a to present candidate words highly related to a keyword by referring to the keyword table 1h'.
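The distance calculation of paragraphs [0080] to [0083] can be illustrated with a minimal Python sketch. The function names, the sample unit documents, and the choice of `top_k` are assumptions made for illustration and do not appear in the specification. Pairs that never co-occur have N(x, y) = 0, for which the distance is undefined; this sketch simply skips them, which is also an assumption.

```python
from collections import Counter
from itertools import combinations


def keyword_distances(units):
    """Return {(x, y): L(x, y)} for keyword pairs that co-occur at least once.

    `units` is a list of keyword lists, one list per unit document.
    """
    n = Counter()       # N(x): number of unit documents containing x
    n_pair = Counter()  # N(x, y): number of unit documents containing both
    for unit in units:
        keys = sorted(set(unit))
        n.update(keys)
        n_pair.update(combinations(keys, 2))
    # L(x, y) = {N(x) + N(y) - N(x, y)} / N(x, y)
    return {(x, y): (n[x] + n[y] - c) / c for (x, y), c in n_pair.items()}


def related_keywords(distances, top_k=5):
    """Map each keyword to its top_k nearest keywords, smallest distance first."""
    neighbours = {}
    for (x, y), d in distances.items():
        neighbours.setdefault(x, []).append((d, y))
        neighbours.setdefault(y, []).append((d, x))
    return {k: [w for _, w in sorted(v)[:top_k]] for k, v in neighbours.items()}


# Hypothetical unit documents, already reduced to their keywords.
units = [
    ["xDSL", "ADSL", "modem"],
    ["xDSL", "ADSL"],
    ["xDSL", "modem"],
    ["ADSL", "fiber"],
]
distances = keyword_distances(units)
table = related_keywords(distances, top_k=2)
print(table["xDSL"])  # nearest keywords to "xDSL"
```

Here `table` plays the role of the keyword table 1h' under the "up to a predetermined rank" variant; the "within a predetermined distance" variant would filter on `d` instead of slicing.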

[0085] Next, the procedure for classifying keywords into predetermined technical fields is described. This is what allows the keywords belonging to the selected technical field to be displayed in steps S11 to S13.

[0086] A person may classify each keyword into the appropriate field in advance. In this example, however, a classification knowledge dictionary is created in advance from known documents, and the search target documents are classified automatically on the basis of this knowledge dictionary.

[0087] FIG. 18 is an explanatory diagram of the procedure for automatically classifying keywords. The more texts within a category a keyword pair appears in, the more readily one keyword calls the other to mind within that category (an "A suggests B" relationship), and the strength of this semantic link can be said to express concisely the concept of the set grouped as that category. By quantifying, for each category, the strength of association of the keyword pairs appearing in it and accumulating the results as knowledge, it becomes possible, when a new text is given, to match the keyword pairs appearing in that text against the knowledge and to judge from the differences in association strength which category's concept the text is closest to.

[0088] 1. Construction of the classification knowledge dictionary
(1) Keyword extraction and co-occurrence creation
For every text in a category, keywords and their combinations (co-occurrences) are extracted. The way co-occurrences are taken is varied according to the characteristics of the training texts so that more accurate information is extracted. For example, for texts with long passages such as papers and newspaper articles, co-occurrence within a single sentence is used, whereas for short texts that are not necessarily grammatical, such as customer comments, co-occurrence across the whole text is used.

[0089] (2) Distance calculation
The distance is calculated with equation (a) from the frequency of each of the two keywords forming a co-occurrence and the frequency of the co-occurrence itself. The larger the proportion of each keyword's occurrences within the category that co-occur with the other keyword, the higher the probability that when one keyword is present the other is also present in the same text, so the two keywords are defined as being closer.

[Equation 1]

[0090] (3) Dependency calculation
Based on the above distance, the strength of association between keywords within a category (the dependency) is defined by equation (b). A constant is set for each category so that the minimum dependency (the dependency corresponding to the maximum distance Lmax within the category) is 0.05, which eliminates numerical differences between categories caused by factors such as the number of texts.

[Equation 2]

[0091] 2. Automatic classification
(1) Keyword extraction and co-occurrence creation
For a new text to be classified, keywords and their combinations (co-occurrences) are extracted.

[0092] (2) Candidate category selection
The extracted co-occurrences are matched against the classification knowledge dictionary, and every category containing at least one co-occurrence extracted from the text to be classified is selected as a candidate category.
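The candidate selection of step (2) amounts to a membership test per category. The following Python sketch shows one possible reading. The dictionary layout (`knowledge` mapping each category to its keyword pairs and dependencies), the category names, and the sample values are all hypothetical and are not taken from the specification.

```python
def candidate_categories(text_pairs, knowledge):
    """Return the categories that share at least one co-occurrence with the text.

    text_pairs: set of frozenset({a, b}) keyword pairs extracted from the new text.
    knowledge:  {category: {frozenset({a, b}): dependency}} (assumed layout).
    """
    return [
        category
        for category, pairs in knowledge.items()
        if any(pair in pairs for pair in text_pairs)
    ]


# Hypothetical knowledge dictionary with made-up dependency values.
knowledge = {
    "network": {frozenset({"xDSL", "modem"}): 0.8},
    "power": {frozenset({"turbine", "boiler"}): 0.6},
}
text_pairs = {frozenset({"modem", "xDSL"}), frozenset({"xDSL", "price"})}
candidates = candidate_categories(text_pairs, knowledge)
print(candidates)
```

Using `frozenset` for a pair makes the lookup independent of keyword order, which matches the unordered nature of a co-occurrence.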

[0093] (3) Dependency extraction
The keyword combinations of the text to be classified are matched against the classification dictionary, and for each candidate category the maximum dependency among the combinations sharing the same element word is extracted. For a keyword that has no matching combination in the candidate category, the dependency is set to 0.5.

[0094] (4) Weighting calculation
A weight parameter is obtained from the number of categories in which a keyword appears (its feature value) and is used for weighting. First, the weight parameter is obtained from the total number of categories and the feature value using equation (c). The smaller the feature value, the more specific the keyword is to its category, so the larger the weight parameter becomes.

[Equation 3]

[0095] Next, the fuzzy measure is calculated. For keywords with particularly small feature values, an additional-point calculation is performed using the upper equation in the figure below. This makes it possible to differentiate the weighting of keywords specific to a category from that of general keywords appearing in many categories. Which equation is used is decided by a threshold set appropriately for the total number of categories.

[Equation 4]

[0096] (5) Classification effective value calculation
The weighted fuzzy-measure values for all keywords extracted from the text to be classified are summed, and their average is taken as the classification effective value.

[Equation 5]

[0097] (6) Classification
The text to be classified is assigned to the category with the highest classification effective value. Where a single text may be assigned to more than one category, it is, for example, assigned to every category whose classification effective value is at or above a set threshold, or to a predetermined number of categories in descending order of classification effective value.
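The three assignment policies of step (6) can be sketched as below. This is a minimal illustration in Python; the function name, its parameters, and the sample effective values are assumptions, and the effective values themselves would come from equations that are not reproduced in this text.

```python
def assign_categories(effective_values, threshold=None, top_n=None):
    """Choose categories from {category: classification effective value}.

    With no options, return only the best category. With `threshold`, return
    every category at or above it. With `top_n`, return the top_n categories.
    """
    ranked = sorted(effective_values.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        return [category for category, value in ranked if value >= threshold]
    if top_n is not None:
        return [category for category, _ in ranked[:top_n]]
    return [ranked[0][0]] if ranked else []


# Hypothetical effective values for one text.
values = {"network": 0.72, "power": 0.31, "software": 0.55}
print(assign_categories(values))
print(assign_categories(values, threshold=0.5))
print(assign_categories(values, top_n=2))
```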

[0098] Next, the weighting procedure performed by the weighting means 1d is described. Important keywords are extracted by focusing on the notational features of text, specifically the following.
(1) The important words that indicate what a text is about are often contained in its subject and object.
(2) The title states the content of the text concisely.

[0099] (3) The more important a term is, the more often it appears in the text.
Whether a keyword is important is judged on the basis of these features. The specific calculation method is shown in FIG. 19.

[0100] First, keywords are extracted from the target document in the same manner as described above. In addition, the particle that follows each extracted noun, and whether emphasis marks such as 「」 or "" appear before and after it, are also recorded.

[0101] Next, each extracted keyword is given a base point as follows.

[0102] The base point is determined by the word that follows the keyword. Specifically, when the keyword is followed by "が" (ga) or "は" (wa), which mark a subject phrase, the base point of the keyword is 0.8.

[0103] When the keyword is followed by "を" (wo) or "と" (to), which mark an object phrase, the base point is 0.7. When it is followed by "で" (de), "や" (ya), or "も" (mo), the base point is likewise 0.7.

[0104] Furthermore, a keyword contained in the title is given a base point of 0.9 if it also appears in the body text, and 0.6 if it does not.

[0105] All keywords to which none of the above applies are given a base point of 0.5.
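The base-point rules of paragraphs [0102] to [0105] can be written as a small lookup. The Python sketch below is illustrative only. The function name and parameters are hypothetical, and the specification does not say which rule wins when a title keyword is also followed by a scoring particle, so checking the title rule first is an assumption of this sketch.

```python
SUBJECT_PARTICLES = {"が", "は"}                  # subject markers: 0.8
OTHER_PARTICLES = {"を", "と", "で", "や", "も"}  # object markers and others: 0.7


def base_point(next_word, in_title=False, in_body=True):
    """Return the base point for one keyword occurrence.

    next_word: the word immediately following the keyword.
    in_title:  True if the keyword appears in the title.
    in_body:   True if the keyword appears in the body text.
    """
    if in_title:  # assumption: the title rule takes precedence
        return 0.9 if in_body else 0.6
    if next_word in SUBJECT_PARTICLES:
        return 0.8
    if next_word in OTHER_PARTICLES:
        return 0.7
    return 0.5


print(base_point("が"), base_point("を"), base_point("の"))
```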

[0106] Because more important keywords appear many times in a document, points are added to keywords that appear more than once according to the following formula (see items 1 to 3 in FIG. 19).

[0107]
V = (V0 + V1) - (V0 × V1) ... (A)

Here V0 is the base point of the first occurrence of the keyword, V1 is the base point of its second occurrence, and the resulting V is the importance of the keyword. When the same keyword appears three or more times, the V obtained above is substituted for V0 in equation (A), the base point of the new occurrence is taken as V1, and the calculation is repeated as many times as required.
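Equation (A) is applied as a running fold over the base points of successive occurrences. A minimal Python sketch follows; the function name and the sample base points are assumptions for illustration. Note that (V0 + V1) - (V0 × V1) equals 1 - (1 - V0)(1 - V1), so for base points between 0 and 1 the result stays between 0 and 1 and does not depend on the order of the occurrences.

```python
from functools import reduce


def importance(base_points):
    """Combine the base points of every occurrence of one keyword.

    Applies V = (V0 + V1) - (V0 * V1) repeatedly, feeding each result back
    in as V0 for the next occurrence.
    """
    if not base_points:
        raise ValueError("a keyword needs at least one occurrence")
    return reduce(lambda v0, v1: (v0 + v1) - (v0 * v1), base_points)


# A keyword seen twice (followed by "が", then "を"), and a third time with no scoring particle.
print(importance([0.8, 0.7]))       # approximately 0.94
print(importance([0.8, 0.7, 0.5]))  # approximately 0.97
```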

[0108] Keywords with a suffix are given a deduction (see item 4 in FIG. 19). That is, the base point is multiplied by a fixed value (a positive number not greater than 1, hereinafter called the "deduction coefficient").

[0109] Examples of such suffixes include 名, 量, 風, 策, 図, 表, 化, 系, 圏, 材, 者, 種, 数, 製, 説, 側, 属, 値, 的, 度, 費, 部, 法, 用, 派, 比, 率, 流, 列, 例, 論, 画, 群, 型, 欄, 点, 性, 日, 時, and 類.

[0110] However, some keywords ending in such a suffix character, for example "女性" ("woman"), can be important keywords, so it is undesirable to apply the same deduction uniformly.

[0111] Therefore, the deduction is made smaller for keywords that are two characters long including the suffix character.

[0112] Specifically, for a character count of 2, V' = V0 × 0.7; otherwise, V' = V0 × 0.5 (where V0 is the base point of each word).

[0113] Keywords containing hiragana are given a deduction (see item 5 in FIG. 19). The deduction coefficient in this case is 0.5.

[0114] Single-character kanji keywords are given a deduction (see item 6 in FIG. 19).

[0115] That is, there are words, such as "何" ("what") in "何が", that are unsuitable as keywords yet gain points (receive a high base point) because they are followed by "が".

[0116] Words such as "何" ("what") and "次" ("next") occur frequently and are often followed by particles such as "が" and "を". To keep the weight of such words, which cannot serve as important keywords, low, single-character kanji are made subject to a deduction, with a deduction coefficient of 0.7.

[0117] The weight information calculated in this way for each keyword is as shown in FIG. 20.
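The deduction rules of paragraphs [0108] to [0116] can be sketched as a coefficient that multiplies the base point. The Python below is illustrative only. The function name is hypothetical, and two points are assumptions because the specification does not address them: the coefficients are multiplied together when more than one rule applies, and the suffix rule is applied only to keywords of two or more characters.

```python
import re

# Suffix characters listed in paragraph [0109].
SUFFIXES = set("名量風策図表化系圏材者種数製説側属値的度費部法用派比率流列例論画群型欄点性日時類")
HIRAGANA = re.compile(r"[\u3041-\u3096]")
KANJI = re.compile(r"[\u4e00-\u9fff]")


def deduction_coefficient(keyword):
    """Return the factor by which the base point of `keyword` is multiplied."""
    coeff = 1.0
    if len(keyword) >= 2 and keyword[-1] in SUFFIXES:
        # Smaller deduction for two-character words such as "女性".
        coeff *= 0.7 if len(keyword) == 2 else 0.5
    if HIRAGANA.search(keyword):
        coeff *= 0.5  # keyword containing hiragana
    if len(keyword) == 1 and KANJI.fullmatch(keyword):
        coeff *= 0.7  # single-character kanji such as "何"
    return coeff


for word in ["女性", "自動化", "何", "電話", "取り扱い"]:
    print(word, deduction_coefficient(word))
```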

[0118] As a result, when displaying keywords in step S13, the candidate word presenting means 2a refers to the weight table 1j and presents the more important keywords first.

[0119] <Third Embodiment> This embodiment differs from the preceding embodiments in that, when candidate words are presented, the next candidate word is extracted by taking into account a sequence of several candidate words, so as to reflect the context of the search sentence built from the previously selected words. The other configurations are substantially the same. Elements identical to those of the preceding embodiments are denoted by the same reference numerals, and their description is not repeated.

[0120] In this example, when the search preprocessing apparatus 1 stores the candidate words created by the candidate word creating means 1a in the candidate word table 1f, it also stores the order of a predetermined number of preceding candidate words. As shown in FIG. 21, for each candidate word the table stores the immediately preceding candidate word n-1, the candidate word two positions before it n-2, and the candidate word three positions before it n-3.

[0121] Thus, when the candidate word presenting means 2a of the document search apparatus 2 presents candidate words, it refers to the three preceding candidate words so that candidate words in keeping with the context of the search target documents can be presented.

[0122] For example, suppose candidate words are created from the text of FIG. 8 and, as shown in FIG. 22, "電話機にIDカード" ("ID card to the telephone") has already been selected. With a reference count of 1, as in the preceding embodiments, "の", "で", and "と" are presented, whereas with a reference count of 3, as in this example, the candidates are narrowed to "の" and "で". For convenience the candidate words of a single search target document are shown here, but in practice candidate words are of course created from multiple documents and presented with reference to them.

[0123] In this example, when fewer candidate words have been selected so far than the reference count, the reference count is reduced to the number available and candidate words are presented accordingly. Specifically, when the second candidate word is selected following the first keyword, the reference count is set to 1, and "に", "デジタル式" ("digital type"), and "でも" are presented.

[0124] Similarly, when the third word is selected, the reference count is set to 2, and "IDカード" ("ID card") is presented.
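The context-dependent lookup of this embodiment can be sketched as a table keyed by the preceding one to three words, with the reference count reduced when fewer words have been selected. The Python below is a minimal illustration. The function names and the three tokenized sample sentences are hypothetical (the text of FIG. 8 is not reproduced here), so the candidates it prints are not those of FIG. 22.

```python
from collections import defaultdict


def build_candidate_table(token_lists, max_context=3):
    """Map each context (the 1 to max_context preceding words) to the set of
    words observed directly after it."""
    table = defaultdict(set)
    for tokens in token_lists:
        for i in range(1, len(tokens)):
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break
                table[tuple(tokens[i - n:i])].add(tokens[i])
    return table


def next_candidates(table, selected, max_context=3):
    """Look up candidates using as many selected words as are available,
    up to max_context (the reference count)."""
    n = min(len(selected), max_context)
    if n == 0:
        return []
    return sorted(table.get(tuple(selected[-n:]), set()))


# Hypothetical pre-tokenized sentences.
docs = [
    ["電話機", "に", "IDカード", "の", "情報"],
    ["電話機", "に", "IDカード", "で", "認証"],
    ["社員", "の", "IDカード", "と", "鍵"],
]
table = build_candidate_table(docs)
selected = ["電話機", "に", "IDカード"]
print(next_candidates(table, selected, max_context=1))  # looser context
print(next_candidates(table, selected, max_context=3))  # narrowed by context
```

With a reference count of 1 the lookup sees only "IDカード" and returns every word that ever followed it; with a reference count of 3 the word "と", which followed "IDカード" only in a different context, drops out.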

[0125] <Fourth Embodiment> This embodiment differs from the second embodiment in that the search is performed using combinations of keywords. The other configurations are substantially the same. Elements identical to those of the preceding embodiments are denoted by the same reference numerals, and their description is not repeated.

[0126] FIG. 23 is a schematic configuration diagram of this embodiment. The combination creating means 1e of the search preprocessing apparatus 1 creates combinations of the keywords present in the search target documents and stores each combination in the combination table (combination storage means) 1k of the database 10 in association with the documents in which it occurs.

[0127] First, the combination creating means 1e combines several of the keywords extracted by the keyword extracting means 1c' that are contained in the same text unit of a predetermined size (a coherent piece of text such as a sentence from its beginning to its period, a passage enclosed in parentheses, a single paragraph, or a heading set off by a line break). FIG. 24 illustrates the case where one unit runs from the beginning of a sentence to its period and combinations of two keywords are created.

[0128] The search sentence creating means 2b' creates a natural search sentence from the searcher's selections as described above and, when the search button is selected, creates combinations of the nouns in the search sentence and passes them to the search result presenting means 2c'. FIG. 25 illustrates the case where "複数の内線番号を持つ電話" ("telephone having multiple extension numbers") is adopted as the search sentence.

[0129] The search result presenting means 2c' searches the combination table 1k for the combinations of FIG. 25 and presents the target documents in which these combinations occur as the search result.
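The combination table of this embodiment behaves like an inverted index keyed by keyword pairs. The Python sketch below illustrates the idea; the function names, the document identifiers, and the sample keywords are hypothetical. The specification does not state whether a document must contain every combination from the search sentence or only one of them, so the `require_all` switch is an assumption that exposes both readings.

```python
from collections import defaultdict
from itertools import combinations


def build_combination_table(docs):
    """docs: {doc_id: [keywords of unit 1, keywords of unit 2, ...]}.

    Returns {(keyword_a, keyword_b): set of doc_ids}, pairing only keywords
    that occur in the same text unit.
    """
    table = defaultdict(set)
    for doc_id, units in docs.items():
        for unit in units:
            for pair in combinations(sorted(set(unit)), 2):
                table[pair].add(doc_id)
    return table


def search(table, query_nouns, require_all=True):
    """Return the documents matching the noun pairs of the search sentence."""
    pairs = list(combinations(sorted(set(query_nouns)), 2))
    hits = [table.get(pair, set()) for pair in pairs]
    if not hits:
        return set()
    return set.intersection(*hits) if require_all else set.union(*hits)


docs = {
    "doc1": [["複数", "内線番号", "電話"]],
    "doc2": [["内線番号", "電話"], ["複数", "回線"]],
    "doc3": [["電話", "料金"]],
}
table = build_combination_table(docs)
query = ["複数", "内線番号", "電話"]  # nouns of "複数の内線番号を持つ電話"
print(search(table, query))
print(search(table, query, require_all=False))
```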

[0130] Keywords appearing in the same passage are related in some way in expressing its meaning, so searching by combinations of keywords reflects the intent of the search sentence better than searching by individual keywords. That is, this embodiment makes it possible to retrieve highly relevant documents efficiently.

[0131] <Fifth Embodiment> FIG. 26 shows a configuration example for causing a computer 20 to execute the search preprocessing method and the document search method of the present invention.

[0132] The computer 20 comprises input means 21 such as a keyboard 21a and a pointing device 21b, storage means 22 for storing programs, search data, and the like, an arithmetic processing unit 23 that performs processing in accordance with the programs, and display means (output means) 24 for displaying search results and the like.

[0133] When a program for executing the search preprocessing method is read into the computer 20 from a recording medium 25, the arithmetic processing unit 23, the storage means 22, and so on constitute the candidate word creating means 1a, the accumulating means 1b, the keyword extracting means 1c, and the database 10, and execute the search preprocessing: dividing the search target documents into words of a predetermined unit, creating candidate words that are candidates for search conditions, and storing the candidate words in association with their appearance forms.

[0134] Likewise, when a program for executing the document search method is read into the computer 20 from the recording medium 25, the arithmetic processing unit 23, the storage means 22, the display means 24, and so on constitute the candidate word presenting means 2a, the search sentence creating means 2b, and the search result presenting means 2c, and execute the document search processing. This processing presents candidate words that are candidates for search conditions; adopts the candidate word selected by the searcher as the first word of the search sentence; presents candidate words that follow the word adopted in the search sentence; adds the candidate word the searcher selects from them to the search sentence; repeats this presentation and addition as many times as required to create the search sentence; searches the search target documents on the condition that they contain the search sentence; and presents the search result.

[0135] The recording medium 25 is a floppy (registered trademark) disk, CD-ROM, magneto-optical disk, semiconductor memory, or the like on which the program for executing the search preprocessing method or the document search method is stored.

[0136] <Other Embodiments> The above embodiments show examples in which a search is performed with a single search sentence, but the apparatus may be configured to create multiple search sentences and perform an AND search or an OR search.

[0137] The above embodiments show examples in which candidate words that follow the candidate word adopted in the search sentence are presented, but the present invention is not limited to this, and candidate words that precede it may be presented instead.

[0138] When candidate words are presented, words that differ only in half-width versus full-width form, katakana versus hiragana, the presence or absence of a long-vowel mark, and the like may be presented together, and a thesaurus may be used so that synonyms and related words are also presented as candidate words.
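One way to present such spelling variants together is to reduce every word to a common key and group candidates by that key. The Python sketch below is an illustration under assumptions: the function name is hypothetical, and the specific folding steps (Unicode NFKC normalization for width, a fixed code-point offset for katakana to hiragana, and removal of the long-vowel mark "ー") are choices of this sketch rather than anything the specification prescribes.

```python
import unicodedata


def variant_key(word):
    """Fold width, kana type, and long-vowel marks so that spelling variants
    of the same word share one key."""
    # NFKC maps half-width katakana and full-width Latin letters to their
    # standard forms.
    text = unicodedata.normalize("NFKC", word)
    folded = []
    for ch in text:
        if "ァ" <= ch <= "ヶ":        # katakana block: shift to hiragana
            ch = chr(ord(ch) - 0x60)
        if ch != "ー":                # drop the long-vowel mark
            folded.append(ch)
    return "".join(folded)


variants = ["サーバー", "サーバ", "ｻｰﾊﾞ", "ｘＤＳＬ", "xDSL"]
groups = {}
for w in variants:
    groups.setdefault(variant_key(w), []).append(w)
print(groups)
```

Dropping the long-vowel mark can also merge words that are genuinely different, so a practical system would likely use the key only to group candidates for display and keep the original spellings for the search itself.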

[0139] The search preprocessing apparatus and the document search apparatus may be provided as separate units or as a single unit. They may also target documents stored on devices reached over a communication line such as the Internet. Documents stored on Internet servers, for example, span many fields and sometimes use new or coined words, so searching them tends to require trial and error. Because the present invention creates candidate words from the search target documents themselves, an appropriate search sentence can be created easily even in such cases, which makes the invention particularly useful. Moreover, because the search sentence is created while referring to candidate words, few relevant documents are missed, so the invention is well suited to retrieving appropriate information without omission from a wide variety of documents, such as documents collected through questionnaires, technical support questions and answers, and online help for computer software.

[Effects of the Invention] As described above, the present invention provides a search preprocessing apparatus, a document search apparatus, a search preprocessing method, and a document search method that create a natural search sentence using candidate words and perform a search with it, thereby making it easy to create an appropriate search sentence and enabling a search that properly reflects the searcher's intent.

[Brief description of the drawings]

FIG. 1 is a schematic configuration diagram of a first embodiment according to the present invention.

FIG. 2 is a flowchart of a search procedure according to the first embodiment.

FIG. 3 is a diagram showing a display example at the time of document search.

FIG. 4 is a diagram showing a display example at the time of document search.

FIG. 5 is a diagram showing a display example at the time of document search.

FIG. 6 is a diagram showing a display example at the time of document search.

FIG. 7 is a diagram showing a display example at the time of document search.

FIG. 8 is a diagram showing a search target document.

FIG. 9 is an explanatory diagram of word segmentation processing.

FIG. 10 is an explanatory diagram of a candidate word table.

FIG. 11 is an explanatory diagram of a frequency table.

FIG. 12 is an explanatory diagram of a keyword table.

FIG. 13 is a schematic configuration diagram of a second embodiment according to the present invention.

FIG. 14 is a flowchart of a search procedure according to the second embodiment.

FIG. 15 is a diagram showing a display example at the time of document search.

FIG. 16 is a diagram showing a display example at the time of document search.

FIG. 17 is a diagram showing a display example at the time of document search.

FIG. 18 is an explanatory diagram of an automatic classification procedure.

FIG. 19 is an explanatory diagram of weighting processing.

FIG. 20 is an explanatory diagram of weight data.

FIG. 21 is an explanatory diagram of a candidate word table.

FIG. 22 is an explanatory diagram of the case where the appearance forms of multiple candidate words are taken into account.

FIG. 23 is a schematic configuration diagram of a fourth embodiment according to the present invention.

FIG. 24 is an explanatory diagram of the creation of keyword combinations.

FIG. 25 is an explanatory diagram of the creation of combinations from a search sentence.

FIG. 26 is a schematic configuration diagram of a fifth embodiment according to the present invention.

[Explanation of symbols]

1 Search preprocessing device
1a Candidate word creation means
1b Accumulation means
1c, 1c' Keyword extraction means
1d Weighting means
1e Combination creation means
2 Document search device
2a Candidate word presentation means
2b Search sentence creation means
2c Search result presentation means
10 Database
20 Computer
21 Input means
22 Storage means
23 Arithmetic processing unit
24 Display means
25 Recording medium

Continuation of front page

(72) Inventor: Keizo Uchiyama, c/o Tokyo Electric Power Co., Inc., 1-1-3 Uchisaiwaicho, Chiyoda-ku, Tokyo
(72) Inventor: Itsuko Tezuka, c/o Tokyo Electric Power Co., Inc., 1-1-3 Uchisaiwaicho, Chiyoda-ku, Tokyo
(72) Inventor: Akio Yasuda, c/o Heiwa Information Center Co., Ltd., Shinjuku Green Tower Building 19F, 6-14-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(72) Inventor: Misa Onuma, c/o Heiwa Information Center Co., Ltd., Shinjuku Green Tower Building 19F, 6-14-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(72) Inventor: Naoko Makita, c/o Heiwa Information Center Co., Ltd., Shinjuku Green Tower Building 19F, 6-14-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo
F-term (reference): 5B075 ND03 NK02 NK31 NR12 PP13 PP24 PQ02 PQ46 PQ75 PR04

Claims

[Claims]

1. A search preprocessing device comprising: candidate word creating means for dividing a search target document into words of a predetermined unit and creating candidate words serving as candidates for a search condition; and candidate word storage means for storing the candidate words in association with the appearance forms of the candidate words.

2. The search preprocessing device according to claim 1, further comprising: accumulation means for accumulating the number of appearances of each identical candidate word among the candidate words created by the candidate word creating means; and count storage means for storing the number of appearances for each candidate word.

3. The search preprocessing device according to claim 1 or 2, further comprising: candidate word weighting means for weighting the candidate words according to their importance; and weight storage means for storing weighting information for each of the candidate words.

4. The search preprocessing device according to claim 3, wherein the importance of a candidate word is at least one of the number of appearances of the candidate word and the location at which the candidate word appears.

5. The search preprocessing device according to claim 1, 2, 3 or 4, further comprising: keyword extracting means for extracting keywords from the candidate words; and keyword storage means for storing the keywords.

6. The search preprocessing device according to claim 5, further comprising: combination creating means for creating combinations of the keywords present in the search target document; and combination storage means for storing each combination of keywords in association with the document in which the combination exists.

7. A document search device that takes a plurality of documents as a search target and searches for a document matching a search condition, comprising: candidate word presenting means for presenting candidate words as search condition candidates; search sentence creating means for creating a search sentence by combining the candidate words presented by the candidate word presenting means; and search result presenting means for presenting the result of searching the search target with inclusion of the search sentence as the search condition.

8. The document search device according to claim 7, wherein the search sentence creating means adopts into the search sentence a candidate word selected by a searcher from among the candidate words presented by the candidate word presenting means.

9. The document search device according to claim 8, wherein the candidate word presenting means presents candidate words that follow the candidate word adopted by the search sentence creating means, based on the appearance form, in the search target document, of candidate words created by dividing the search target document into words of a predetermined unit, and the search sentence creating means adopts the candidate word selected by the searcher from among the following candidate words and adds it to the search sentence.

10. The document search device according to claim 9, wherein the candidate word presenting means determines and presents the next following candidate word, in accordance with the order of a plurality of candidate words previously adopted by the search sentence creating means, based on the appearance form of the candidate words in the search target document.

11. The document search device according to claim 9, wherein the candidate word presenting means presents the candidate words in an order according to the number of appearances of the candidate words in the search target document.

12. The document search device according to claim 9, wherein the candidate word presenting means presents the candidate words in an order according to the importance of the candidate words.

13. The document search device according to claim 9, wherein the search sentence is formed by combining the nouns or noun phrases among the plurality of candidate words adopted by the search sentence creating means.

14. The document search device according to claim 9, wherein the search sentence is formed by combining the nouns or noun phrases among the plurality of candidate words adopted by the search sentence creating means, and the search result presenting means refers to combinations of keywords, the keywords being the nouns or noun phrases among the candidate words created by dividing the search target document into words of a predetermined unit, searches for a combination of keywords that matches the search sentence, and presents as the search result the document in which the matching combination of keywords exists.

15. A search preprocessing method comprising: dividing a search target document into words of a predetermined unit to create candidate words serving as search condition candidates; and storing the candidate words in association with the appearance forms of the candidate words.

16. A document search method comprising: presenting candidate words that are candidates for a search condition, and adopting a candidate word selected by a searcher from among them as the first word of a search sentence; presenting candidate words that follow the word adopted in the search sentence, and adopting a candidate word selected by the searcher from among them and adding it to the search sentence; creating the search sentence by repeating, a required number of times, the presentation of candidate words following the word adopted in the search sentence and the addition to the search sentence of the candidate word selected by the searcher; searching the search target documents with inclusion of the search sentence as the search condition; and presenting the search result.

17. The document search method according to claim 16, wherein candidate words are created by dividing the search target document into words of a predetermined unit, the candidate words are stored together with their appearance forms in the search target document, and the candidate word following the word adopted in the search sentence is determined based on the appearance forms.
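The step-by-step creation of a search sentence recited in the method claims can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation. The function names (`following_words`, `search`, `build_search_sentence`), the `choose` callback that stands in for the searcher's selection, and the sample documents are all hypothetical, and whitespace splitting replaces the word-division step.

```python
def split_into_words(text):
    """Stand-in for the word-division step (assumption: whitespace split)."""
    return text.lower().split()


def following_words(documents, sentence_words):
    """Words that directly follow the word sequence adopted so far.

    With an empty sequence, every word in the documents is a candidate.
    """
    n = len(sentence_words)
    found = set()
    for text in documents.values():
        words = split_into_words(text)
        for i in range(len(words) - n):
            if words[i:i + n] == sentence_words:
                found.add(words[i + n])
    return sorted(found)


def search(documents, sentence_words):
    """Ids of documents that contain the search sentence as a word run."""
    n = len(sentence_words)
    hits = []
    for doc_id, text in documents.items():
        words = split_into_words(text)
        if any(words[i:i + n] == sentence_words
               for i in range(len(words) - n + 1)):
            hits.append(doc_id)
    return hits


def build_search_sentence(documents, choose):
    """Repeat "present candidates, adopt the selection" until the searcher
    stops (choose returns None) or no following candidate exists."""
    sentence = []
    while True:
        candidates = following_words(documents, sentence)
        picked = choose(sentence, candidates) if candidates else None
        if picked is None:
            return sentence
        if picked not in candidates:
            raise ValueError(f"{picked!r} is not among the presented candidates")
        sentence.append(picked)


# Hypothetical sample documents and a scripted "searcher" for illustration.
documents = {
    1: "power outage caused by lightning",
    2: "power outage caused by equipment failure",
    3: "power supply contract change",
}
scripted = iter(["power", "outage", "caused", "by", "lightning"])
sentence = build_search_sentence(documents, lambda s, c: next(scripted, None))
print(" ".join(sentence), search(documents, sentence))
```

Because every adopted word is drawn from words that actually follow the current sentence in some document, the finished search sentence is found in at least one document. That property holds for this sketch and is not a statement about the claimed devices.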