JPH0973460A - Document retrieval device

Info

Publication number: JPH0973460A
Application number: JP7228919A
Authority: JP
Inventors: Miwa Yasutaka; みわ安高
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-09-06
Filing date: 1995-09-06
Publication date: 1997-03-18

Abstract

PROBLEM TO BE SOLVED: To retrieve a desired document efficiently and accurately with a document retrieval device. SOLUTION: The document retrieval device retrieves a desired document from a plurality of documents stored in a file 2. It comprises an input part 1 through which words, phrases, part-of-speech information and the like are entered; a storage means that attaches part-of-speech information to the words and phrases and saves them in the file 2; and a CPU 3 that searches the file 2 for words and phrases matching a keyword and part-of-speech information entered from the input part 1 and displays the documents containing those words and phrases on a display part 4.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a document retrieval device that retrieves documents stored in a storage means such as a file.

[0002]

[Prior Art] Document retrieval devices that perform a full-text search, based on a keyword, over documents stored in a file or the like are well known. Now that a large number of documents can be stored in such a file, however, improved retrieval efficiency is desired.

[0003] In a conventional document retrieval device, a large number of documents are usually registered in the file. Entering just a single keyword and then searching the file for, and displaying, the documents that contain the same characters or phrases as that keyword therefore takes a long time.

[0004] With such a device, it often happens that no document in the file contains the same characters or phrases as the entered keyword, so that the search returns zero results. Conversely, so many documents in the file may contain the keyword that running the search serves no purpose.

[0005] In either case the time spent on the search is wasted, and the user has to enter a keyword similar to the previous one and run a new search.

[0006] This not only makes document retrieval inefficient but also increases the number of input operations and holds up the user's work.

[0007]

[Problem to Be Solved by the Invention] As described above, when a conventional document retrieval device searches with a keyword alone, either no document or far too many documents contain the same characters or phrases as the keyword. The search then has to be repeated, and retrieval efficiency is poor.

[0008] The present invention has been made to solve this problem, and its object is to provide a document retrieval device that can retrieve a desired document efficiently and accurately.

[0009]

[Means for Solving the Problem] To achieve the above object, the document retrieval device of the invention according to claim 1 is a document retrieval device that retrieves a desired document registered in a file by means of a keyword, and comprises: input means through which a character string is entered; morphological analysis means that morphologically analyzes the character string entered through the input means into words, phrases and the like; registration means that attaches part-of-speech information to each of the words, phrases and the like obtained by the morphological analysis means and registers them in the file; search means that, when part-of-speech information is entered together with a keyword through the input means at document retrieval time, searches the documents in the file for matching words and phrases on the basis of these search conditions; and display means that displays the documents containing the words and phrases found by the search means.

[0010] In the invention according to claim 1, part-of-speech information is attached to each of the words, phrases and the like obtained by the morphological analysis means, and they are registered in the file.

[0011] At retrieval time, when part-of-speech information is entered together with a keyword through the input means, the documents in the file are searched for matching words and phrases on the basis of these search conditions, and the documents containing those words and phrases are displayed. In other words, documents can be retrieved within a search range that is narrowed by part of speech.

[0012] The document retrieval device of the invention according to claim 2 is a document retrieval device that retrieves a desired document registered in a file by means of a keyword, and comprises: input means through which a character string is entered; morphological analysis means that morphologically analyzes the character string entered through the input means into words, phrases and the like; registration means that attaches part-of-speech information to each of the words, phrases and the like obtained by the morphological analysis means and registers them in the file; search means that, when part-of-speech information is entered together with a keyword through the input means at document retrieval time, searches the documents in the file for matching words and phrases on the basis of these search conditions; counting means that, each time a word or phrase is found in a document by the search means, counts the frequency with which that word or phrase is hit; and display means that displays the documents containing words and phrases whose frequency, as counted by the counting means, is equal to or greater than a predetermined value.

[0013] In the invention according to claim 2, as above, part-of-speech information is attached to each of the morphologically analyzed words, phrases and the like, and they are registered in the file.

[0014] At retrieval time, when part-of-speech information is entered together with a keyword through the input means, the documents in the file are searched for matching words and phrases on the basis of these search conditions. Each time a word or phrase is found in a document, the counting means counts the frequency with which it is hit, and the documents containing words and phrases whose frequency is equal to or greater than the predetermined value are displayed. In other words, documents can be retrieved within a search range that is further narrowed by part of speech and frequency of appearance.

[0015] As a result, a desired document can be retrieved efficiently and accurately.

[0016]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0017] FIG. 1 is a block diagram showing one embodiment of the document retrieval device of the present invention.

[0018] In FIG. 1, reference numeral 1 denotes an input unit such as a keyboard and a mouse, through which various kinds of information are entered and specified: for example, symbols, digits and characters defined by ASCII codes or the like (words in katakana, hiragana and so on), character strings (phrases, keywords), and part-of-speech information (noun, verb, adjective, etc.). Reference numeral 2 denotes a file, in which this information is stored as documents after morphological analysis. Reference numeral 3 denotes a CPU, which performs Japanese-language analysis (including dictionary lookup, morphological analysis and syntactic analysis) on the entered character strings and saves and registers the information in the file 2 as documents. The CPU 3 also searches the file 2 for matching documents on the basis of keyword information entered from the input unit 1 and displays them on the display unit 4.

[0019] Next, the operation of this document retrieval device is described with reference to FIGS. 2 and 4.

[0020] FIG. 2 shows the word-to-part-of-speech correspondence table in the file 2, FIG. 3 is a flowchart showing the search operation of this document retrieval device, and FIG. 4 schematically shows how words and phrases are searched. This document retrieval device has a full-text search function. When the various kinds of information described above, for example a character string, are entered through the input unit 1, the CPU 3 performs morphological analysis and divides the character string into meaningful units of characters and phrases. Part-of-speech information is then attached to each of these units, and they are saved and registered in the file 2.
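The segmentation step described above can be sketched as follows. This is only a minimal illustration, assuming a toy dictionary and greedy longest-match lookup in place of a real morphological analyzer; the dictionary entries, the `unknown` tag and the function name `segment` are hypothetical and are not taken from the patent.

```python
# Toy dictionary: surface form -> part of speech (hypothetical entries).
DICTIONARY = {
    "文書": "noun",
    "検索": "noun",
    "装置": "noun",
    "を": "particle",
    "の": "particle",
}


def segment(text, dictionary):
    """Split text into (unit, part_of_speech) pairs by greedy longest match.

    Characters not covered by the dictionary are emitted one at a time
    with the part of speech "unknown".
    """
    max_len = max(len(word) for word in dictionary)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in dictionary:
                result.append((piece, dictionary[piece]))
                i += length
                break
        else:
            # No dictionary entry starts at position i.
            result.append((text[i], "unknown"))
            i += 1
    return result


print(segment("文書の検索", DICTIONARY))
```

A production analyzer would resolve ambiguous segmentations with costs or context; the greedy rule here is chosen only because it is short enough to read at a glance.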

[0021] At this point the CPU 3 creates a table (index file) 20 in the file 2, as shown in FIG. 2. The table 20 associates part-of-speech information 22 with each of the morphologically analyzed characters and phrases 21.

[0022] For example, the words "文書" (document), "敬順" and "敬忠" are associated with "noun", the character "敬" is associated with "verb", and the word "恭し" is associated with "adjective".
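One way to picture the table 20 is as a mapping from (word, part of speech) pairs to the documents that contain them. The sketch below is an assumption-laden illustration: the patent only states that words are associated with part-of-speech information, so the document ids, the sample documents and the name `build_table` are hypothetical, and the input is supplied already analyzed rather than produced by a morphological analyzer.

```python
from collections import defaultdict

# Hypothetical analyzed documents: each is a list of (word, part_of_speech)
# pairs, standing in for the output of the morphological analysis step.
ANALYZED_DOCS = {
    "doc1": [("文書", "noun"), ("敬順", "noun")],
    "doc2": [("敬忠", "noun"), ("敬", "verb")],
    "doc3": [("恭し", "adjective"), ("文書", "noun")],
}


def build_table(analyzed_docs):
    """Return {(word, part_of_speech): set of document ids}."""
    table = defaultdict(set)
    for doc_id, morphemes in analyzed_docs.items():
        for word, pos in morphemes:
            table[(word, pos)].add(doc_id)
    return dict(table)


table20 = build_table(ANALYZED_DOCS)
for (word, pos), doc_ids in sorted(table20.items()):
    print(word, pos, sorted(doc_ids))
```

Keying on the pair rather than on the word alone keeps homographs with different parts of speech as separate entries, which is what allows a later search to filter by part of speech.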

[0023] To retrieve a document, the user performs a predetermined operation on the input unit 1. This puts the CPU 3 into the document retrieval state, and the display unit 4 waits for a search character string (keyword) to be entered.

[0024] Suppose that the searcher, in order to check whether a search keyword is appropriate, thinks of a word, for example "敬", and a part of speech, for example "noun". When the keyword "敬" and the part-of-speech information "noun" are entered from the input unit 1, the CPU 3 performs morphological analysis, expression processing and the like (step 301 in FIG. 3). Then, using the search condition that combines "敬" and "noun", it performs character-string matching (search processing) and part-of-speech matching against each document in the file 2, one by one (steps 302 to 303).

[0025] In this case, as shown in FIG. 4, the table 20 contains three entries beginning with "敬", but only two of them, "敬順" and "敬忠", have the part of speech "noun" specified in the search condition 41. These two words are therefore extracted, the documents containing them are hits, and the hit documents are stored in the output area of the CPU 3 (step 304). The documents are then output from the CPU 3 to the display unit 4 and displayed.
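The matching in steps 302 to 304 can be sketched as a prefix match on the keyword combined with an exact match on the part of speech. This is a minimal sketch under stated assumptions: the index layout, the document ids and the function name `search` are hypothetical, and only the five example words come from the text.

```python
# Hypothetical index: (word, part of speech) -> ids of documents containing it.
TABLE_20 = {
    ("文書", "noun"): {"doc1", "doc3"},
    ("敬順", "noun"): {"doc1"},
    ("敬忠", "noun"): {"doc2"},
    ("敬", "verb"): {"doc2"},
    ("恭し", "adjective"): {"doc3"},
}


def search(table, keyword, pos):
    """Return (matched words, ids of documents containing any of them).

    A word matches when it begins with the keyword and its part of speech
    equals the requested one.
    """
    words = sorted(w for (w, p) in table if w.startswith(keyword) and p == pos)
    docs = set()
    for w in words:
        docs |= table[(w, pos)]
    return words, sorted(docs)


words, docs = search(TABLE_20, "敬", "noun")
print(words, docs)
```

With the keyword "敬" and the part of speech "noun", the verb entry "敬" is excluded even though it matches the keyword, which is the narrowing effect the paragraph describes.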

[0026] Thus, according to the document retrieval device of this embodiment, part-of-speech information such as noun, verb or adjective is attached to each word or phrase and saved in the file 2 when a document is registered. At retrieval time, the desired document is then searched for within a search range narrowed by keyword and part of speech.

[0027] That is, if the searcher, in order to check whether a search keyword is appropriate, enters a word he or she has in mind, for example "敬", together with a part of speech, for example "noun", the device lists those documents registered in the file 2 that contain a word beginning with "敬" that is also a noun. The desired document can therefore be retrieved efficiently and accurately.

[0028] Next, another embodiment is described with reference to FIGS. 5 and 7.

[0029] FIG. 5 shows the relationship among the words and phrases, part-of-speech information and number of appearances registered in the file by the document retrieval device of this other embodiment, FIG. 6 is a flowchart showing the document retrieval operation of the document retrieval device of FIG. 5, and FIG. 7 schematically shows how words and phrases are searched by this document retrieval device.

[0030] In this embodiment, when a document is registered in the file 2 after morphological analysis, the CPU 3 creates a table (index file) 50 as shown in FIG. 5.

[0031] The table 50 associates part-of-speech information with each of the morphologically analyzed characters and phrases, and in addition frequency information 51 is attached to each character or phrase every time it is hit in a document search. The frequency information 51 is a numerical value that is incremented on each hit.
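The per-entry counter can be sketched as below. This is a hedged illustration rather than the patented implementation: the text does not specify exactly when a hit is recorded, so the sketch assumes one increment per matching search, and the `Entry` class, the `hits` field and the function `record_hits` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    pos: str
    hits: int = 0  # stands in for the frequency information 51


# Hypothetical contents of table 50 before any search has been run.
TABLE_50 = [
    Entry("敬順", "noun"),
    Entry("敬忠", "noun"),
    Entry("敬", "verb"),
]


def record_hits(table, keyword, pos):
    """Increment the counter of every entry matching keyword prefix and POS."""
    matched = [e for e in table if e.word.startswith(keyword) and e.pos == pos]
    for entry in matched:
        entry.hits += 1
    return matched


record_hits(TABLE_50, "敬", "noun")    # hits both noun entries
record_hits(TABLE_50, "敬順", "noun")  # hits only "敬順"
print([(e.word, e.hits) for e in TABLE_50])
```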

[0032] To retrieve a document, the user performs a predetermined operation on the input unit 1. This puts the CPU 3 into the document retrieval state, and the display unit 4 waits for a search character string (keyword) to be entered.

[0033] As in the embodiment above, when a keyword, for example "敬", and part-of-speech information, for example "noun", are entered from the input unit 1, the CPU 3 performs morphological analysis, expression processing and the like (step 601 in FIG. 6). Then, using the search condition that combines "敬" and "noun", it performs character-string matching (search processing) and part-of-speech matching against each document in the file 2, one by one (steps 602 to 603), and further performs frequency matching on the results of these matching steps (step 604).

[0034] In this case, as shown in FIG. 7, the table 50 contains three entries beginning with "敬". Of these, two have "noun" as their part-of-speech information: "敬順" and "敬忠". Their frequency values are 16 and 6, respectively. The search condition 51 specifies a number of appearances of 10 or more, so of "敬順" and "敬忠" only "敬順" qualifies, and the documents containing it become the output targets. Accordingly, only the documents containing "敬順" are stored in the output area of the CPU 3 (step 605). The documents are then output from the CPU 3 to the display unit 4 and displayed.
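The frequency matching of step 604 amounts to one more filter on top of the keyword and part-of-speech match. The sketch below reuses the values 16 and 6 and the threshold of 10 from the example; the table layout, the count given to the verb entry and the function name `search_with_threshold` are assumptions made for illustration.

```python
# (word, part of speech) -> frequency value.
# 16 and 6 follow the example in the text; the value for the verb entry
# is made up for illustration.
TABLE_50 = {
    ("敬順", "noun"): 16,
    ("敬忠", "noun"): 6,
    ("敬", "verb"): 3,
}


def search_with_threshold(table, keyword, pos, min_count=10):
    """Return words that match keyword prefix and POS with count >= min_count."""
    return sorted(
        word
        for (word, p), count in table.items()
        if word.startswith(keyword) and p == pos and count >= min_count
    )


print(search_with_threshold(TABLE_50, "敬", "noun"))  # prints ['敬順']
```

Exposing the threshold as a parameter mirrors the remark in the description that the value of 10 can be changed at initial setup.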

[0035] Thus, according to the document retrieval device of this embodiment, part-of-speech information such as noun, verb or adjective is attached to each word or phrase and saved in the file 2 when a document is registered. At retrieval time, when the user enters a keyword and part-of-speech information from the input unit 1, documents are retrieved only for entries whose hit frequency (number of appearances) is 10 or more, so the desired document can be retrieved efficiently and accurately within an even narrower search range. Although the number of appearances is set to 10 or more in this device, the value can be changed as desired at initial setup.

[0036] This document retrieval device is not limited to the full-text search described above. It can also be applied to so-called keyword search, in which characters and character strings are entered in sequence through the input unit 1 to create a document, the document is saved and registered in the file 2, and an arbitrary character string for calling up that document, namely a keyword, is then registered.

[0037] In this case, a word or phrase that represents the content of the created document is generally registered as the keyword, but it may also be a word or phrase that does not appear in the document. When that word or phrase is entered from the input unit 1, the document can be retrieved directly.
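Keyword search of this kind can be sketched as a direct lookup from a registered keyword to document ids, with no scan of the document text. The names `register_keyword` and `lookup`, the sample keywords and the document ids are all hypothetical; the patent does not describe how the association is stored.

```python
# Registered keyword -> ids of the documents it calls up.
KEYWORD_INDEX = {}


def register_keyword(index, keyword, doc_id):
    """Associate a user-chosen keyword with a stored document."""
    index.setdefault(keyword, set()).add(doc_id)


def lookup(index, keyword):
    """Return the ids of documents registered under exactly this keyword."""
    return sorted(index.get(keyword, set()))


# The keyword need not occur anywhere in the document body.
register_keyword(KEYWORD_INDEX, "budget-1995", "doc7")
register_keyword(KEYWORD_INDEX, "budget-1995", "doc9")
register_keyword(KEYWORD_INDEX, "minutes", "doc7")

print(lookup(KEYWORD_INDEX, "budget-1995"))
print(lookup(KEYWORD_INDEX, "unregistered"))
```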

[0038]

[Effects of the Invention] As described above, according to the document retrieval device of claim 1, part-of-speech information is attached to characters and phrases when a document is registered, and at retrieval time documents are searched within a range limited by keyword and part-of-speech information. The search range is thus narrowed, and the desired document can be retrieved efficiently and accurately.

[0039] According to the document retrieval device of claim 2, part-of-speech information is attached to characters and phrases when a document is registered, and at retrieval time documents are searched within a range further limited by keyword, part of speech and the frequency with which these are hit. The search range is thus narrowed further, and the desired document can be retrieved efficiently and accurately.

[Brief description of drawings]

FIG. 1 is a diagram showing one embodiment of the document retrieval device according to the present invention.

FIG. 2 is a diagram showing the relationship between the words and phrases registered in the file by this document retrieval device and their part-of-speech information.

FIG. 3 is a flowchart showing the document retrieval operation of this document retrieval device.

FIG. 4 is a diagram schematically showing how words and phrases are searched by this document retrieval device.

FIG. 5 is a diagram showing the relationship among the words and phrases, part-of-speech information and number of appearances (hit frequency) registered in the file by the document retrieval device of another embodiment.

FIG. 6 is a flowchart showing the document retrieval operation of the document retrieval device of FIG. 5.

FIG. 7 is a diagram schematically showing how words and phrases are searched by this document retrieval device.

[Explanation of Reference Numerals] 1: input unit, 2: file, 3: CPU, 4: display unit.

Claims

[Claims]

1. A document retrieval device for retrieving a desired document registered in a file by means of a keyword, comprising: input means through which a character string is entered; morphological analysis means for morphologically analyzing the character string entered through the input means into words, phrases and the like; registration means for attaching part-of-speech information to each of the words, phrases and the like obtained by the morphological analysis means and registering them in the file; search means for searching, when part-of-speech information is entered together with a keyword through the input means at document retrieval time, the documents in the file for matching words and phrases on the basis of these search conditions; and display means for displaying the documents containing the words and phrases found by the search means.

2. A document retrieval device for retrieving a desired document registered in a file by means of a keyword, comprising: input means through which a character string is entered; morphological analysis means for morphologically analyzing the character string entered through the input means into words, phrases and the like; registration means for attaching part-of-speech information to each of the words, phrases and the like obtained by the morphological analysis means and registering them in the file; search means for searching, when part-of-speech information is entered together with a keyword through the input means at document retrieval time, the documents in the file for matching words and phrases on the basis of these search conditions; counting means for counting, each time a word or phrase is found in a document by the search means, the frequency with which that word or phrase is hit; and display means for displaying the documents containing words and phrases whose frequency, as counted by the counting means, is equal to or greater than a predetermined value.