JPH03248272A

JPH03248272A - Document retrieving device

Info

Publication number: JPH03248272A
Application number: JP2046201A
Authority: JP
Inventors: Masao Ito; 正雄伊藤; Yoshihiro Hayakawa; 早川　佳宏; Yuji Sugano; 祐司菅野; Atsushi Ando; 安藤　敦史; Noboru Tamura; 登田村
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-02-27
Filing date: 1990-02-27
Publication date: 1991-11-06
Anticipated expiration: 2012-10-08
Also published as: JP2661312B2

Abstract

PURPOSE: To reduce retrieval noise in document information retrieval by subjecting the results of character string collation to morpheme analysis, discriminating whether a character string coinciding with the collation character string is included among the morphemes, and deleting documents that do not include it. CONSTITUTION: A request to retrieve documents related to fire is input to a retrieval condition input device 1 as the retrieval condition, and the collation character string in this case is 'fire'. Two documents are stored in a document data storage device 2: example 1, 'Yesterday's fire was immediately extinguished.', and example 2, 'A series of arson cases has been solved.' On receiving the collation character string 'fire' from the device 1, a retrieving device 3 collates it with the character strings of the documents in the device 2. Collation succeeds for both examples 1 and 2. The collation results are sent to a primary result storage device 4, which stores the sentences for which collation succeeded. A morpheme analyzing device 5 subjects the documents stored in the device 4 to morpheme analysis and checks whether the collation character string 'fire' is included among the morphemes. Documents that include it are stored in a secondary result storage device 7, and the retrieval results are displayed on a display device 8.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a document retrieval device that uses a computer.

Prior Art

In recent years, with the spread of word processors and personal computers and the practical use of computer-based character recognition, the number of electronic documents created on such machines has grown. Interest has therefore been rising in document databases that store large amounts of document information and retrieve it as needed.

In conventional document databases, documents were generally retrieved by keyword search, using keywords assigned to each document. This approach had several problems: the work of assigning keywords could not keep pace with the growth in stored documents, keywords became obsolete over time, and searches using keywords that the database administrator had not anticipated could not be handled, so many relevant documents were missed.

Against this background, document databases known as full-text databases have recently attracted attention.

A full-text database compares the search condition given by the user against all of the information in the stored documents and outputs the documents that satisfy the condition. The advantage is that the search condition can use character strings such as sentences, in addition to single words like conventional keywords.

Problem to be Solved by the Invention

In a conventional full-text database, however, the search condition is set so that matching succeeds wherever the target string appears in a document. As a result, the search condition often matches character strings the user did not anticipate, and unneeded documents are output as search noise. Examples of such noise are a hit on 「放火事件」 (arson incident) for the search condition 「火事」 (fire), and a hit on 「条文化する」 (to codify into articles) for the search condition 「文化」 (culture). Search noise of this kind increases as the matching character string becomes shorter.
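
The noise described above is easy to reproduce with plain substring matching. The following is a minimal sketch, not part of the patent: the function name `substring_search` and the third sample sentence are assumptions made for illustration, while the first two sentences are the example sentences used later in the description.

```python
# Toy corpus. The first two sentences are the patent's example sentences;
# the third is a hypothetical sentence added to illustrate the 文化 case.
documents = [
    "昨日の火事はすぐに鎮火した。",  # really about a fire (火事)
    "一連の放火事件は解決した。",    # 放火 + 事件: 火事 only spans a word boundary
    "規則を条文化する。",            # 条文 + 化: 文化 only spans a word boundary
]


def substring_search(query, docs):
    """Plain full-text matching: a hit anywhere in the text counts."""
    return [doc for doc in docs if query in doc]


fire_hits = substring_search("火事", documents)
culture_hits = substring_search("文化", documents)

for hit in fire_hits + culture_hits:
    print(hit)
```

The search for 火事 returns both the fire sentence and the arson sentence, and the search for 文化 returns a sentence that has nothing to do with culture, because the matcher has no notion of word boundaries.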

In view of this increase in search noise in full-text databases, an object of the present invention is to reduce search noise when retrieving document information.

Means for Solving the Problem

The present invention comprises: a search condition input device for inputting a search condition that includes a matching character string; a document data storage device for storing document data; a string matching device for searching the document data according to the input search condition; a primary result storage device for storing the matching result of the string matching device as a primary result; a morphological analysis device for performing morphological analysis on the primary result in the primary result storage device; a dictionary storage device for storing the dictionary needed for morphological analysis; a secondary result storage device for storing the documents found valid by the morphological analysis; and a search result display device for displaying the secondary result in the secondary result storage device. The primary result is morphologically analyzed, and only documents in which the matching character string is included among the analyzed morphemes are output to the secondary result storage device.

That is, the present invention performs morphological analysis on the result of string matching and deletes documents in which the matching character string is not included among the analyzed morphemes, thereby improving retrieval accuracy. The present invention also narrows the range subjected to morphological analysis, which shortens the analysis time.

Operation

With the configuration described above, the present invention extracts the character string needed for matching from the search condition input by the user, performs a full-text search of the document database with that string, and obtains a primary result. Because Japanese is not written with spaces between words (unlike English, words are not delimited by spaces or similar marks), the primary result may include documents in which the match spans a word boundary or covers only part of a word.

(1) These documents are therefore morphologically analyzed to determine whether the matching character string is included among the morphemes, and documents in which it is not are deleted. This makes accurate retrieval possible.

(2) Because morphological analysis is computationally expensive, the matching character string is not compared against the morphemes after a complete analysis. Instead, the matching character string is first assumed to be a morpheme, and morphological analysis is then performed. If the analysis fails near the matching character string, the string cannot be a morpheme at that position, so the document is deleted. This shortens processing time.

(3) To reduce the amount of morphological analysis, the entire document is not processed. Only the sentence containing the matching character string is extracted, using the punctuation marks before and after the string, and only that sentence is analyzed. This shortens processing time.

(4) To reduce the amount of morphological analysis further, the present invention looks before and after the matching character string for positions where the character type (hiragana, katakana, kanji, alphabet) changes, extracts these likely morpheme boundaries, and analyzes only the portion between them. This shortens processing time further.

Embodiments

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a schematic diagram showing the configuration of a document retrieval device according to the present invention. Reference numeral 1 denotes a search condition input device that extracts the matching character string from the search condition input by the user; 2 denotes a document data storage device that stores document data; 3 denotes a string matching device that receives the matching character string from search condition input device 1 and matches it against the document data in document data storage device 2; 4 denotes a primary result storage device that stores the documents for which string matching device 3 succeeded; and 5 denotes a morphological analysis device that morphologically analyzes the documents in primary result storage device 4 and determines whether the matching character string is included among the morphemes.

Reference numeral 6 denotes a dictionary storage device that stores the dictionary morphological analysis device 5 needs for morphological analysis; 7 denotes a secondary result storage device that stores the documents judged by morphological analysis device 5; 8 denotes a search result display device that displays the documents in secondary result storage device 7; and 9 denotes a central processing unit that controls the whole device.

Next, the operation of the document retrieval device shown in FIG. 1 is explained using the example processing flow in FIG. 2. Suppose that the search condition given to search condition input device 1 is to retrieve documents about fires, and that the matching character string in this case is 「火事」 (fire). Suppose also that document data storage device 2 holds two documents: example sentence 1, 「昨日の火事はすぐに鎮火した。」 (Yesterday's fire was extinguished quickly.), and example sentence 2, 「一連の放火事件は解決した。」 (The series of arson incidents was solved.). When string matching device 3 receives the matching character string 「火事」 from search condition input device 1, it matches the string against the documents in document data storage device 2. In this example, matching succeeds for both example sentences 1 and 2. The matching result is sent to primary result storage device 4, which stores the sentences for which string matching device 3 succeeded (in this example, example sentences 1 and 2).

Next, morphological analysis device 5 morphologically analyzes the documents stored in primary result storage device 4 and determines whether the matching character string is included among the morphemes. As shown in FIG. 2, example sentence 1 is analyzed into the morphemes 「昨日」「の」「火事」「は」「すぐ」「に」「鎮火」「し」「た」「。」, so the device recognizes that the matching character string 「火事」 is included among the morphemes. Example sentence 2 is analyzed, as shown in FIG. 2, into 「一連」「の」「放火」「事件」「は」「解決」「し」「た」「。」, and the device recognizes that the matching character string 「火事」 is not included among these morphemes.

The dictionary that morphological analysis device 5 needs for morphological analysis is the one stored in dictionary storage device 6.

In this way, the sentences are morphologically analyzed by morphological analysis device 5, and the documents in which the matching character string is included among the morphemes are stored in secondary result storage device 7. In the example above, only example sentence 1, whose morphemes include the matching character string, is stored in secondary result storage device 7 as the secondary result. The documents in secondary result storage device 7 are then displayed on search result display device 8.

As described above, according to the first embodiment of the present invention, search noise can be reduced by applying morphological analysis to the result of string matching.
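
The two-stage flow of the first embodiment can be sketched as follows. This is only an illustration under stated assumptions: the greedy longest-match `segment` function and the tiny `DICTIONARY` are hypothetical stand-ins for morphological analysis device 5 and dictionary storage device 6, and practical morphological analysis is considerably more involved.

```python
# Hypothetical mini-dictionary standing in for dictionary storage device 6.
DICTIONARY = {
    "昨日", "の", "火事", "は", "すぐ", "に", "鎮火", "し", "た", "。",
    "一連", "放火", "事件", "解決",
}


def segment(text, dictionary=DICTIONARY):
    """Greedy longest-match segmentation; returns None if some position fits no word."""
    max_len = max(len(word) for word in dictionary)
    morphemes, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                morphemes.append(text[i:i + size])
                i += size
                break
        else:
            return None  # analysis failed at position i
    return morphemes


def retrieve(query, docs):
    """Stage 1: substring matching. Stage 2: keep documents where query is a morpheme."""
    primary = [doc for doc in docs if query in doc]
    secondary = []
    for doc in primary:
        morphemes = segment(doc)
        if morphemes is not None and query in morphemes:
            secondary.append(doc)
    return primary, secondary


docs = ["昨日の火事はすぐに鎮火した。", "一連の放火事件は解決した。"]
primary, secondary = retrieve("火事", docs)
print(primary)
print(secondary)
```

Both sentences pass the first stage, but only the first survives the second, because the arson sentence segments into 放火 and 事件 and never yields 火事 as a morpheme.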

Next, FIG. 3 is an explanatory diagram of the morphological analysis method used by morphological analysis device 5 in the second embodiment of the present invention. Reference numeral 31 is an example of a document in the primary result, containing the sentence 「一連の放火事件は解決した。」 (The series of arson incidents was solved.), and the matching character string is 「火事」 (fire). When morphological analysis device 5 analyzes this document, it first treats the matching character string as a morpheme and then performs morphological analysis. Reference numeral 32 is the result of the analysis when 「火事」 is treated as a morpheme. In this case the analysis succeeds up to 「一連の」 but fails at 「放」, and processing ends there. In other words, instead of analyzing the whole sentence, it is enough to analyze up to the text immediately before and after the matching character string to determine whether that string can be a morpheme.

According to the second embodiment, morphological analysis is performed on the assumption that the matching character string is a morpheme, so the determination can be made without analyzing the whole text, and processing time is shortened.
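
One way to picture the second embodiment is the sketch below, which fixes the match as a morpheme and checks whether the surrounding text can still be analyzed. Everything here is an assumption for illustration: the names `can_segment` and `match_is_morpheme`, the toy dictionary, and the simplification that the text before the match must segment completely while the text after it only needs to begin with a dictionary word.

```python
# Hypothetical mini-dictionary. The query itself need not be listed,
# because it is assumed to be a morpheme from the outset.
DICTIONARY = {
    "昨日", "の", "は", "すぐ", "に", "鎮火", "し", "た", "。",
    "一連", "放火", "事件", "解決",
}


def can_segment(text, dictionary):
    """True if text can be split entirely into dictionary words (simple DP)."""
    ok = [True] + [False] * len(text)
    for end in range(1, len(text) + 1):
        ok[end] = any(
            ok[start] and text[start:end] in dictionary for start in range(end)
        )
    return ok[len(text)]


def match_is_morpheme(text, query, dictionary=DICTIONARY):
    """Assume query is a morpheme and test whether its surroundings still analyze."""
    start = text.find(query)
    while start != -1:
        before = text[:start]
        after = text[start + len(query):]
        after_ok = after == "" or any(after.startswith(w) for w in dictionary)
        if can_segment(before, dictionary) and after_ok:
            return True
        start = text.find(query, start + 1)  # try the next occurrence, if any
    return False


print(match_is_morpheme("昨日の火事はすぐに鎮火した。", "火事"))
print(match_is_morpheme("一連の放火事件は解決した。", "火事"))
```

For the arson sentence, the text before the match is 一連の放, which segments as 一連 and の but then fails at 放, mirroring the failure point described for reference numeral 32.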

Next, FIG. 4 shows the preprocessing for morphological analysis performed by morphological analysis device 5 in the third embodiment of the present invention. Reference numeral 41 is an example of a document in the primary result, which is assumed to contain the sentence 「捜査本部は、今年になって、一連の放火事件を解決した。」 (This year, the investigation headquarters solved a series of arson cases.).

Before performing morphological analysis, morphological analysis device 5 extracts the portion that contains the matching character string and is delimited by punctuation marks, and analyzes only that portion. Reference numeral 42 in FIG. 4 is the document after this preprocessing: 「一連の放火事件を解決した。」 (solved a series of arson cases.).

Thus, morphological analysis device 5 in the third embodiment analyzes only the punctuation-delimited portion that contains the matching character string, so the determination can be made without analyzing the whole text, and processing time is shortened.
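
The preprocessing of the third embodiment amounts to cutting out the punctuation-delimited span around the match. A minimal sketch follows, assuming that only 「、」 and 「。」 count as punctuation and that only the first occurrence of the match is considered; the name `clause_around` is hypothetical.

```python
PUNCTUATION = "、。"  # assumption: only these two marks delimit a clause


def clause_around(text, query):
    """Return the punctuation-delimited span containing the first match, or None."""
    pos = text.find(query)
    if pos == -1:
        return None
    # Start just after the nearest punctuation mark before the match
    # (rfind returns -1 when there is none, which yields start = 0).
    start = max(text.rfind(mark, 0, pos) for mark in PUNCTUATION) + 1
    # End at the nearest punctuation mark after the match, inclusive.
    candidates = [text.find(mark, pos + len(query)) for mark in PUNCTUATION]
    ends = [i for i in candidates if i != -1]
    end = min(ends) + 1 if ends else len(text)
    return text[start:end]


sentence = "捜査本部は、今年になって、一連の放火事件を解決した。"
print(clause_around(sentence, "火事"))
```

For the example sentence this yields 「一連の放火事件を解決した。」, the same span shown for reference numeral 42, so only 13 of the 26 characters would be passed on to morphological analysis.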

FIG. 5 is an explanatory diagram of the preprocessing for morphological analysis performed by morphological analysis device 5 in the fourth embodiment of the present invention. Reference numeral 51 is an example of a document in the primary result, which is assumed to contain the sentence 「捜査本部は、今年になって、一連の放火事件を解決した。」 (This year, the investigation headquarters solved a series of arson cases.). Before performing morphological analysis, morphological analysis device 5 looks before and after the matching character string for the points where the character type (hiragana, katakana, kanji, alphabet, and so on) changes, and analyzes only the range between them. In this example, looking toward the front of the string, 「火」 and 「放」 are both kanji, whereas 「放」 and 「の」 differ in character type (kanji and hiragana), so the analysis starts from 「放」. Looking toward the back, 「事」 and 「件」 are both kanji, whereas 「件」 and 「を」 differ in character type (kanji and hiragana), so the analysis covers the text up to 「件」. Reference numeral 52 is the example after this preprocessing, and the range subjected to morphological analysis is 「放火事件」 (arson incident).

This exploits the fact that morpheme boundaries often fall where the character type changes. However, kanji and hiragana may be mixed, so a boundary between kanji and hiragana needs to be limited to cases where the hiragana is a particle.

Thus, in the fourth embodiment, morphological analysis is performed only on the range bounded by character-type changes before and after the matching character string, so the determination can be made without analyzing the whole text, and processing time is shortened.
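
The character-type preprocessing of the fourth embodiment can be sketched as below. The Unicode ranges used to classify characters and the names `char_type` and `same_type_span` are assumptions made for illustration, and the sketch does not implement the particle restriction mentioned above.

```python
def char_type(ch):
    """Classify a character by script using Unicode block ranges (an assumption)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "other"


def same_type_span(text, query):
    """Widen the first match outward until the character type changes on each side."""
    pos = text.find(query)
    if pos == -1:
        return None
    start, end = pos, pos + len(query)
    # Extend left while the preceding character has the same type as its neighbour.
    while start > 0 and char_type(text[start - 1]) == char_type(text[start]):
        start -= 1
    # Extend right while the following character has the same type as its neighbour.
    while end < len(text) and char_type(text[end]) == char_type(text[end - 1]):
        end += 1
    return text[start:end]


sentence = "捜査本部は、今年になって、一連の放火事件を解決した。"
print(same_type_span(sentence, "火事"))
```

For the example sentence the span grows from 火事 to 放火事件, stopping at の on the left and を on the right, which is the range shown for reference numeral 52.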

Effects of the Invention

As described above, the present invention performs morphological analysis on the documents for which string matching succeeded, determines whether the matching character string is included among the morphemes, and deletes the documents in which it is not, which makes accurate retrieval possible. In addition, because morphological analysis is performed after assuming that the matching character string is a morpheme, a failed analysis shows that the string cannot be a morpheme, so the document can be deleted and the amount of morphological analysis reduced. Furthermore, processing time can be shortened by performing morphological analysis only on the portion that contains the matching character string and is delimited by the preceding and following punctuation marks. Finally, because the present invention looks for the positions before and after the matching character string where the character type (hiragana, katakana, kanji, alphabet, and so on) changes and performs morphological analysis only on that range, processing time can be shortened.

[Brief explanation of drawings]

FIG. 1 is a schematic configuration diagram of a document retrieval device according to the first embodiment of the present invention; FIG. 2 is a processing flow diagram of the same document retrieval device; FIG. 3 shows a processing example of the morphological analysis device according to the second embodiment of the present invention; FIG. 4 shows a processing example of the morphological analysis device according to the third embodiment of the present invention; and FIG. 5 shows a processing example of the morphological analysis device according to the fourth embodiment of the present invention.

1: search condition input device; 2: document data storage device; 3: string matching device; 4: primary result storage device; 5: morphological analysis device; 6: dictionary storage device; 7: secondary result storage device; 8: search result display device; 9: central control device; 31: primary result example; 32: morphological processing example; 41, 51: primary result examples; 42, 52: preprocessing examples.

Claims

(1) A document retrieval device comprising: a search condition input device for inputting a search condition that includes a matching character string; a document data storage device for storing document data; a string matching device for searching the document data according to the input search condition; a primary result storage device for storing the matching result of the string matching device as a primary result; a morphological analysis device for performing morphological analysis on the primary result in the primary result storage device; a dictionary storage device for storing a dictionary needed for the morphological analysis; a secondary result storage device for storing documents found valid by the morphological analysis of the morphological analysis device; and a search result display device for displaying the secondary result in the secondary result storage device, wherein the primary result is morphologically analyzed and only documents in which the matching character string is included among the analyzed morphemes are output to the secondary result storage device.

(2) The document retrieval device according to claim 1, wherein the morphological analysis device performs morphological analysis on the assumption that the matching character string is a morpheme, and outputs only documents for which the analysis succeeds to the secondary result storage device.

(3) The document retrieval device according to claim 1, wherein the morphological analysis device does not perform morphological analysis on the entire document in the primary result storage device, but performs it only on the sentence that contains the matching character string and is delimited by punctuation marks.

(4) The document retrieval device according to claim 1, wherein the morphological analysis device does not perform morphological analysis on the entire document in the primary result storage device, but looks for the positions before and after the matching character string where the character type (hiragana, katakana, kanji, alphabet, and so on) changes, and performs morphological analysis only on the character string within that range.