JPH1145255A

JPH1145255A - Document retrieval device and computer-readable recording medium storing a program for causing a computer to function as the device

Info

Publication number: JPH1145255A
Application number: JP9199615A
Authority: JP
Inventors: Atsushi Takato; 淳高藤
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 1997-07-25
Filing date: 1997-07-25
Publication date: 1999-02-16

Abstract

PROBLEM TO BE SOLVED: To retrieve necessary information in real time while a document is being edited, by retrieving matching sub-documents from a storage means according to the vector representation of an input retrieval condition. SOLUTION: The device comprises clients 100, which output retrieval conditions for retrieving desired descriptions in documents, and a retrieval server 103. The retrieval server 103 has a retrieval engine that generates an inverted file 102 from a document database (DB) 101 and can perform vector space retrieval and Boolean retrieval; it receives a retrieval condition from a client 100 and retrieves the matching documents by using the inverted file 102. An arbitrary range of the document displayed on a screen is specified, the structure of the character string within that range is analyzed, and a retrieval condition is generated from that character string and converted into a vector representation. The retrieval server 103 then retrieves sub-documents according to the vector representation of the input retrieval condition.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document search apparatus that uses a passage of a document being edited and displayed on a screen as a query, takes as its search targets sub-documents each formed by grouping several of the sentences that make up a document, and can thereby return, as search results, the passages in documents that are directly related to the query. The invention also relates to a computer-readable recording medium storing a program for causing a computer to function as the apparatus.

[0002]

[Description of the Related Art] With the development of computers, documents that used to be recorded and stored on paper are now commonly handled as digitized information, and large numbers of electronic documents are being accumulated in databases.

[0003] As large numbers of electronic documents accumulate, search technology that makes it easy to find a desired document among them becomes important. In particular, now that everyone works with computers, it is important to build a search system with which even a computer novice can easily find a desired document without resorting to special techniques.

[0004] One way to make searching easier is to let the user enter a query in the natural language that people ordinarily use. With recent advances in natural language processing, this technique is being adopted in many search systems.

[0005]

[Problems to Be Solved by the Invention] Conventional search systems make input easier by accepting natural-language queries. However, when a user wants to look up and refer to information related to a document being edited in a word processor or the like, the user has to launch separate search software, enter a query, and run the search. The operation is cumbersome and inconvenient.

[0006] When a search is performed with the vector space method, the documents to be searched and the query are converted into vector representations, the similarity between each document and the query is computed, and documents whose similarity exceeds a predetermined threshold are output as search results. If a document contains several topics, converting the whole document into a single vector averages those topics together. As a result, even when the document contains a topic that is highly relevant to the query, the similarity between the document as a whole and the query can be small, and the document may be treated as having low relevance to the query.
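The averaging effect described in this paragraph can be illustrated with a small numeric sketch. It assumes raw term counts as vector components and cosine similarity as the measure; the three-passage document and the query are invented for illustration and do not come from the patent.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical document with three unrelated topics, one per passage.
passages = [
    "vector space retrieval ranks text by vector similarity",
    "the quarterly budget covers travel and equipment costs",
    "the office moves to a new building next spring",
]
query = Counter("vector space retrieval".split())

# One vector for the whole document versus one vector per passage.
whole_doc = Counter(" ".join(passages).split())
per_passage = [Counter(p.split()) for p in passages]

whole_score = cosine(query, whole_doc)
best_passage_score = max(cosine(query, p) for p in per_passage)

# The relevant passage alone scores higher than the document as a whole,
# because the other topics lengthen the whole-document vector.
print(whole_score < best_passage_score)
```

With a fixed threshold between the two scores, the whole document would be rejected while the relevant passage would be accepted.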

[0007] Furthermore, because the conventional search systems described above retrieve documents that match the input query, they are inconvenient when what the user wants is the sentence or paragraph that directly matches the query. In other words, the search result is the document itself, so a user who wants to quote a passage on some matter still has to find that passage within the retrieved document. This is because conventional search systems treat the entire document as the search target and cannot search partial ranges such as sentences or paragraphs.

[0008] The present invention has been made in view of the above, and an object thereof is to improve the convenience of searching by making it possible to retrieve necessary information in real time while a document is being edited.

[0009] Another object of the present invention is to make it possible to reliably retrieve documents that contain multiple topics when searching with the vector space method.

[0010] A further object of the present invention is to support the writing of quotations and annotations by taking as search targets sub-documents, each formed by grouping several of the sentences that make up a document, so that passages in documents directly related to the search condition are obtained as search results.

[0011]

[Means for Solving the Problems] To achieve the above objects, the document search apparatus of claim 1 is a document search apparatus that performs search processing using the vector space method, comprising: storage means for dividing the sentences that make up a document into groups, each consisting of an arbitrary number of sentences or of the sentences of one paragraph, defining each group as a sub-document, converting documents prepared in advance into vector representations in units of sub-documents, and storing them; range designating means for designating an arbitrary range of a document displayed on a screen; analyzing means for analyzing the structure of the character string in the range designated by the range designating means; search condition generating means for generating a search condition from the character string in the designated range on the basis of the analysis result of the analyzing means; converting means for converting the search condition generated by the search condition generating means into a vector representation; and search means for retrieving the matching sub-documents from the storage means on the basis of the vector representation of the search condition received from the converting means.

[0012] The document search apparatus of claim 2 is the document search apparatus of claim 1, wherein the search means computes the similarity between the vector representation of the search condition and the vector representation of each sub-document, selects the sub-documents whose similarity exceeds a predetermined threshold, and outputs as the search result a list in which the selected sub-documents are ordered by similarity.

[0013] The document search apparatus of claim 3 is the document search apparatus of claim 1 or 2, further comprising: designating means for designating the method of dividing the sentences that make up the document; and generating means for generating the sub-documents from the document in accordance with the division method designated through the designating means, wherein the storage means converts the sub-documents generated by the generating means into vector representations and stores them.

[0014] The document search apparatus of claim 4 is the document search apparatus of any one of claims 1 to 3, further comprising: selecting means for selecting a desired sub-document on the basis of the search result of the search means; designating means for designating an arbitrary location in the document displayed on the screen; and inserting means for inserting the sub-document selected by the selecting means at the location designated by the designating means.

[0015] The computer-readable recording medium of claim 5 stores a program for causing a computer to function as each means of the document search apparatus of any one of claims 1 to 4.

[0016]

[Embodiments of the Invention] Embodiments of the document search apparatus of the present invention, and of the computer-readable recording medium storing a program for causing a computer to function as the apparatus, are described in detail below with reference to the accompanying drawings.

[0017] [Embodiment 1] FIG. 1 is a system configuration diagram of the document search apparatus of Embodiment 1. The apparatus shown in FIG. 1 comprises: a plurality of clients 100 that output search conditions for finding desired passages in documents; a search server 103 that has a search engine (for example, CLARIT from CLARITECH) which generates an inverted file 102 from a document DB (database) 101 and can perform vector space search, and that receives a search condition from a client 100 and uses the inverted file 102 to find the matching passages in documents; and a network 104 that connects the clients 100, the search server 103, and the other components.

[0018] In FIG. 1, the document DB 101 stores a plurality of documents created on the clients 100 or elsewhere. The stored documents may be of any kind, including word-processor documents and structured documents such as SGML and HTML. In Embodiment 1, parts of the documents stored in the document DB 101 are the search targets, but the search targets are not limited to what is in the document DB 101.

[0019] In general, an inverted file such as the inverted file 102 defines the relationship between the documents in the document DB 101 and the index terms extracted from each of those documents, and indicates how important each index term is in each document, so that matching documents can be retrieved by index term. In the document search apparatus of Embodiment 1, the inverted file 102 is used to search not whole documents but parts of documents (the sub-documents described below).

[0020] Specifically, each document is divided into units called sub-documents, each consisting of an arbitrary number of sentences, and the noun phrases that serve as index terms are extracted from each sub-document. For each extracted noun phrase, statistics such as its frequency of occurrence within the sub-document and its distribution across the entire document DB 101 are computed, and the sub-document is converted into a vector representation using these per-noun-phrase statistics. This processing is performed for every sub-document in the document, and the results are stored in the inverted file 102.
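The conversion of sub-documents into vectors can be sketched as follows. The patent does not give the weighting formula, so this sketch assumes a common TF-IDF scheme: frequency within the sub-document multiplied by `log(N / df)`, where `df` is counted over sub-documents as a stand-in for "distribution across the entire document DB". The function name `build_vectors` and the sample data are hypothetical.

```python
import math
from collections import Counter

def build_vectors(subdocs):
    """Map each sub-document ID to a {term: weight} vector.

    subdocs: {subdoc_id: list of index terms (for example noun phrases)}.
    Weight = term frequency in the sub-document * log(N / df). This TF-IDF
    scheme is an assumption; the source only mentions "statistical information".
    """
    n = len(subdocs)
    df = Counter()
    for terms in subdocs.values():
        df.update(set(terms))  # count each term once per sub-document
    vectors = {}
    for sid, terms in subdocs.items():
        tf = Counter(terms)
        vectors[sid] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors

# Hypothetical noun-phrase lists keyed by (document name, sub-document index).
subdocs = {
    ("doc1", 0): ["search engine", "inverted file", "search engine"],
    ("doc1", 1): ["budget", "travel cost"],
    ("doc2", 0): ["inverted file", "index term"],
}
vectors = build_vectors(subdocs)
print(vectors[("doc1", 0)])
```

A term that is frequent in one sub-document but rare elsewhere receives a large weight, which is the usual motivation for combining the two statistics.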

[0021] Each index term can be weighted according to its importance in the corresponding document. When whole documents as well as sub-documents are to be searched, the vector representation of each document can be generated from the vector representations of its sub-documents, either in advance or at search time.

[0022] The clients 100 and the search server 103 are implemented on personal computers, workstations, or the like. FIG. 2 is a schematic block diagram showing the processing of a client 100. The client 100 comprises an input device 207, such as a mouse, and a natural language processing module 201. The input device 207 is used to designate an arbitrary range in the document being edited (the document displayed on the screen) in an application program such as a word processor or spreadsheet, and to start a search that uses the character string in the designated range as a query 200. The natural language processing module 201 receives the query 200; analyzes it (morphological analysis, syntactic analysis, and so on) using a dictionary 202, which stores part-of-speech and similar information, and a grammar dictionary 203, which is used to analyze dependency relations between words; generates a search condition 206 consisting of noun phrases that correspond to index terms of the inverted file 102; and outputs the search condition 206 to the search server 103.

[0023] FIG. 3 is a schematic block diagram showing the processing of the search server 103. The search server 103 registers the documents in the document DB 101 in the inverted file 102 and performs search processing using the vector space method.

[0024] The search server 103 comprises a natural language processing module 300, a database build component 304, a query build component 305, and a search engine 307. The natural language processing module 300 receives a document from the document DB 101, recognizes its format, and analyzes it (morphological analysis, syntactic analysis, noun phrase extraction, and so on) using a dictionary 301, which stores part-of-speech and similar information, and a grammar dictionary 302, which is used to analyze dependency relations between words; it then generates a document set 303 containing the per-sub-document noun phrase lists described above. The database build component 304 receives the document set 303, converts each sub-document in it into a vector representation, and registers the result in the inverted file 102. The query build component 305 receives the search condition 206 from a client 100; for each noun phrase in the search condition 206 it computes statistics such as the frequency of occurrence in the query 200 and the distribution across the entire document DB 101, and uses these statistics to generate a query document 306, which is the search condition 206 converted into a vector representation. The search engine 307 receives the query document 306, compares its vector with the vector of each sub-document in the inverted file 102, assigns each sub-document a score that reflects its similarity to the query document 306, selects the sub-documents whose scores exceed a predetermined threshold to generate a sub-document list 308, and outputs the sub-document list 308 as the search result.

[0025] Although FIG. 1 shows the document DB 101 and the inverted file 102 each connected to the network 104 on its own, they may instead be connected directly to the search server 103. Likewise, although FIG. 1 shows the document search apparatus of Embodiment 1 as a networked system, the apparatus can be implemented on a single computer by having the natural language processing module 300 of the search server 103 shown in FIG. 3 perform the processing of the client 100 shown in FIG. 2.

[0026] Next, the operation of the document search apparatus configured as described above is described in detail, in the order (1) generation of the inverted file and (2) sub-document search.

[0027] (1) Inverted file generation process

FIG. 4 is a flowchart of the inverted file generation process. When a new document created on a client 100 or elsewhere is registered in the document DB 101 (S401), the search server 103 reads the document and starts the processing for registering it in the inverted file 102 (S402).

[0028] In the search server 103, the natural language processing module 300 analyzes the document read in step S402 (S403). Specifically, it first determines the format of the document, for example whether it is a word-processor document or a structured document such as HTML. It then performs morphological analysis and syntactic analysis (such as dependency analysis) using the dictionary 301 and the grammar dictionary 302, divides the document into a plurality of sub-documents, and extracts noun phrases from each sub-document.

[0029] Each sub-document produced in step S403 consists of an arbitrary number of sentences. For example, a group of several sentences may be defined in advance as one sub-document, or the sentences of one paragraph may form one sub-document. Because the sub-documents are the search targets in Embodiment 1, how the sub-documents are formed, that is, how the sentences are grouped, can be set and changed to suit the user's preference.
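The two grouping options mentioned here (a fixed number of sentences, or one paragraph per sub-document) can be sketched as a user-selectable mode. The sentence splitter below is a naive punctuation-based assumption, not the morphological and syntactic analysis the patent describes, and the names `segment`, `mode`, and `size` are hypothetical.

```python
import re

def split_sentences(text):
    """Naive sentence splitter on ., !, ? and the Japanese full stop.

    This is an assumption for illustration; it will mis-split decimals
    and abbreviations.
    """
    pieces = re.findall(r"[^.!?。]+[.!?。]?", text)
    return [p.strip() for p in pieces if p.strip()]

def segment(document, mode="paragraph", size=2):
    """Return a list of sub-documents, each a list of sentences.

    mode="paragraph": one sub-document per blank-line-separated paragraph.
    mode="fixed":     one sub-document per `size` consecutive sentences.
    """
    if mode == "paragraph":
        paragraphs = [p for p in re.split(r"\n\s*\n", document) if p.strip()]
        return [split_sentences(p) for p in paragraphs]
    if mode == "fixed":
        sentences = split_sentences(document)
        return [sentences[i:i + size] for i in range(0, len(sentences), size)]
    raise ValueError(f"unknown mode: {mode}")

doc = "A one. A two. A three.\n\nB one. B two."
by_paragraph = segment(doc, mode="paragraph")
by_pairs = segment(doc, mode="fixed", size=2)
print(by_paragraph)
print(by_pairs)
```

Exposing the grouping as a parameter mirrors the statement that the division method can be set and changed by the user.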

[0030] On the basis of the result of step S403, the natural language processing module 300 generates a noun phrase list for each sub-document and generates a document set 303 containing these lists (S404).

[0031] The database build component 304 then receives the document set 303 generated by the natural language processing module 300 and registers it in the inverted file 102 (S405).

[0032] Specifically, each noun phrase of each sub-document in the document set 303 is treated as an index term of the inverted file 102. Statistics such as its frequency of occurrence within the sub-document and its distribution across the entire document DB 101 are computed, and the sub-document is converted into a vector representation using these per-noun-phrase statistics. This processing is performed for every sub-document in the document set 303, and the results are registered in the inverted file 102.
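One plausible layout for the registered data is a postings map from each index term to the sub-documents that contain it, together with the term's weight there. The patent does not specify the file format, so the structure, the function names, and the `doc#index` sub-document IDs below are assumptions for illustration.

```python
from collections import defaultdict

def build_inverted_file(vectors):
    """Turn {subdoc_id: {term: weight}} into {term: [(subdoc_id, weight), ...]}."""
    inverted = defaultdict(list)
    for sid, vec in vectors.items():
        for term, weight in vec.items():
            inverted[term].append((sid, weight))
    return dict(inverted)

def candidates(inverted, query_terms):
    """Sub-documents that share at least one index term with the query."""
    return {sid for t in query_terms for sid, _ in inverted.get(t, [])}

# Hypothetical sub-document vectors (weights are placeholder values).
vectors = {
    "doc1#0": {"search engine": 2.0, "inverted file": 0.4},
    "doc1#1": {"budget": 1.1},
    "doc2#0": {"inverted file": 0.4, "index term": 1.1},
}
inverted = build_inverted_file(vectors)
print(inverted["inverted file"])
print(candidates(inverted, ["inverted file", "budget"]))
```

Looking up postings by term lets a search touch only the sub-documents that can have a non-zero similarity to the query.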

[0033] (2) Sub-document search process

Next, the sub-document search process is described. FIG. 5 is a flowchart of the sub-document search process. In the document currently being edited in a word processor or the like, the user designates an arbitrary range with the input device 207 of the client 100 (S501) and instructs the apparatus to run a search that uses the character string in the designated range as the query 200 (S502).

[0034] FIG. 6(a) illustrates the designation of the range to be used as the query 200 in step S501, and FIG. 6(b) illustrates the instruction to run the search. As shown in FIG. 6(a), the user designates an arbitrary range 601 in a document 600 with a mouse or other input device 207. Besides the paragraph-based range shown in FIG. 6, the range 601 used as the query 200 may be anything: the whole document, a whole page, an embedded item, an object, a document linked to the document being edited, a URL, and so on. More than one range 601 can be designated; when several ranges are designated, they are treated together as a single query 200. In addition, when designating a range 601, the user can assign a positive weight to a range 601 that describes important matters the user wants reflected in the search results, and a negative weight to a range 601 that describes unnecessary matters the user does not want reflected.

[0035] Then, as shown in FIG. 6(b), the user right-clicks the mouse serving as the input device 207 to open a menu 602 and selects "Search" to run the search. Although the search is started here from the menu 602 as shown in FIG. 6(b), an icon for starting the search may be provided instead.

[0036] Returning to FIG. 5, when the search is started in step S502, the natural language processing module 201 receives the character string in the designated range 601 as the query 200 and analyzes it (S503). Specifically, it performs morphological analysis and syntactic analysis (such as dependency analysis) using the dictionary 202 and the grammar dictionary 203.

[0037] On the basis of the analysis result of step S503, the natural language processing module 201 generates a search condition from the query 200 (S504). That is, the natural language processing module 201 extracts from the character string in the designated range 601 the noun phrases that correspond to index terms in the inverted file 102, and generates the search condition 206 consisting of those noun phrases.
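The step of keeping only the phrases that correspond to index terms can be sketched without a real parser. The version below assumes whitespace tokenization and a greedy longest-match lookup against a set of known index terms; it stands in for, and is much cruder than, the dictionary-based morphological and syntactic analysis in the text. The function name and the sample vocabulary are hypothetical.

```python
def make_search_condition(text, index_terms, max_len=3):
    """Return the index terms found in text, in order, repeats included.

    At each word position the longest phrase (up to max_len words) that is
    a known index term is taken; words matching nothing are skipped.
    Punctuation is not handled in this sketch.
    """
    words = text.lower().split()
    found, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index_terms:
                found.append(phrase)
                i += n
                break
        else:
            i += 1  # no index term starts at this word
    return found

index_terms = {"inverted file", "search engine", "file"}
selected_range = "The search engine reads the inverted file and another file"
condition = make_search_condition(selected_range, index_terms)
print(condition)
```

Keeping repeats in the output preserves the in-query frequency that the server later uses when it builds the query vector.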

[0038] The search condition 206 generated from the query 200 in this way is output from the client 100 to the search server 103. If a weight has been assigned to the selected range 601 as described above, the weight designation is also output to the search server 103.

[0039] The query build component 305 of the search server 103 then receives the search condition 206 from the client 100. For each noun phrase in the search condition 206, it computes statistics such as the frequency of occurrence in the query 200 and the distribution across the entire document DB 101, and uses these statistics to generate the query document 306, which is the search condition 206 converted into a vector representation (S505). If weights have been designated, the positive or negative weight is applied to the corresponding noun phrases when the query document 306 is generated.
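A minimal sketch of building the query vector with per-range weights follows. It assumes that each designated range arrives as a list of index terms plus a signed weight, and that a precomputed `idf` table represents the "distribution across the entire document DB"; both the data shapes and the way the weight multiplies the term contribution are assumptions, since the patent does not give the formula.

```python
from collections import defaultdict

def build_query_vector(ranges, idf):
    """Combine several weighted ranges into one query vector.

    ranges: list of (index_terms, range_weight); range_weight is positive
            for ranges to emphasize and negative for ranges to suppress.
    idf:    {term: collection-level weight}; terms missing from it are skipped.
    Each occurrence of a term adds range_weight * idf[term] to its component.
    """
    vec = defaultdict(float)
    for terms, range_weight in ranges:
        for term in terms:
            if term in idf:
                vec[term] += range_weight * idf[term]
    return dict(vec)

# Hypothetical collection statistics and two designated ranges.
idf = {"inverted file": 1.0, "budget": 2.0, "search engine": 0.5}
ranges = [
    (["inverted file", "search engine", "inverted file"], 1.0),  # emphasized
    (["budget"], -1.0),                                          # suppressed
    (["unknown term"], 1.0),                                     # not indexed
]
query_vector = build_query_vector(ranges, idf)
print(query_vector)
```

Because all ranges feed one vector, several selections behave as a single query, and a negatively weighted term lowers the score of sub-documents that contain it.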

[0040] The search engine 307 receives the query document 306 generated by the query build component 305, compares its vector with the vectors of the sub-documents in the inverted file 102, assigns each sub-document a score that reflects its similarity to the query document 306, and selects the sub-documents whose scores exceed a predetermined threshold to generate the sub-document list 308 (S506).

[0041] The score expresses the similarity between each sub-document and the query document 306 on the basis of the cosine distance; a sub-document with a larger score is more similar to the query document 306. A score threshold is set in the search engine 307 in advance, and the sub-documents whose scores exceed this threshold become the search result.
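The scoring and thresholding steps can be sketched as below. Since a larger score means greater similarity, the sketch uses cosine similarity (the cosine of the angle between the two vectors) as the score; the threshold value of 0.3 and the sample vectors are arbitrary assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query_vec, subdoc_vecs, threshold=0.3):
    """Return [(score, subdoc_id)] for scores above threshold, best first."""
    scored = [(cosine(query_vec, v), sid) for sid, v in subdoc_vecs.items()]
    hits = [(s, sid) for s, sid in scored if s > threshold]
    return sorted(hits, key=lambda h: h[0], reverse=True)

# Hypothetical sub-document vectors and query vector.
subdoc_vecs = {
    "doc1#0": {"inverted file": 1.0, "search engine": 1.0},
    "doc1#1": {"budget": 1.0},
    "doc2#0": {"inverted file": 1.0},
}
query_vec = {"inverted file": 1.0}
results = search(query_vec, subdoc_vecs)
for score, sid in results:
    print(f"{sid}: {score:.3f}")
```

Here `doc1#1` shares no term with the query, scores zero, and is dropped by the threshold, while the other two sub-documents are listed in descending order of score.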

[0042] The search engine 307 then outputs the generated sub-document list 308 to the client 100 as the search result, and the sub-document list 308 is displayed on the screen of the client 100 (S507).

[0043] FIG. 7 shows an example of the sub-document list 308. The sub-document list 308 shows the ranking by score (which reflects similarity to the query document 306), a sub-document ID that identifies which sub-document of the document each entry is, and the name of the document that contains the sub-document.

[0044] By selecting any sub-document from the displayed sub-document list 308, the user of the client 100 can display that sub-document of the document in the document DB 101 on the screen.

[0045] The document search apparatus of Embodiment 1 can also be used to find a desired passage within a single document: one document is selected in advance (it may be one found by a document-level search), and the sub-documents of that document are searched.

【００４６】また、図７に示したサブドキュメントリス
ト３０８を利用して、文書のランキングを生成すること
もできる。その結果、複数のトピックを含む文書が複数
ある場合であっても、特定のトピックに関する文書のラ
ンキングを作成することができ、特定の記述を含む文書
を容易に得ることができる。A ranking of documents can also be generated by using the sub-document list 308 shown in FIG. 7. As a result, even when there are multiple documents that each contain multiple topics, a ranking of documents related to a specific topic can be created, and a document containing a specific description can be obtained easily.
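One way to derive a document ranking from a sub-document list is sketched below. The specification does not state how sub-document scores are combined per document, so the rule used here (each document takes the score of its best-scoring sub-document) is an assumption, as are the function name `rank_documents` and the sample data.

```python
def rank_documents(subdoc_hits):
    """Rank documents by their best-scoring sub-document.

    `subdoc_hits` is a list of (score, subdoc_id, doc_name) tuples.
    Taking the maximum score per document is an assumed aggregation rule.
    """
    best = {}
    for score, subdoc_id, doc_name in subdoc_hits:
        if doc_name not in best or score > best[doc_name][0]:
            best[doc_name] = (score, subdoc_id)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(doc_name, score, subdoc_id) for doc_name, (score, subdoc_id) in ranked]

# Hypothetical sub-document list: manual.txt appears twice.
hits = [
    (0.91, "S7", "manual.txt"),
    (0.75, "S2", "survey.txt"),
    (0.40, "S3", "manual.txt"),
    (0.35, "S9", "notes.txt"),
]
document_ranking = rank_documents(hits)
```

Because the ranking is driven by the sub-document that matches the query, a document covering many unrelated topics can still rank highly for the one topic the user asked about.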

【００４７】さらに、上述した実施の形態１の文書検索
装置で得た検索結果であるサブドキュメントリスト３０
８において、ユーザが検索結果としてふさわしいと思う
サブドキュメントやふさわしくないと思うサブドキュメ
ントについては、その結果を検索サーバ１０３にフィー
ドバックすることができる。すなわち、ユーザは、検索
結果としてふさわしいと思うサブドキュメントに対し
て、正の重み、例えば「＋」を指定することができ、検
索結果としてふさわしくないと思うサブドキュメントに
対して負の重み、例えば「−」を指定することができ
る。その結果、入力した重みが正の指定である場合に
は、転置ファイル１０２中の該当するサブドキュメント
の重みが強化され、入力した重みが負の指定である場合
には、サブドキュメントの重みが弱められる。Further, for the sub-documents in the sub-document list 308 obtained as the search result by the document search apparatus of the first embodiment described above, the user can feed back to the search server 103 which sub-documents the user considers appropriate or inappropriate as search results. That is, the user can specify a positive weight, for example "+", for a sub-document considered appropriate as a search result, and a negative weight, for example "-", for a sub-document considered inappropriate. As a result, when the input weight is positive, the weight of the corresponding sub-document in the inverted file 102 is strengthened, and when the input weight is negative, the weight of that sub-document is weakened.
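The feedback step can be illustrated with a small sketch. The specification says only that a sub-document's weight in the inverted file 102 is strengthened or weakened; the per-sub-document multiplicative weight, the 10% step size, and the name `apply_feedback` below are assumptions chosen for the example.

```python
def apply_feedback(subdoc_weights, feedback, step=0.1):
    """Return updated per-sub-document weights after "+" / "-" feedback.

    `subdoc_weights` maps sub-document ID to a weight (default 1.0).
    `feedback` maps sub-document ID to "+" or "-".
    The multiplicative update and the step size are assumed, not specified.
    """
    updated = dict(subdoc_weights)  # leave the caller's mapping untouched
    for subdoc_id, sign in feedback.items():
        current = updated.get(subdoc_id, 1.0)
        if sign == "+":
            updated[subdoc_id] = current * (1.0 + step)  # strengthen
        elif sign == "-":
            updated[subdoc_id] = current * (1.0 - step)  # weaken
        else:
            raise ValueError(f"unknown feedback sign: {sign!r}")
    return updated

weights = {"S1": 1.0, "S2": 1.0, "S3": 1.0}
new_weights = apply_feedback(weights, {"S1": "+", "S3": "-"})
```

In this sketch S1 is strengthened, S3 is weakened, and S2 is left unchanged; a search engine could then multiply each similarity score by the stored weight before applying the threshold.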

【００４８】このように、実施の形態１の文書検索装置
によれば、検索対象を文書全体ではなく、文書のサブド
キュメントとし、検索条件に直接関連する文書中の記述
を検索結果として得ることができるようにしたため、文
書の編集中にリアルタイムで必要な情報を検索すること
ができ、検索処理の利便性の向上を図ることができる。
したがって、文書中の記載に関する引用文や注釈文等を
検索によって得ることができる。また、文書中の必要な
箇所を探す必要がないため、作業効率の向上を図ること
ができる。さらに、ベクトル空間法を用いて検索を行う
場合に、検索条件との関連性の大きなトピックだけでな
く、複数の他のトピックを含む文書をも確実に検索する
ことができる。As described above, in the document search apparatus of the first embodiment, the search target is not the whole document but the sub-documents of the document, so that descriptions in documents directly related to the search condition are obtained as the search result. Necessary information can therefore be searched for in real time while a document is being edited, which improves the convenience of search processing. Quotes, annotations, and the like relating to a passage in the document can thus be obtained by searching. Moreover, since there is no need to hunt for the relevant part of a document, work efficiency is improved. Furthermore, when a search is performed using the vector space method, not only a topic highly relevant to the search condition but also documents containing multiple other topics can be reliably retrieved.

【００４９】〔実施の形態２〕次に、実施の形態２の文
書検索装置について説明する。実施の形態２の文書検索
装置は、実施の形態１の文書検索装置を利用して、引用
文や注釈文の作成を支援するためのものである。なお、
文書検索装置の構成や転置ファイル１０２の生成処理に
ついては、上述した実施の形態１のものと同様であるた
め、ここではそれらの説明については省略する。[Second Embodiment] Next, a document search apparatus according to a second embodiment will be described. The document search apparatus of the second embodiment uses the document search apparatus of the first embodiment to support the creation of quotes and annotations. The configuration of the document search apparatus and the process of generating the inverted file 102 are the same as in the first embodiment described above, so their description is omitted here.

【００５０】図８は、サブドキュメントの検索を利用し
た引用文や注釈文の作成処理を示すフローチャートであ
り、図５と同様のステップについては同一のステップ番
号を付して詳細な説明を省略する。FIG. 8 is a flowchart showing a process of creating a quote or an annotation by using sub-document search. Steps that are the same as in FIG. 5 are given the same step numbers, and their detailed description is omitted.

【００５１】ユーザは、現在ワードプロセッサ等で編集
中の文書６００において、クライアント１００の入力装
置２０７を用い、引用文や注釈文が必要な任意の範囲６
０１を指定する（Ｓ８０１：図６（ａ）参照）。指定す
る範囲６０１については、単語単位，センテンス単位，
パラグラフ単位等、ユーザの好みに応じて指定すること
ができる。In the document 600 currently being edited with a word processor or the like, the user uses the input device 207 of the client 100 to specify an arbitrary range 601 for which a quote or an annotation is needed (S801: see FIG. 6(a)). The range 601 can be specified according to the user's preference, for example in units of words, sentences, or paragraphs.

【００５２】そして、指定した範囲６０１をクエリー２
００とする検索処理の実行が指定されると（Ｓ５０
２）、クエリー２００の解析処理（Ｓ５０３），検索条
件２０６の生成処理（Ｓ５０４），クエリー・ドキュメ
ント３０６の生成処理（Ｓ５０５）およびクエリー・ド
キュメント３０６によるサブドキュメントの検索処理
（Ｓ５０６）が行われ、検索結果であるサブドキュメン
トリスト３０８がクライアント１００に画面表示される
（Ｓ５０７）。Then, when execution of a search process using the specified range 601 as the query 200 is designated (S502), the following are performed: analysis of the query 200 (S503), generation of the search condition 206 (S504), generation of the query document 306 (S505), and a search for sub-documents using the query document 306 (S506). The sub-document list 308 obtained as the search result is then displayed on the screen of the client 100 (S507).

【００５３】ステップＳ５０７で画面表示されたサブド
キュメントリスト３０８中の各サブドキュメントは、引
用文や注釈文の候補である。クライアント１００のユー
ザは、必要に応じてサブドキュメントの内容確認を行っ
た後、サブドキュメントリスト３０８から所望のサブド
キュメントを選択し、かつ、編集中の文書６００へサブ
ドキュメントを挿入する箇所を指定する（Ｓ８０２）。Each sub-document in the sub-document list 308 displayed on the screen in step S507 is a candidate for a quote or an annotation. After checking the content of the sub-documents as necessary, the user of the client 100 selects a desired sub-document from the sub-document list 308 and specifies the position in the document 600 being edited at which the sub-document is to be inserted (S802).

【００５４】ステップＳ８０２でサブドキュメントの選
択と挿入箇所の指定が行われると、指定された挿入箇所
に選択されたサブドキュメントを挿入する処理が行われ
る（Ｓ８０３）。ユーザは、その後、必要に応じて挿入
されたサブドキュメントを加工する等の処理を行うこと
ができる。When the sub-document has been selected and the insertion position has been specified in step S802, the selected sub-document is inserted at the specified insertion position (S803). The user can then edit the inserted sub-document as necessary.

【００５５】なお、ステップＳ８０２においては、検索
されたサブドキュメントをどのように利用するかを指定
するメニューを表示することができるようにしても良
い。具体的には、サブドキュメントの利用方法として、
メニューやアイコン等により引用文，頭注，脚注等の指
定を行うことができるようにする。そして、引用文が選
択された場合には、ステップＳ８０１で指定された範囲
６０１の後段にサブドキュメントを引用文として挿入す
る。また、頭注または脚注が選択された場合には、所定
の個所にサブドキュメントを注釈文として挿入すると共
に、指定された範囲６０１と注釈文とを関連づける数字
等を両方に付す等の処理を行うようにする。Note that in step S802 a menu for specifying how the retrieved sub-document is to be used may be displayed. Specifically, the use of the sub-document, such as quote, headnote, or footnote, can be specified through a menu, icons, or the like. When quote is selected, the sub-document is inserted as a quote immediately after the range 601 specified in step S801. When headnote or footnote is selected, the sub-document is inserted as an annotation at a predetermined location, and processing such as attaching a number or the like to both the specified range 601 and the annotation, so as to associate them, is performed.
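The choice between quote, headnote, and footnote can be sketched as below. This is only an illustration under assumptions: the function name `insert_subdocument`, the plain-string document model, the `[n]` marker format, and the separate `notes` list standing in for the "predetermined location" are all hypothetical and not taken from the specification.

```python
def insert_subdocument(text, range_end, subdoc_text, mode, notes):
    """Insert a retrieved sub-document as a quote, headnote, or footnote.

    `range_end` is the index just past the range the user selected as the query.
    `notes` is a list of (mode, number, note_text) and is appended to in place;
    it stands in for the place where headnotes or footnotes are kept.
    """
    if mode == "quote":
        # A quote goes directly after the selected range.
        return text[:range_end] + ' "' + subdoc_text + '"' + text[range_end:]
    if mode in ("headnote", "footnote"):
        # The same number is attached to the range and to the note.
        number = len(notes) + 1
        notes.append((mode, number, subdoc_text))
        return text[:range_end] + f"[{number}]" + text[range_end:]
    raise ValueError(f"unknown mode: {mode!r}")

text = "The vector space method ranks documents."
selected = "vector space method"
end = text.index(selected) + len(selected)
subdoc = "Documents are represented as term vectors."

notes = []
quoted = insert_subdocument(text, end, subdoc, "quote", notes)
footnoted = insert_subdocument(text, end, subdoc, "footnote", notes)
```

With this sample input, `quoted` places the sub-document in quotation marks right after "vector space method", while `footnoted` leaves only the marker `[1]` in the text and records the sub-document under the same number in `notes`.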

【００５６】また、実施の形態２では、引用文や注釈文
の作成支援について説明したが、例えば、ある語句や事
柄についての説明を記述する場合に、その語句や事柄を
クエリー２００としてサブドキュメントを検索し、検索
したサブドキュメントをそれらの説明文として利用する
こともできる。The second embodiment has been described in terms of supporting the creation of quotes and annotations. However, when writing an explanation of a word, phrase, or matter, for example, sub-documents can also be searched using that word, phrase, or matter as the query 200, and the retrieved sub-documents can be used as the explanatory text.

【００５７】このように、実施の形態２の文書検索装置
によれば、検索対象を文書のサブドキュメントとし、検
索条件に直接関連する文書中の記述を検索結果として得
ることができるようにしたため、検索結果を引用文や注
釈文として利用することができる。また、画面表示され
ている編集中の文書の任意の範囲６０１をクエリー２０
０とし、その場で検索の実行を指定することができるた
め、編集中にリアルタイムで引用文や注釈文となり得る
サブドキュメントを検索することができる。As described above, in the document search apparatus of the second embodiment, the search target is the sub-documents of a document, so that descriptions in documents directly related to the search condition are obtained as the search result; the search result can therefore be used as a quote or an annotation. In addition, since an arbitrary range 601 of the document being edited and displayed on the screen can be used as the query 200 and execution of the search can be designated on the spot, sub-documents that can serve as quotes or annotations can be retrieved in real time during editing.

【００５８】なお、実施の形態１および２で説明した文
書検索装置は、予め用意されたプログラムをコンピュー
タやワークステーションで実行することによって実現さ
れる。このプログラムは、ハードディスク，フロッピー
ディスク，ＣＤ−ＲＯＭ，ＭＯ，ＤＶＤ等のコンピュー
タで読み取り可能な記録媒体に記録され、コンピュータ
によって記録媒体から読み出されることによって実行さ
れる。また、このプログラムは、上記記録媒体を介し
て、またはネットワークを介して配布することができ
る。The document search device described in the first and second embodiments is realized by executing a prepared program on a computer or a workstation. This program is recorded on a computer-readable recording medium such as a hard disk, a floppy disk, a CD-ROM, an MO, and a DVD, and is executed by being read from the recording medium by the computer. This program can be distributed via the recording medium or via a network.

【００５９】[0059]

【発明の効果】以上説明したように、本発明の文書検索
装置（請求項１）によれば、文書を構成する各センテン
スを任意の数のセンテンスまたはパラグラフ毎のセンテ
ンス等からなるグループに区分して、区分したグループ
をサブドキュメントと定義し、予め用意した文書をサブ
ドキュメントの単位でベクター表現に変換して格納する
格納手段と、画面表示されている文書の任意の範囲を指
定する範囲指定手段と、範囲指定手段で指定された範囲
の文字列の構造を解析する解析手段と、解析手段による
解析結果に基づいて、前記指定された範囲の文字列を用
いて検索条件を生成する検索条件生成手段と、検索条件
生成手段で生成した検索条件をベクター表現に変換する
変換手段と、変換手段から入力した前記検索条件のベク
ター表現に基づいて、前記格納手段から該当する前記サ
ブドキュメントを検索する検索手段と、を備えたため、
文書の編集中にリアルタイムで必要な情報を検索するこ
とができ、検索処理の利便性の向上を図ることができ
る。また、文書中の必要な箇所を探す必要がないため、
作業効率の向上を図ることができる。さらに、ベクトル
空間法を用いて検索を行う場合に、検索条件との関連性
の大きなトピックだけでなく、複数の他のトピックを含
む文書をも確実に検索することができる。As described above, the document search apparatus of the present invention (claim 1) comprises: storage means that divides the sentences constituting a document into groups each consisting of an arbitrary number of sentences, the sentences of a paragraph, or the like, defines each group as a sub-document, converts documents prepared in advance into vector representations in units of sub-documents, and stores them; range specifying means for specifying an arbitrary range of a document displayed on the screen; analysis means for analyzing the structure of the character string in the range specified by the range specifying means; search condition generating means for generating a search condition using the character string in the specified range, based on the analysis result of the analysis means; conversion means for converting the search condition generated by the search condition generating means into a vector representation; and search means for retrieving the corresponding sub-documents from the storage means based on the vector representation of the search condition input from the conversion means. Necessary information can therefore be searched for in real time while a document is being edited, which improves the convenience of search processing. Moreover, since there is no need to hunt for the relevant part of a document, work efficiency is improved. Furthermore, when a search is performed using the vector space method, not only a topic highly relevant to the search condition but also documents containing multiple other topics can be reliably retrieved.

【００６０】また、本発明の文書検索装置（請求項２）
によれば、請求項１に記載の文書検索装置において、検
索手段は、前記検索条件のベクター表現と前記サブドキ
ュメントのベクター表現との類似度を求め、求めた類似
度が所定の閾値を超えるサブドキュメントを選択し、選
択したサブドキュメントを類似度に応じて配列したリス
トを検索結果として出力するため、所望のサブドキュメ
ントを容易に選択することができる。Further, according to the document search apparatus of the present invention (claim 2), in the document search apparatus according to claim 1, the search means obtains the similarity between the vector representation of the search condition and the vector representation of each sub-document, selects the sub-documents whose similarity exceeds a predetermined threshold, and outputs as the search result a list in which the selected sub-documents are arranged according to their similarity. A desired sub-document can therefore be selected easily.

【００６１】また、本発明の文書検索装置（請求項３）
によれば、請求項１または２に記載の文書検索装置にお
いて、さらに、前記文書を構成するセンテンスの区分方
法を指定するための指定手段と、指定手段を介して指定
された前記センテンスの区分方法に基づいて、前記文書
から前記サブドキュメントを生成する生成手段と、を備
え、格納手段は、前記生成手段で生成したサブドキュメ
ントをベクター表現に変換して格納するため、ユーザの
好みの長さのサブドキュメントを検索対象とすることが
できる。Further, according to the document search apparatus of the present invention (claim 3), the document search apparatus according to claim 1 or 2 further comprises designation means for designating a method of dividing the sentences constituting the document, and generation means for generating the sub-documents from the document based on the sentence dividing method designated via the designation means, and the storage means converts the sub-documents generated by the generation means into vector representations and stores them. Sub-documents of the length the user prefers can therefore be used as the search target.
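Generating sub-documents according to a user-designated dividing method can be sketched as follows. This is a hedged illustration: the two method names ("paragraph" and "fixed"), the blank-line paragraph delimiter, the naive punctuation-based sentence splitter, and the function names are all assumptions; the specification's natural language processing module would perform this analysis in practice.

```python
import re

def split_sentences(paragraph):
    """Naive splitter: a sentence ends at ., !, ? or the Japanese full stop."""
    pieces = re.findall(r"[^.!?。]+[.!?。]?", paragraph)
    return [piece.strip() for piece in pieces if piece.strip()]

def make_subdocuments(document, method="paragraph", group_size=3):
    """Group a document's sentences into sub-documents.

    method="paragraph": one sub-document per paragraph (blank-line separated).
    method="fixed":     one sub-document per `group_size` consecutive sentences.
    Both method names are hypothetical labels for the designated dividing method.
    """
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    if method == "paragraph":
        groups = [split_sentences(p) for p in paragraphs]
    elif method == "fixed":
        sentences = [s for p in paragraphs for s in split_sentences(p)]
        groups = [sentences[i:i + group_size]
                  for i in range(0, len(sentences), group_size)]
    else:
        raise ValueError(f"unknown dividing method: {method!r}")
    return [" ".join(group) for group in groups if group]

document = "A one. A two.\n\nB one. B two. B three."
by_paragraph = make_subdocuments(document, method="paragraph")
by_pairs = make_subdocuments(document, method="fixed", group_size=2)
```

With the sample text, the paragraph method yields two sub-documents and the fixed method with a group size of 2 yields three, which shows how the designated method controls the length of the units that are later converted to vectors and searched.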

【００６２】また、本発明の文書検索装置（請求項４）
によれば、請求項１〜３のいずれか１つに記載の文書検
索装置において、さらに、前記検索手段による検索結果
に基づいて、所望のサブドキュメントを選択する選択手
段と、画面表示されている文書の任意の場所を指定する
指定手段と、指定手段で指定した場所に前記選択手段で
選択したサブドキュメントを挿入する挿入手段と、を備
えたため、検索結果を引用文や注釈文として利用するこ
とができる。また、画面表示されている編集中の文書の
任意の範囲をクエリーとして指定し、その場で検索の実
行を指定することができるため、編集中にリアルタイム
で引用文や注釈文となり得るサブドキュメントを検索し
て得ることができる。Further, according to the document search apparatus of the present invention (claim 4), the document search apparatus according to any one of claims 1 to 3 further comprises selection means for selecting a desired sub-document based on the search result of the search means, specifying means for specifying an arbitrary location in the document displayed on the screen, and insertion means for inserting the sub-document selected by the selection means at the location specified by the specifying means. The search result can therefore be used as a quote or an annotation. In addition, since an arbitrary range of the document being edited and displayed on the screen can be specified as the query and execution of the search can be designated on the spot, sub-documents that can serve as quotes or annotations can be retrieved in real time during editing.

【００６３】さらに、本発明のコンピュータ読み取り可
能な記録媒体（請求項５）によれば、請求項１〜４のい
ずれか１つに記載の文書検索装置の各手段としてコンピ
ュータを機能させるためのプログラムを記録したため、
このプログラムをコンピュータに実行させることによ
り、文書の編集中にリアルタイムで必要な情報を検索す
ることができ、文書中の必要な箇所を検索対象とするこ
とができ、さらに、複数のトピックを含む文書をも確実
に検索することができる文書検索装置を実現することが
できる。Furthermore, according to the computer-readable recording medium of the present invention (claim 5), a program for causing a computer to function as each means of the document search apparatus according to any one of claims 1 to 4 is recorded. By having a computer execute this program, a document search apparatus can be realized that can search for necessary information in real time while a document is being edited, can take the necessary part of a document as the search target, and can also reliably retrieve documents containing multiple topics.

[Brief description of the drawings]

【図１】実施の形態１の文書検索装置のシステム構成図
である。FIG. 1 is a system configuration diagram of a document search device according to a first embodiment.

【図２】図１に示したクライアントの処理を示す概略ブ
ロック図である。FIG. 2 is a schematic block diagram illustrating processing of a client illustrated in FIG. 1;

【図３】図１に示した検索サーバの処理を示す概略ブロ
ック図である。FIG. 3 is a schematic block diagram illustrating processing of a search server illustrated in FIG. 1;

【図４】実施の形態１の文書検索装置において、転置フ
ァイルの生成処理を示すフローチャートである。FIG. 4 is a flowchart illustrating a process of generating an inverted file in the document search device according to the first embodiment;

【図５】実施の形態１の文書検索装置において、サブド
キュメントの検索処理を示すフローチャートである。FIG. 5 is a flowchart showing a sub-document search process in the document search device according to the first embodiment.

【図６】実施の形態１の情報検索装置において、（ａ）
は、クエリーとする範囲の指定を行った様子を、（ｂ）
は、検索処理の実行を指定する様子をそれぞれ示す説明
図である。FIG. 6 is an explanatory diagram for the information retrieval apparatus of the first embodiment, in which (a) shows a range to be used as the query being specified and (b) shows execution of the search process being designated.

【図７】実施の形態１の文書検索装置において、サブド
キュメントリストの一例を示す説明図である。FIG. 7 is an explanatory diagram showing an example of a sub-document list in the document search device according to the first embodiment.

【図８】実施の形態２の文書検索装置において、サブド
キュメントの検索を利用した引用文や注釈文の作成処理
を示すフローチャートである。FIG. 8 is a flowchart showing a process of creating a quote or an annotation using a search of a sub-document in the document search device according to the second embodiment.

[Explanation of symbols]

１００クライアント１０１文書ＤＢ１０２転置ファイル１０３検索サーバ１０４ネットワーク２００クエリー２０１，３００自然言語処理モジュール２０２，３０１辞書２０３，３０２文法辞書２０６検索条件２０７入力装置３０３ドキュメント・セット３０４データベース・ビルド・コンポーネント３０５クエリー・ビルド・コンポーネント３０６クエリー・ドキュメント３０７検索エンジン３０８サブドキュメントリスト６００文書６０１範囲（クエリー）６０２メニュー
Reference Signs List
100 client
101 document DB
102 inverted file
103 search server
104 network
200 query
201, 300 natural language processing module
202, 301 dictionary
203, 302 grammar dictionary
206 search condition
207 input device
303 document set
304 database build component
305 query build component
306 query document
307 search engine
308 sub-document list
600 document
601 range (query)
602 menu

Claims

[Claims]

1. A document search apparatus that performs search processing using a vector space method, comprising: storage means that divides the sentences constituting a document into groups each consisting of an arbitrary number of sentences, the sentences of a paragraph, or the like, defines each group as a sub-document, converts documents prepared in advance into vector representations in units of the sub-documents, and stores them; range specifying means for specifying an arbitrary range of a document displayed on the screen; analysis means for analyzing the structure of the character string in the range specified by the range specifying means; search condition generating means for generating a search condition using the character string in the specified range, based on the analysis result of the analysis means; conversion means for converting the search condition generated by the search condition generating means into a vector representation; and search means for retrieving the corresponding sub-documents from the storage means based on the vector representation of the search condition input from the conversion means.

2. The document search apparatus according to claim 1, wherein the search means calculates a similarity between the vector representation of the search condition and the vector representation of each sub-document, selects the sub-documents whose calculated similarity exceeds a predetermined threshold, and outputs as a search result a list in which the selected sub-documents are arranged according to the similarity.

3. The document search apparatus according to claim 1 or 2, further comprising: designation means for designating a method of dividing the sentences constituting the document; and generation means for generating the sub-documents from the document based on the sentence dividing method designated via the designation means, wherein the storage means converts the sub-documents generated by the generation means into vector representations and stores them.

4. The document search apparatus according to any one of claims 1 to 3, further comprising: selection means for selecting a desired sub-document based on the search result of the search means; specifying means for specifying an arbitrary location in the document displayed on the screen; and insertion means for inserting the sub-document selected by the selection means at the location specified by the specifying means.

5. A computer-readable recording medium on which is recorded a program for causing a computer to function as each means of the document search apparatus according to any one of claims 1 to 4.