JPH10214268A

JPH10214268A - Method and device for retrieving document

Info

Publication number: JPH10214268A
Application number: JP9015667A
Authority: JP
Inventors: Toshihiro Fujinami; 稔弘藤並; Tomoyuki Tada; 多田　　智之; Hidenobu Kaneoka; 秀信金岡; Shinichi Mukogawa; 信一向川
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1997-01-29
Filing date: 1997-01-29
Publication date: 1998-08-11

Abstract

PROBLEM TO BE SOLVED: To reduce the storage capacity required. The index of each registered document is created from a summary generated automatically from that document, and the document matching a search request is retrieved on the basis of this index. SOLUTION: A document retrieval system comprises an input/output part 10, an automatic document summarizing part 20, an indexing part 30, a document compressing/decompressing part 40, a document retrieval part 50, a document storage part 60, an index storage part 70, a document management file storage part 80 and a control part 90. A summary of each registered document is generated automatically from the document, and an index of the document is created from this summary. Documents matching a search request are retrieved on the basis of this index. When the number of past accesses to a registered document within a fixed period exceeds a specified value, the index is instead created from the original text of the document, and retrieval uses that index.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a document retrieval method and apparatus that retrieve documents using document summaries. More particularly, it relates to a document retrieval method and apparatus that can greatly reduce the storage capacity required for registered documents and the like, while limiting the impact on retrieval accuracy, and that can provide more search information to the user.

[0002]

2. Description of the Related Art

Various document retrieval methods and apparatuses are known for retrieving a desired document from a plurality of registered documents in response to a search request.

[0003] Two known approaches are as follows. One sets an index for each document and retrieves the desired document on the basis of that index. The other creates a summary for each document and retrieves the desired document on the basis of that summary.

[0004] Index-based approaches in turn include those that use a character-component index and those that use keywords extracted automatically by morphological analysis.

[0005]

3. Problems to be Solved by the Invention

However, such conventional document retrieval methods and apparatuses need a large capacity in the storage device that holds the registered documents and related data. The problem is most severe when a desired document is retrieved through a document index, because the storage needed to hold the index becomes very large.

[0006] For example, a character-component index makes full-text search possible, but the index file reaches 1.5 times the size of the original document.
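A toy index makes the size problem in [0006] easier to see. The sketch below makes two assumptions that the patent does not state:

- a "character-component" index is taken to be a character-bigram inverted index that stores the starting position of every bigram;
- each posting is taken to cost a fixed 4 bytes.

Every one of the n - 1 bigram occurrences becomes a posting, so the index grows at least linearly with the text and can exceed it.

```python
from collections import defaultdict


def bigram_index(text: str) -> dict[str, list[int]]:
    """Map each two-character substring to the positions where it starts."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for pos in range(len(text) - 1):
        index[text[pos:pos + 2]].append(pos)
    return dict(index)


def estimated_size(index: dict[str, list[int]], bytes_per_posting: int = 4) -> int:
    """Rough size: UTF-8 bytes of each distinct key plus a fixed-width int per posting."""
    key_bytes = sum(len(key.encode("utf-8")) for key in index)
    postings = sum(len(positions) for positions in index.values())
    return key_bytes + postings * bytes_per_posting


text = "文書検索方法および文書検索装置"
index = bigram_index(text)
ratio = estimated_size(index) / len(text.encode("utf-8"))
print(f"postings={sum(len(p) for p in index.values())}, index/text={ratio:.2f}")
```

On such a tiny sample the per-key overhead dominates. Real ratios depend on text length and on how postings are encoded, so the sketch shows the trend rather than reproducing the 1.5x figure.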

[0007] Likewise, for full-text search based on keywords extracted automatically by morphological analysis, the index file is roughly the same size as the original document.

[0008] Accordingly, an object of the present invention is to provide a document retrieval method and apparatus with two properties. First, it greatly reduces the storage capacity required for registered documents and the like while limiting the impact on retrieval accuracy. Second, it provides more search information to the user.

[0009]

4. Means for Solving the Problems

To achieve the above object, the invention of claim 1 is a document retrieval method for retrieving, from a plurality of registered documents, a document corresponding to a desired search request. It is characterized in that a summary of each registered document is generated automatically from that document, and an index of the document is created from the automatically generated summary. The document corresponding to the desired search request is then retrieved on the basis of the index.
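A minimal sketch of the claim 1 pipeline follows: summarize, index the summary, then search the index. All function names are illustrative, and two parts are simplified stand-ins:

- the summarizer is a hypothetical lead-sentence extractor, not the automatic document summarizing part 20;
- the tokenizer splits on word characters, not the morphological analysis the patent mentions.

```python
import re
from collections import defaultdict


def summarize(text: str, max_sentences: int = 2) -> str:
    """Hypothetical stand-in summarizer: keep only the leading sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。])\s*", text) if s.strip()]
    return " ".join(sentences[:max_sentences])


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens (a real system would use morphological analysis)."""
    return set(re.findall(r"\w+", text.lower()))


def build_summary_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index built from each document's summary, not its full text."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(summarize(text)):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents whose summary contains every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


docs = {
    1: "Document summaries cut index size. Indexes are built from summaries. Appendix: storage tables.",
    2: "Compression saves disk space. It is applied to rarely used files.",
}
index = build_summary_index(docs)
print(search(index, "summaries"))  # {1}
print(search(index, "storage"))    # set(): the word occurs only outside the summary
```

The second query shows the trade-off noted in [0008]. A term that appears only outside the summary is no longer found, in exchange for a smaller index.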

[0010] The invention of claim 2 is the method of claim 1, characterized as follows. If the number of past accesses to a registered document within a fixed period exceeds a prescribed value, an index of the document is created from the original text of the registered document. The document corresponding to the desired search request is then retrieved on the basis of that index.

[0011] The invention of claim 3 is the method of claim 1 or 2, characterized as follows. When a word contained in the summary has a synonym or abbreviation with fewer words than the word itself, the word is replaced with that synonym or abbreviation, which makes the summary smaller.

[0012] The invention of claim 4 is the method of claim 3, characterized as follows. When the summary contains a plurality of words for which a synonym or abbreviation with fewer words exists, those words are replaced with the synonym or abbreviation, except for the word that appears first in the summary.
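Claims 3 and 4 can be sketched as a dictionary-driven substitution pass over an English summary. The following are assumptions, not details from the patent:

- the dictionary `shorter` and the function name are hypothetical;
- claim 4 is read as "keep the first occurrence of each word in full, shorten later ones";
- "fewer words" is measured here by character length;
- `\b` word boundaries are used, which suit space-delimited text rather than Japanese.

```python
import re


def compact_summary(summary: str, shorter: dict[str, str], keep_first: bool = True) -> str:
    """Replace words that have a shorter synonym or abbreviation (claim 3).

    With keep_first=True, the first occurrence of each such word stays in full
    and only later occurrences are shortened (one reading of claim 4).
    """
    # Apply only substitutions that actually shorten the text.
    table = {word: short for word, short in shorter.items() if len(short) < len(word)}
    if not table:
        return summary
    # Try longer phrases first so "personal computer" wins over "computer".
    alternatives = "|".join(re.escape(w) for w in sorted(table, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    seen: set[str] = set()

    def replace(match: re.Match) -> str:
        word = match.group(1)
        if keep_first and word not in seen:
            seen.add(word)  # first occurrence: leave it spelled out
            return word
        return table[word]

    return pattern.sub(replace, summary)


s = "The personal computer stores the index. Each personal computer also runs the search."
print(compact_summary(s, {"personal computer": "PC"}))
print(compact_summary(s, {"personal computer": "PC"}, keep_first=False))
```

Keeping the first occurrence spelled out lets a reader learn what the abbreviation stands for. The later occurrences still save space.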

[0013] The invention of claim 5 is a document retrieval method for retrieving, from a plurality of registered documents, a document corresponding to a desired search request. It is characterized by two cases:
- If the number of past accesses to a registered document within a fixed period exceeds a prescribed value, a first index of the document is created from the original text of the registered document. The document corresponding to the desired search request is retrieved on the basis of the first index.
- If the number of past accesses within the fixed period is equal to or less than the prescribed value, a summary of the document is generated automatically from the registered document, and a second index of the document is created from that summary. The document corresponding to the desired search request is retrieved on the basis of the second index.
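Claim 5 reduces to a per-document choice of which text is fed to the indexer. The sketch assumes the access history is available as a list of dates, with registration counted as one access as in [0046]. The `Doc` type, the period and the limit are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Doc:
    original: str
    summary: str
    access_dates: list[date]  # one entry per access; registration counts as one


def index_source(doc: Doc, today: date, period: timedelta, limit: int) -> str:
    """Claim 5: index the original if the document was accessed more than
    `limit` times within `period`; otherwise index its automatic summary."""
    recent = sum(1 for d in doc.access_dates if today - d <= period)
    return doc.original if recent > limit else doc.summary


doc = Doc("full original text", "short summary",
          [date(1996, 10, 1), date(1996, 10, 20), date(1996, 10, 30)])
print(index_source(doc, date(1996, 10, 31), timedelta(days=182), limit=2))
```

The returned text would then be passed to whatever indexer builds the first or second index.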

[0014] The invention of claim 6 is a document retrieval method for retrieving, from a plurality of registered documents, a document corresponding to a desired search request. It is characterized in that each registered document is compressed to generate a compressed document automatically. The compressed document is retrieved in response to the desired search request, and the retrieved compressed document is decompressed and output.

[0015] The invention of claim 7 is the method of claim 6, characterized by two cases:
- If the number of past accesses to a registered document within a fixed period exceeds a prescribed value, the original text of the registered document is retrieved and output.
- If the number of past accesses within the fixed period is equal to or less than the prescribed value, the compressed document is retrieved, decompressed and output.
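Claims 6 and 7 amount to storing rarely used documents in compressed form and decompressing them only on output. The sketch uses `zlib` as a stand-in, because the patent does not name a compression method. Index-based lookup is omitted, and `DocumentStore` and `apply_access_policy` are illustrative names.

```python
import zlib


class DocumentStore:
    """Holds each document either as original text or as zlib-compressed bytes."""

    def __init__(self) -> None:
        self._original: dict[int, str] = {}
        self._compressed: dict[int, bytes] = {}

    def register(self, doc_id: int, text: str) -> None:
        self._original[doc_id] = text

    def is_compressed(self, doc_id: int) -> bool:
        return doc_id in self._compressed

    def compress(self, doc_id: int) -> None:
        """Replace the stored original with a compressed copy (claim 6)."""
        text = self._original.pop(doc_id)
        self._compressed[doc_id] = zlib.compress(text.encode("utf-8"), 9)

    def fetch(self, doc_id: int) -> str:
        """Return the original as-is, or decompress the stored copy for output."""
        if doc_id in self._original:
            return self._original[doc_id]
        return zlib.decompress(self._compressed[doc_id]).decode("utf-8")


def apply_access_policy(store: DocumentStore, doc_id: int,
                        recent_accesses: int, limit: int) -> None:
    """Claim 7: keep frequently used originals, compress the rest."""
    if recent_accesses <= limit and not store.is_compressed(doc_id):
        store.compress(doc_id)


store = DocumentStore()
store.register(1, "document retrieval " * 50)
store.register(2, "frequently opened memo")
apply_access_policy(store, 1, recent_accesses=1, limit=2)
apply_access_policy(store, 2, recent_accesses=5, limit=2)
```

Callers see the same `fetch` result either way. Only the storage form and the decompression cost differ.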

[0016] The invention of claim 8 is a document retrieval method for retrieving, from a plurality of registered documents, a document corresponding to a desired search request. It is characterized by four cases:
- If the number of past accesses to a registered document within a first period exceeds a first prescribed value, a first index of the document is created from its original text. The original text is retrieved on the basis of the first index, and the retrieved original text is output.
- If the number of past accesses within the first period is equal to or less than the first prescribed value, the first index is created from the original text, and the registered document is compressed to generate a compressed document automatically. The compressed document is retrieved on the basis of the first index, then decompressed and output.
- If the number of past accesses within a second period, longer than the first period, is equal to or less than a second prescribed value, a summary of the document is generated automatically from the registered document, and a second index of the document is created from that summary. The compressed document is retrieved on the basis of the second index, then decompressed and output.
- If the number of past accesses within a third period, longer than the second period, is zero, the registered document is deleted and only the summary is retained.
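The four outcomes of claim 8 can be expressed as a single classification function. The sketch works as follows:

- it returns the FLAG letters of FIG. 3 ([0050] to [0054]);
- it checks the longest period first, so a document that qualifies for several tiers receives the most aggressive one, mirroring the deletion check that comes first in step 105;
- unlike [0043], it does not check how long ago the document was registered.

The parameter names are hypothetical. The example values come from the embodiment in [0043] to [0045], with six months approximated as 182 days.

```python
from datetime import date, timedelta


def storage_tier(access_dates: list[date], today: date,
                 p1: timedelta, v1: int,
                 p2: timedelta, v2: int,
                 p3: timedelta) -> str:
    """Return the FLAG letter for one document under the claim 8 rules.

    "N": original kept, index built from the original text
    "C": compressed document, index still built from the original text
    "S": compressed document, index built from the summary
    "X": document deleted, only the summary retained
    Assumes p1 < p2 < p3, as claim 8 requires.
    """
    def count(period: timedelta) -> int:
        # Number of accesses no older than `period` before `today`.
        return sum(1 for d in access_dates if today - d <= period)

    if count(p3) == 0:
        return "X"
    if count(p2) <= v2:
        return "S"
    if count(p1) <= v1:
        return "C"
    return "N"


# Example values taken from the embodiment ([0043] to [0045]).
HALF_YEAR, YEAR, TWO_YEARS = timedelta(days=182), timedelta(days=365), timedelta(days=730)
today = date(1996, 10, 31)
print(storage_tier([date(1996, 6, 1), date(1996, 9, 1)], today,
                   HALF_YEAR, 2, YEAR, 1, TWO_YEARS))  # C
```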

[0017] The invention of claim 9 is the method of claim 8, characterized as follows. When a word contained in the summary has a synonym or abbreviation with fewer words than the word itself, that word is replaced with the synonym or abbreviation, which makes the summary smaller.

[0018] The invention of claim 10 is the method of claim 9, characterized as follows. When the summary contains a plurality of words for which a synonym or abbreviation with fewer words exists, those words are replaced with the synonym or abbreviation, except for the word that appears first in the summary.

[0019] The invention of claim 11 is a document retrieval apparatus for retrieving, from a plurality of registered documents, a document corresponding to a desired search request. It comprises:
- document storage means for storing the registered documents;
- search request input means for inputting the search request;
- automatic document summarization means for automatically generating a summary of a document stored in the document storage means;
- index creation means for creating an index of the document from the summary generated by the automatic document summarization means;
- index storage means for storing the index created by the index creation means; and
- document search means for retrieving, from the documents stored in the document storage means, the document corresponding to the desired search request on the basis of the index stored in the index storage means.

[0020] The invention of claim 12 is the apparatus of claim 11, further comprising summary reduction means. When a word contained in the summary has a synonym or abbreviation with fewer words than the word itself, the summary reduction means replaces the word with that synonym or abbreviation, which makes the summary smaller.

[0021] The invention of claim 13 is the apparatus of claim 12, characterized as follows. When the summary contains a plurality of words for which a synonym or abbreviation with fewer words exists, the summary reduction means replaces those words with the synonym or abbreviation, except for the word that appears first in the summary.

[0022] The invention of claim 14 is a document retrieval apparatus for retrieving, from a plurality of registered documents, a document corresponding to a desired search request. It comprises:
- document storage means for storing the registered documents;
- input/output means for inputting the search request and outputting the corresponding search result;
- automatic document summarization means for automatically generating a summary of a document from the original document stored in the document storage means, and storing the summary in the document storage means;
- index creation means for creating an index of the document from the original document or the summary stored in the document storage means;
- index storage means for storing the index created by the index creation means;
- document compression/decompression means for compressing the original document stored in the document storage means, storing it there as a compressed document, and decompressing the compressed document stored there;
- document search means for executing document search processing: in response to the search request entered through the input/output means, it retrieves the original document, the compressed document or the summary stored in the document storage means on the basis of the index stored in the index storage means, and outputs it to the input/output means; and
- management means for managing the document search processing performed by the document search means.

[0023] The invention of claim 15 is the apparatus of claim 14, further comprising summary reduction means. When a word contained in the summary has a synonym or abbreviation with fewer words than the word itself, the summary reduction means replaces the word with that synonym or abbreviation, which makes the summary smaller.

[0024] The invention of claim 16 is the apparatus of claim 15, characterized as follows. When the summary contains a plurality of words for which a synonym or abbreviation with fewer words exists, the summary reduction means replaces those words with the synonym or abbreviation, except for the word that appears first in the summary.

[0025] The invention of claim 17 is the apparatus of claim 14. The document search means behaves according to the number of past accesses to a document stored in the document storage means:
- If the count within a first period exceeds a first prescribed value, the means retrieves the original document on the basis of a first index, created from the original document and stored in the index storage means, and outputs the retrieved original.
- If the count within the first period is equal to or less than the first prescribed value, the means retrieves the compressed document on the basis of the first index, then decompresses and outputs it.
- If the count within a second period, longer than the first period, is equal to or less than a second prescribed value, the means retrieves the compressed document on the basis of a second index, created from the summary and stored in the index storage means, then decompresses and outputs it.
- If the count within a third period, longer than the second period, is zero, the means deletes the document stored in the document storage means and retains only the summary.

[0026]

5. Embodiments of the Invention

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0027] FIG. 1 is a schematic block diagram of an embodiment of a document retrieval system to which the document retrieval method and apparatus of the present invention are applied.

[0028] Referring to FIG. 1, the document retrieval system comprises the following units:
- an input/output unit 10 for registering documents, issuing search requests and displaying search results;
- an automatic document summarization unit 20 for automatically generating summaries from registered documents;
- an indexing unit 30 for creating an index of a document, either from the original text of the registered document (hereinafter the "registered original") or from its summary;
- a document compression/decompression unit 40 for compressing or decompressing registered originals;
- a document search unit 50 for executing a desired search request against the document indexes;
- a document storage unit 60 for storing registered documents, their summaries or compressed documents;
- an index storage unit 70 for storing the indexes of registered documents or summaries;
- a document management file storage unit 80 for storing document management files; and
- a control unit 90 for controlling the overall operation of the document retrieval system.

[0029] When a document is registered in this system, the control unit 90 first detects a document registration request from the input/output unit 10. It then instructs the input/output unit 10 to save the designated document (the registered document) in the document storage unit 60.

[0030] At the same time, the control unit 90 instructs the indexing unit 30 to perform indexing, that is, to create an index of the document.

[0031] On this instruction from the control unit 90, the indexing unit 30 generates an index of the document and stores it in the index storage unit 70.

[0032] At this time, the control unit 90 also creates a new entry in the document management file storage unit 80 and records the registration date.
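The registration sequence of [0029] to [0032] can be sketched with in-memory dictionaries standing in for the storage units. Everything here is illustrative:

- the class and field names are hypothetical;
- indexing is plain whitespace tokenization;
- the initial access entry follows [0046] and [0049], which count registration as one access.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ManagementEntry:
    """One row of the document management file (hypothetical layout after FIG. 3)."""
    dn: int
    registered: date
    flag: str = "N"                                        # N: original text + normal index
    last_access: list[date] = field(default_factory=list)  # newest first (LA1..LA3)


class Registry:
    """In-memory stand-ins for storage units 60 (documents), 70 (index), 80 (management)."""

    def __init__(self) -> None:
        self.documents: dict[int, str] = {}
        self.index: dict[str, set[int]] = {}
        self.management: dict[int, ManagementEntry] = {}

    def register(self, text: str, today: date) -> int:
        dn = len(self.management)               # next document number
        self.documents[dn] = text               # [0029] save the registered document
        for term in set(text.lower().split()):  # [0030]-[0031] index the original text
            self.index.setdefault(term, set()).add(dn)
        # [0032] new entry with the registration date; registration counts as one access.
        self.management[dn] = ManagementEntry(dn, today, "N", [today])
        return dn


registry = Registry()
dn = registry.register("Document retrieval with summaries", date(1996, 10, 31))
```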

[0033] During a document search, the control unit 90 detects a document search request from the input/output unit 10. It then instructs the document search unit 50 to search for documents corresponding to that request.

[0034] The document search unit 50 then searches the index storage unit 70 and the document storage unit 60, using the keywords or natural-language sentence given as the search command.

[0035] When the document search unit 50 has completed the search, the control unit 90 instructs the input/output unit 10 to display a search result list.

[0036] The search results from the document search unit 50 are thereby output to the input/output unit 10, which displays the search result list.

[0037] When the user finds a desired document in the displayed search result list, the user issues a request to open it from the input/output unit 10. The control unit 90 detects this open request and instructs the document search unit 50 to output the document to the input/output unit 10.

[0038] If the desired document is stored in the document storage unit 60 as its original text, the document search unit 50, under the control of the control unit 90, outputs the stored document unchanged. The input/output unit 10 then displays it.

[0039] If the desired document is stored in compressed form in the document storage unit 60, the document search unit 50, under the control of the control unit 90, first has the document compression/decompression unit 40 decompress it. It then outputs the document to the input/output unit 10, which displays it.

[0040] At this time, the document search unit 50 updates the document management file of the opened document, held in the document management file storage unit 80, by recording the date and time of the access.
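Opening a document ([0038] to [0040]) can be sketched as follows. Assumptions:

- the function signature is hypothetical;
- `zlib` stands in for the unspecified compression method;
- the access history keeps only the three most recent dates, newest first, matching the LA1 to LA3 columns of FIG. 3. This ordering is inferred from step 105, which treats LA1 as the most recent access.

```python
import zlib
from datetime import date


def open_document(dn: int, originals: dict[int, str], compressed: dict[int, bytes],
                  last_access: dict[int, list[date]], today: date) -> str:
    """Return document `dn` for display and record the access."""
    if dn in originals:
        text = originals[dn]                                     # [0038] stored as-is
    else:
        text = zlib.decompress(compressed[dn]).decode("utf-8")  # [0039] decompress first
    history = last_access.setdefault(dn, [])                   # [0040] log the access
    history.insert(0, today)
    del history[3:]                                            # keep LA1..LA3 only
    return text


originals = {0: "uncompressed memo"}
compressed = {1: zlib.compress("archived report".encode("utf-8"))}
last_access = {1: [date(1996, 10, 1), date(1996, 9, 1), date(1996, 8, 1)]}
print(open_document(1, originals, compressed, last_access, date(1996, 10, 31)))
print(last_access[1])
```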

[0041] In this embodiment, a batch process periodically shrinks the registered documents stored in the document storage unit 60 and the indexes stored in the index storage unit 70. It runs, for example, every Friday at 10 p.m.

[0042] This document management batch processing, which shrinks the registered documents in the document storage unit 60 and the indexes in the index storage unit 70, is driven by the number of accesses to each document.

[0043] The first condition is shown in FIG. 2. It applies when six months have passed since a document was registered and the document has been accessed two times or fewer in the past six months. In that case, the batch processing compresses the registered document and stores the compressed document in the document storage unit 60. The index storage unit 70 keeps the index created from the registered original text before compression.

[0044] The second condition applies when one year has passed since registration and the document has been accessed once or less in the past year. The document storage unit 60 then holds the compressed document, and the index storage unit 70 stores an index created from the summary of the document.

[0045] The third condition applies when two years have passed since registration and the document has not been accessed at all in the past two years. The compressed document is then deleted as well, together with the index stored in the index storage unit 70.

[0046] In the access count used by the document management batch processing, registration of the document counts as one access.

[0047] The document management batch processing is based on the document management file stored in the document management file storage unit 80.

[0048] FIG. 3 shows an example of the document management file stored in the document management file storage unit 80.

[0049] The example shows the document management file as it might stand on October 31, 1996. For each document, the file records the three most recent times it was opened, and registration is recorded as one access.

[0050] In FIG. 3, "FLAG" indicates the state of each document stored in the document storage unit 60.

[0051] When FLAG is "N", the document in the document storage unit 60 is uncompressed. The index in the index storage unit 70 is a normal index created from the registered original text.

[0052] When FLAG is "C", the document in the document storage unit 60 is compressed. The index in the index storage unit 70 is a normal index created from the registered original text.

[0053] When FLAG is "S", the document in the document storage unit 60 is compressed. The index in the index storage unit 70 is a summary index created from the summary.

[0054] When FLAG is "X", the document has been deleted.

[0055] "DN" is the document number of each document, and "LA1" to "LA3" are the dates of the three most recent accesses.

[0056] FIGS. 4 to 6 are flowcharts of the document management batch processing.

[0057] Referring to FIGS. 4 to 6, when the document management batch process starts (step 101), initialization is performed first (step 102). Initialization consists of the following:
1) set the document number "DN" to be processed to "0" (DN = 0);
2) set "DATE0" to the date six months ago (DATE0 = date six months ago);
3) set "DATE1" to the date one year ago (DATE1 = date one year ago);
4) set "DATE2" to the date two years ago (DATE2 = date two years ago).

[0058] Next, the document information for document number "DN", namely "FLAG", "LA1", "LA2" and "LA3", is read from the document management file storage unit 80 (step 103).

[0059] It is then checked whether "FLAG" is "X", that is, whether "FLAG == X" (step 104).

[0060] If "FLAG == X" (YES in step 104), the document number "DN" is incremented by 1, "DN = DN + 1" (step 108), and it is then checked whether the end of the document management file has been reached, "FILE END" (step 109). If not (NO in step 109), the process returns to step 103.

[0061] If it is determined in step 109 that "FILE END" has been reached (YES in step 109), the document management batch process ends (step 110).

[0062] If it is determined in step 104 that "FLAG == X" does not hold (NO in step 104), it is then checked whether the "LA1" read in step 103 is older than "DATE2" (step 105).

[0063] If "LA1" is older than "DATE2" (YES in step 105), the document with document number "DN" is deleted (step 106), the "FLAG" of document "DN" is updated to "X" (step 107), and the process proceeds to step 108.

[0064] If it is determined in step 105 that "LA1" is newer than "DATE2" (NO in step 105), the process proceeds to step 111 in FIG. 5.
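The FIG. 4 part of the batch (steps 101 to 110) can be sketched as a loop over the document management file. This is a hypothetical Python rendering: the record type `Doc` and the function `run_batch` are illustrative names, "deleting" the stored document is reduced to recording its number, and the FIG. 5 and FIG. 6 processing is passed in as a callback named `reclassify`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class Doc:
    dn: int                      # document number "DN"
    flag: str                    # "N", "C", "S" or "X"
    la1: date                    # latest access; registration counts as one access
    la2: Optional[date] = None
    la3: Optional[date] = None


def run_batch(docs: List[Doc], date2: date,
              reclassify: Callable[[Doc], None] = lambda doc: None) -> List[int]:
    """Steps 101-110 of FIG. 4; returns the document numbers deleted in this run."""
    deleted: List[int] = []
    for doc in docs:                 # steps 103, 108, 109: DN = 0, 1, ... until FILE END
        if doc.flag == "X":          # step 104 YES: already deleted, skip
            continue
        if doc.la1 < date2:          # step 105 YES: no access in the past two years
            deleted.append(doc.dn)   # step 106: delete the stored document (stubbed here)
            doc.flag = "X"           # step 107
            continue
        reclassify(doc)              # step 105 NO: continue at step 111 (FIGS. 5 and 6)
    return deleted                   # step 110: end of the batch
```

Keeping the FIG. 5 and FIG. 6 logic behind a callback keeps the loop itself identical to the flowchart, whatever policy is plugged in.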

[0065] In step 111, it is checked whether "LA2" contains data. If it does (YES in step 111), it is then checked whether "LA2" is older than "DATE1" (step 112).

[0066] If "LA2" is older than "DATE1" (YES in step 112), it is then checked whether "FLAG" is "S", that is, whether "FLAG == S" (step 113). If so (YES in step 113), the process proceeds to step 108 in FIG. 4.

[0067] If "FLAG == S" does not hold (NO in step 113), it is then checked whether "FLAG" is "C", that is, whether "FLAG == C" (step 114).

[0068] If "FLAG == C" does not hold (NO in step 114), the document with document number "DN" is read from the document storage unit 60 and compressed by the document compression/decompression unit 40, and the compressed document replaces document "DN" in the document storage unit 60 (compressed storage, step 115).

[0069] A new index is then created from the summary corresponding to document "DN", and it replaces the index stored for that document in the index storage unit 70 (step 116). The "FLAG" of document "DN" is then updated to "S" (step 117), and the process proceeds to step 108 in FIG. 4.

[0070] If "FLAG == C" holds (YES in step 114), the process proceeds to step 116 without performing step 115.

[0071] If it is determined in step 111 that "LA2" contains no data (NO in step 111), it is then checked whether "LA1" is older than "DATE1" (step 118). If "LA1" is older than "DATE1" (YES in step 118), the process proceeds to step 113; if "LA1" is newer than "DATE1" (NO in step 118), the process proceeds to step 126 in FIG. 6.

[0072] If it is determined in step 112 that "LA2" is newer than "DATE1" (NO in step 112), the process proceeds to step 119 in FIG. 6.

[0073] In step 119, it is checked whether "LA3" contains data. If it does (YES in step 119), it is then checked whether "LA3" is older than "DATE0" (step 120).

[0074] If "LA3" is older than "DATE0" (YES in step 120), it is then checked whether "FLAG" is "C", that is, whether "FLAG == C" (step 121). If so (YES in step 121), the process proceeds to step 108 in FIG. 4.

[0075] If "FLAG == C" does not hold (NO in step 121), it is then checked whether "FLAG" is "N", that is, whether "FLAG == N" (step 122).

[0076] If "FLAG == N" does not hold (NO in step 122), an index for document "DN" is created from its full text (step 123), the "FLAG" of document "DN" is then updated to "C" (step 125), and the process proceeds to step 108 in FIG. 4.

[0077] If "FLAG == N" holds (YES in step 122), document "DN" is compressed and stored in the document storage unit 60 (step 124), and the process proceeds to step 125 without performing step 123.

If it is determined in step 119 that "LA3" contains no data (NO in step 119), it is then checked whether "LA2" is older than "DATE0" (step 127). If "LA2" is older than "DATE0" (YES in step 127), the process proceeds to step 121; if "LA2" is newer than "DATE0" (NO in step 127), the process proceeds to step 128.

[0078] If it is determined in step 120 that "LA3" is newer than "DATE0" (NO in step 120), the process proceeds to step 128.

[0079] In step 126, it is checked whether "LA1" is older than "DATE0". If so (YES in step 126), the process proceeds to step 121.

[0080] If it is determined in step 126 that "LA1" is newer than "DATE0" (NO in step 126), it is then checked whether "FLAG" is "N", that is, whether "FLAG == N" (step 128). If so (YES in step 128), the process proceeds to step 108 in FIG. 4.
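Steps 111 to 127 amount to choosing a target state for each document that survives step 105. The Python function below is a hypothetical condensation of those branches. It assumes LA1 is the most recent access and that the deletion check has already run, and it returns the state whose check is reached next: "S" for step 113, "C" for step 121, "N" for step 128.

```python
from datetime import date
from typing import Optional


def target_state(la1: date, la2: Optional[date], la3: Optional[date],
                 date0: date, date1: date) -> str:
    """Condensed steps 111-127 (FIGS. 5 and 6) for a document that was not deleted.

    Returns "S" (continue at step 113), "C" (step 121) or "N" (step 128).
    Assumes LA1 is the most recent access and LA3 the oldest.
    """
    if la2 is not None:                          # step 111 YES
        if la2 < date1:                          # step 112 YES: at most one access in a year
            return "S"
        if la3 is not None:                      # step 119 YES
            return "C" if la3 < date0 else "N"   # step 120
        return "C" if la2 < date0 else "N"       # step 119 NO -> step 127
    if la1 < date1:                              # step 111 NO -> step 118 YES
        return "S"
    return "C" if la1 < date0 else "N"           # step 118 NO -> step 126
```

Read this way, a document stays uncompressed ("N") only while every recorded access (up to three) falls within the last six months, and it drops to a summary index ("S") once it has had at most one access in the past year.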

[0081] If "FLAG == N" does not hold (NO in step 128), it is then checked whether "FLAG" is "C", that is, whether "FLAG == C" (step 129).

[0082] If "FLAG == C" does not hold (NO in step 129), an index for document "DN" is created from its full text (step 130), document "DN" is decompressed and stored (step 131), the "FLAG" of document "DN" is then updated to "N" (step 132), and the process proceeds to step 108 in FIG. 4.

[0083] If "FLAG == C" holds (YES in step 129), the process proceeds to step 131 without performing step 130.
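Once the target state is known, steps 113 to 117, 121 to 125 and 128 to 132 follow one pattern: do nothing if the flag already matches; otherwise rebuild the index and compress or decompress the document as needed, then update the flag. The Python sketch below returns the operations as strings. The operation names are hypothetical labels, not functions defined in the patent.

```python
from typing import List


def plan_transition(flag: str, target: str) -> List[str]:
    """Operations that move a document from `flag` to `target` ("S", "C" or "N").

    Mirrors steps 113-117 (target "S"), 121-125 (target "C") and
    128-132 (target "N"); an empty list means nothing needs to change.
    """
    if flag == target:                       # step 113 / 121 / 128 YES
        return []
    ops: List[str] = []
    if target == "S":
        if flag != "C":                      # step 114 NO: document still uncompressed
            ops.append("compress")           # step 115
        ops.append("index_from_summary")     # step 116
    elif target == "C":
        if flag == "N":                      # step 122 YES
            ops.append("compress")           # step 124
        else:                                # step 122 NO: flag is "S"
            ops.append("index_from_text")    # step 123
    elif target == "N":
        if flag != "C":                      # step 129 NO: flag is "S"
            ops.append("index_from_text")    # step 130
        ops.append("decompress")             # step 131
    else:
        raise ValueError(f"unknown target state {target!r}")
    ops.append(f"set_flag_{target}")         # step 117 / 125 / 132
    return ops
```

Note that the full-text index is rebuilt only when leaving "S", because "N" and "C" already share a normal index built from the original text.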

[0084] FIG. 7 shows an example of the document management file stored in the document management file storage unit 80 as of, for example, November 1, 1996, after the document management batch process of FIGS. 4 to 6 has been performed.

[0085] A comparison with the document management file shown in FIG. 3 shows the following. The document with document number "DN" "00000002" has been accessed 0 times in the past two years, so its "FLAG" has become "X", indicating that it has been deleted. The document "00000003" has been accessed twice in the past six months, so its "FLAG" has become "C": the document stored in the document storage unit 60 is compressed, and the index stored in the index storage unit 70 is a normal index created from the registered original text. The document "00004233" has been accessed once in the past year, so its "FLAG" has become "S": the stored document is compressed, and the stored index is a summary index created from the summary. The document "00012014" has been accessed twice in the past six months, so its "FLAG" has become "C": the stored document is compressed, and the stored index is a normal index created from the registered original text.

[0086] FIG. 8 shows an example configuration of the automatic document summarizing unit 20 shown in FIG. 1.

[0087] As shown in FIG. 8, the automatic document summarizing unit 20 comprises a document structure analysis unit 21, a morphological analysis unit 22, a syntax analysis unit 23, an important sentence determination unit 24, a synonym/abbreviation replacement unit 25, a morphological analysis dictionary 26, and a synonym/abbreviation dictionary 27.

[0088] The document structure analysis unit 21 analyzes the digitized input document and identifies its structure: title, chapter and section headings, paragraphs, sentences, and so on.

[0089] The morphological analysis unit 22 uses the morphological analysis dictionary 26 and part-of-speech connection rules to divide the title, body text, and so on into words, segments them into phrases (bunsetsu), and identifies the part of speech of each word.

[0090] The syntax analysis unit 23 analyzes grammatical cases such as the nominative (the "wa" case, the "ga" case, and so on) and analyzes the dependency structure between the phrases in each sentence.

[0091] From the results of the document structure analysis, morphological analysis, and syntax analysis, the important sentence determination unit 24 determines the importance of each sentence in the text. It considers the sentence's relation to the title and chapter headings, the position of its paragraph within the whole text, its position within the paragraph, the key words extracted and weighted over the whole text, the weights of the key words the sentence contains, the number of key words it contains, and how those key words are used in the sentence (as subject, object, and so on). It then decides which sentences make up the summary, based on the summarization rate and on anaphoric relations between sentences.
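The patent leaves the scoring in unit 24 to existing techniques. As a rough stand-in, the Python sketch below scores sentences using only three of the listed factors (key-word repetition, overlap with the title, and lead position) and keeps the top fraction given by the summarization rate. The weights, the function name and the treatment of repeated words as key words are arbitrary assumptions, and anaphora handling is omitted.

```python
from collections import Counter
from typing import List


def pick_summary_sentences(title_words: List[str],
                           sentences: List[List[str]],
                           rate: float = 0.2) -> List[int]:
    """Return the indices of the sentences kept for the summary, in document order.

    Each sentence is a list of words, e.g. the output of morphological analysis.
    """
    freq = Counter(w for words in sentences for w in words)   # document-wide frequencies
    title = set(title_words)
    scores = []
    for pos, words in enumerate(sentences):
        distinct = set(words)
        score = sum(freq[w] for w in distinct if freq[w] > 1)  # repeated words act as key words
        score += 2 * len(title & distinct)                     # words shared with the title
        if pos == 0:
            score += 1                                         # bonus for the lead sentence
        scores.append(score)
    keep = max(1, round(len(sentences) * rate))                # summarization rate
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:keep])
```

Returning the kept indices in document order preserves the original sentence sequence in the summary.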

[0092] The parts of the automatic document summarizing unit 20 up to and including the important sentence determination unit 24 do not form the gist of the present invention; they can be realized with techniques already proposed for document summarization systems and the like. For example, context analysis and semantic analysis may be added to the configuration above.

[0093] The synonym/abbreviation replacement unit 25 scans the summary obtained by the important sentence determination unit 24, starting from its first sentence, for words listed in the synonym/abbreviation dictionary 27. If a word has a synonym or abbreviation with fewer characters and the same word is found two or more times, its second and subsequent occurrences are replaced with that shorter synonym or abbreviation.
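The replacement rule above can be sketched over a list of words. This Python version is illustrative: the dictionary format (each full form mapped to its shorter form) and the function name are assumptions, and the example entries are the airport names discussed in [0095].

```python
from typing import Dict, List


def shorten_repeats(words: List[str], short_forms: Dict[str, str]) -> List[str]:
    """Replace the second and later occurrences of a dictionary word with its shorter form."""
    seen = set()                 # words already met once (the first-occurrence flag of [0099])
    out: List[str] = []
    for word in words:
        short = short_forms.get(word)
        if short is not None and len(short) < len(word):
            if word in seen:
                out.append(short)        # second or later occurrence: use the short form
                continue
            seen.add(word)               # first occurrence: keep the full form
        out.append(word)
    return out


short_forms = {"関西国際空港": "関空", "新東京国際空港": "成田"}
words = ["関西国際空港", "と", "新東京国際空港", "。", "関西国際空港", "は", "新東京国際空港", "より"]
print("".join(shorten_repeats(words, short_forms)))
# 関西国際空港と新東京国際空港。関空は成田より
```

Keeping the full form at its first occurrence means a reader always meets the unabbreviated name before the abbreviation.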

[0094] For example, FIG. 11 shows the reduced summary produced by the synonym/abbreviation replacement unit 25 when the abbreviation dictionary shown in FIG. 9 is used and the summary obtained by the important sentence determination unit 24 is the one shown in FIG. 10.

[0095] That is, in the shortened summary shown in FIG. 11, "Kansai International Airport" (関西国際空港) is replaced with the abbreviation "Kanku" (関空) from its second occurrence onward, and "New Tokyo International Airport" (新東京国際空港) is replaced with the abbreviation "Narita" (成田) from its second occurrence onward. As a result, the summary shown in FIG. 10 is shortened as shown in FIG. 11.

[0096] Alternatively, when a word is detected for the first time, the abbreviation used thereafter can be appended to it, for example in parentheses, or the word can be replaced with the abbreviation starting from its first occurrence. Appending the abbreviation in parentheses at the first occurrence improves the readability of the shortened summary.
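The parenthetical variant changes only how the first occurrence is handled. A hypothetical Python sketch, assuming a dictionary that maps each full form to its short form and using full-width parentheses as in Japanese text:

```python
from typing import Dict, List


def shorten_with_note(words: List[str], short_forms: Dict[str, str]) -> List[str]:
    """Annotate the first occurrence with its short form, then use the short form."""
    seen = set()
    out: List[str] = []
    for word in words:
        short = short_forms.get(word)
        if short is None or len(short) >= len(word):
            out.append(word)                     # not a shortenable dictionary word
        elif word in seen:
            out.append(short)                    # later occurrence: short form only
        else:
            seen.add(word)
            out.append(f"{word}（{short}）")      # first occurrence: full form (short form)
    return out


print("".join(shorten_with_note(["関西国際空港", "は", "関西国際空港"], {"関西国際空港": "関空"})))
# 関西国際空港（関空）は関空
```

The annotation costs a few characters at the first occurrence but tells the reader what the later abbreviation stands for.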

[0097] For the replacement processing in the synonym/abbreviation replacement unit 25, all words listed in the synonym/abbreviation dictionary 27 are assumed to be registered in the morphological analysis dictionary 26 as well.

[0098] Using the morphological analysis result produced by the morphological analysis unit 22 during summarization in the automatic document summarizing unit 20, whenever the content word of a phrase has a part of speech that can have a synonym or abbreviation (such as a noun or proper noun), it is checked whether that word is registered in the synonym/abbreviation dictionary 27.

[0099] For each such word, a flag indicating whether the word is appearing for the first time is prepared. When the word is detected, its flag is checked, and occurrences from the second onward are replaced with the shorter word registered in the synonym/abbreviation dictionary 27.

[0100] If the words listed in the synonym/abbreviation dictionary 27 are not contained in the morphological analysis dictionary 26, the case can be handled by treating a run of consecutive words whose part of speech can have a synonym or abbreviation (such as nouns or proper nouns) as a possible single word, and checking combinations of those consecutive words against the dictionary.
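The fallback above can be sketched as a longest-match scan over runs of consecutive nouns. In this Python illustration the morphological-analysis output format (pairs of surface form and part-of-speech tag) and the tag names in `NOUN_TAGS` are assumptions, not the output of any particular analyzer.

```python
from typing import Dict, List, Tuple

# Hypothetical part-of-speech labels for words that can have a synonym/abbreviation.
NOUN_TAGS = {"noun", "proper_noun"}


def merge_dictionary_compounds(morphs: List[Tuple[str, str]],
                               short_forms: Dict[str, str]) -> List[str]:
    """Join consecutive nouns whose concatenation is a dictionary entry (longest match)."""
    out: List[str] = []
    i = 0
    while i < len(morphs):
        j = i
        while j < len(morphs) and morphs[j][1] in NOUN_TAGS:
            j += 1                           # [i, j) is a run of nouns
        if j == i:                           # not a noun: copy it through
            out.append(morphs[i][0])
            i += 1
            continue
        k = i
        while k < j:
            for end in range(j, k + 1, -1):  # multi-word spans, longest first
                candidate = "".join(surface for surface, _ in morphs[k:end])
                if candidate in short_forms:
                    out.append(candidate)
                    k = end
                    break
            else:                            # no dictionary compound starts at k
                out.append(morphs[k][0])
                k += 1
        i = j
    return out
```

The merged list can then go through the same second-occurrence replacement as words that the analyzer recognized directly.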

[0101] The document compression/decompression unit 40, the indexing unit 30, and the document search unit 50 shown in FIG. 1 do not themselves form the gist of the present invention; they can be realized with techniques already used in document search systems and file compression/decompression systems of this kind.

[0102] Methods of index creation in the indexing unit 30 include the character component method and automatic keyword extraction by morphological analysis; the present invention can use any of these methods.
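The text does not define the character component method. As an assumed stand-in only, the Python sketch below uses character-bigram indexing, a common character-based approach for Japanese text. Building the index from a summary instead of the full text changes only which string is passed in. Bigram intersection can return false hits, so a real system would verify candidates against the text, and queries shorter than two characters return nothing here.

```python
from collections import defaultdict
from typing import Dict, Set


def bigrams(text: str) -> Set[str]:
    """All overlapping two-character substrings of `text`."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_index(texts: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: bigram -> set of document numbers containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for dn, text in texts.items():
        for gram in bigrams(text):
            index[gram].add(dn)
    return dict(index)


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Documents containing every bigram of the query (candidates; may include false hits)."""
    grams = bigrams(query)
    if not grams:
        return set()
    result = None
    for gram in grams:
        postings = index.get(gram, set())
        result = set(postings) if result is None else result & postings
    return result
```

Copying the first posting set keeps callers from mutating the index through the returned result.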

[0103] In a full-text search system, the index size is said to be typically about 100% to 150% of the size of the registered documents, depending on the method. Compressing documents with currently common document compression methods can reduce the original text to about 10% of its size.

[0104] It is also said that a summary about 10% to 20% of the length of the original is enough to convey roughly the subject of the original.

[0105] Therefore, when a 10% summary is made of, for example, 100 Mbytes of registered documents, the embodiment above gives the following. In the initial state, the registered documents take 100 Mbytes and the index 100 Mbytes, 200 Mbytes in total. With the original text compressed, the registered documents take 10 Mbytes and the index 100 Mbytes, 110 Mbytes in total, so the data volume shrinks to 55%. With the index built from the summaries, the registered documents take 10 Mbytes and the index 10 Mbytes, 20 Mbytes in total, so the data volume shrinks to 10%.
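The figures above can be reproduced with a few lines of integer arithmetic. The parameters are the ones stated in [0103] to [0105]: an index about 100% of the text it indexes, compression to 10%, and a 10% summary. As in the calculation in the text, the stored summary itself is not counted. The function name is hypothetical.

```python
from typing import Tuple


def storage_mb(original: int = 100, index_pct: int = 100,
               compress_pct: int = 10, summary_pct: int = 10) -> Tuple[int, int, int]:
    """Total storage (Mbytes) in the initial, compressed and summary-index states."""
    index_full = original * index_pct // 100          # full-text index: 100 MB
    doc_compressed = original * compress_pct // 100   # compressed original: 10 MB
    summary = original * summary_pct // 100           # 10% summary: 10 MB of text
    index_summary = summary * index_pct // 100        # index built from the summary: 10 MB
    initial = original + index_full                   # 100 + 100
    compressed = doc_compressed + index_full          # 10 + 100
    summarized = doc_compressed + index_summary       # 10 + 10
    return initial, compressed, summarized


initial, compressed, summarized = storage_mb()
print(initial, compressed, summarized)                      # 200 110 20
print(100 * compressed // initial, 100 * summarized // initial)  # 55 10
```

Changing `index_pct` to 150 shows the upper end of the range given in [0103].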

[0106] Although the embodiment above describes a document search system for Japanese, the present invention can of course be applied equally to document search systems for other languages.

[0107]

[Effects of the Invention] As described above, according to the present invention, a summary of each registered document is automatically generated from the document, an index of the document is created from the automatically generated summary, and the document corresponding to a desired search request is retrieved on the basis of that index. This greatly reduces the storage capacity required for the registered documents and the like while limiting the effect on retrieval accuracy, and makes it possible to provide more search information to the user.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram of an embodiment of a document search system to which the document search method and apparatus according to the present invention are applied.

FIG. 2 illustrates the outline of the document management batch process, which reduces the size of the registered documents stored in the document storage unit shown in FIG. 1 and of the indexes stored in the index storage unit.

FIG. 3 shows an example of the document management file stored in the document management file storage unit shown in FIG. 1.

FIG. 4 is a flowchart showing details of the document management batch process described with reference to FIG. 2.

FIG. 5 is a flowchart showing details of the document management batch process described with reference to FIG. 2.

FIG. 6 is a flowchart showing details of the document management batch process described with reference to FIG. 2.

FIG. 7 shows an example of the document management file stored in the document management file storage unit after the document management batch process of FIGS. 4 to 6 has been performed.

FIG. 8 is a block diagram showing an example configuration of the automatic document summarizing unit shown in FIG. 1.

FIG. 9 shows an example of the abbreviation dictionary used by the synonym/abbreviation replacement unit of the automatic document summarizing unit shown in FIG. 8.

FIG. 10 shows an example of a summary obtained by the important sentence determination unit of the automatic document summarizing unit shown in FIG. 8.

FIG. 11 shows an example of how the summary shown in FIG. 10 is rewritten by the synonym/abbreviation replacement unit of the automatic document summarizing unit shown in FIG. 8.

[Reference Signs List]

10 Input/output unit
20 Automatic document summarizing unit
30 Indexing unit
40 Document compression/decompression unit
50 Document search unit
60 Document storage unit
70 Index storage unit
80 Document management file storage unit
90 Control unit
21 Document structure analysis unit
22 Morphological analysis unit
23 Syntax analysis unit
24 Important sentence determination unit
25 Synonym/abbreviation replacement unit
26 Morphological analysis dictionary
27 Synonym/abbreviation dictionary

Continuation of the front page: (72) Inventor: Shinichi Mukogawa, c/o Omron Corporation, 10 Hanazono Tsuchido-cho, Ukyo-ku, Kyoto-shi, Kyoto

Claims

[Claims]

1. A document retrieval method for retrieving a document corresponding to a desired search request from a plurality of registered documents, wherein a summary of a registered document is automatically generated from the document, an index of the document is created from the automatically generated summary, and a document corresponding to the desired search request is retrieved on the basis of the index.

2. The document retrieval method according to claim 1, wherein, if the number of past accesses to a registered document exceeds a prescribed value within a predetermined period, the index of the document is created from the original text of the registered document, and a document corresponding to the desired search request is retrieved on the basis of that index.

3. The document retrieval method according to claim 1, wherein, when a synonym or abbreviation shorter than a word contained in the summary exists, the summary is reduced in size by replacing the word with the synonym or abbreviation.

4. The document retrieval method according to claim 3, wherein, when a word that has a shorter synonym or abbreviation occurs more than once in the summary, every occurrence except the first is replaced with the synonym or abbreviation.

5. A document retrieval method for retrieving a document corresponding to a desired search request from a plurality of registered documents, wherein: if the number of past accesses to a registered document exceeds a prescribed value within a predetermined period, a first index of the document is created from the original text of the registered document, and a document corresponding to the desired search request is retrieved on the basis of the first index; and if the number of past accesses to the registered document is equal to or less than the prescribed value within the predetermined period, a summary of the document is automatically generated from the registered document, a second index of the document is created from the automatically generated summary, and a document corresponding to the desired search request is retrieved on the basis of the second index.

6. A document retrieval method for retrieving a document corresponding to a desired search request from a plurality of registered documents, wherein the registered documents are compressed to automatically generate compressed documents, the compressed document corresponding to the desired search request is retrieved, and the retrieved compressed document is decompressed and output.

7. The document retrieval method according to claim 6, wherein, if the number of past accesses to a registered document exceeds a prescribed value within a predetermined period, the original of the registered document is searched and the retrieved original is output, and if the number of past accesses to the registered document is equal to or less than the prescribed value within the predetermined period, the compressed document is searched, and the retrieved compressed document is decompressed and output.

8. A document retrieval method for retrieving a document corresponding to a desired search request from a plurality of registered documents, wherein: if the number of past accesses to a registered document exceeds a first prescribed value within a first period, a first index of the document is created from the original text of the registered document, and the original text of the registered document is searched on the basis of the first index; if the number of past accesses to the registered document is equal to or less than the first prescribed value within the first period, the first index of the document is created from the original text of the registered document, the registered document is compressed to automatically generate a compressed document, the compressed document is searched on the basis of the first index, and the retrieved compressed document is decompressed and output; if the number of past accesses to the registered document is equal to or less than a second prescribed value within a second period longer than the first period, a summary of the document is automatically generated from the registered document, a second index of the document is created from the automatically generated summary, and the compressed document is searched on the basis of the second index; and if the number of past accesses to the registered document is zero within a third period longer than the second period, the registered document is deleted and only the summary is retained.

9. The document retrieval method according to claim 8, wherein, when a synonym or abbreviation shorter than a word contained in the summary exists, the summary is reduced in size by replacing the word with the synonym or abbreviation.

10. The document retrieval method according to claim 9, wherein, when a word that has a shorter synonym or abbreviation occurs more than once in the summary, every occurrence except the first is replaced with the synonym or abbreviation.

11. A document retrieval apparatus for retrieving a document corresponding to a desired search request from a plurality of registered documents, comprising: document storage means for storing the registered documents; search request input means for inputting the search request; automatic document summarization means for automatically generating a summary of a document from the document stored in the document storage means; index creation means for creating an index of the document from the summary generated by the automatic document summarization means; index storage means for storing the index created by the index creation means; and document search means for retrieving, from the documents stored in the document storage means, a document corresponding to the desired search request on the basis of the index stored in the index storage means.

12. The document retrieval apparatus according to claim 11, further comprising summary reduction means for, when a synonym or abbreviation shorter than a word contained in the summary exists, reducing the size of the summary by replacing the word with the synonym or abbreviation.

13. The document retrieval apparatus according to claim 12, wherein, when a word that has a shorter synonym or abbreviation occurs more than once in the summary, the summary reduction means replaces every occurrence except the first with the synonym or abbreviation.

14. A document retrieval apparatus for retrieving, from a plurality of registered documents, a document corresponding to a desired retrieval request, comprising: document storage means for storing the registered documents; an input/output unit for inputting the retrieval request and outputting a retrieval result corresponding to the request; automatic document summarization means for automatically generating a summary sentence of a document from the original text of the document stored in the document storage means and storing the summary sentence in the document storage means; index creation means for creating an index of the document from the original text or the summary sentence stored in the document storage means; index storage means for storing the index created by the index creation means; document compression/decompression means for compressing the original text stored in the document storage means, storing it in the document storage means as a compressed document, and decompressing the compressed document stored in the document storage means; document retrieval means for executing, in response to the retrieval request input from the input/output unit, a document retrieval process that retrieves the original text, the compressed document, or the summary sentence stored in the document storage means based on the index stored in the index storage means and outputs the result to the input/output unit; and management means for managing the document retrieval process performed by the document retrieval means.
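Claim 14 adds compression/decompression means so that an original can be kept in compressed form alongside its summary. The following is a minimal sketch, assuming Python's standard `zlib` module as a stand-in for whatever compression scheme the apparatus actually uses; the `StoredDocument` class and its fields are hypothetical. Its `text` method returns the best form still available: the uncompressed original, else the decompressed copy, else only the summary.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredDocument:
    """Hypothetical record holding a summary plus the original in some form."""
    summary: str
    original: Optional[str] = None      # uncompressed original text, if retained
    compressed: Optional[bytes] = None  # zlib-compressed original, if retained

    def compress(self) -> None:
        """Replace the uncompressed original with a compressed copy."""
        if self.original is not None:
            self.compressed = zlib.compress(self.original.encode("utf-8"))
            self.original = None

    def text(self) -> str:
        """Return the original (decompressing if needed), else the summary."""
        if self.original is not None:
            return self.original
        if self.compressed is not None:
            return zlib.decompress(self.compressed).decode("utf-8")
        return self.summary


if __name__ == "__main__":
    doc = StoredDocument(summary="Short summary.", original="report text " * 50)
    doc.compress()
    print(len(doc.compressed) < len("report text " * 50))  # True
    print(doc.text() == "report text " * 50)               # True
```

Keeping the summary in the same record means a document can still be returned, in summary form, after its original has been discarded.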

15. The document retrieval apparatus according to claim 14, further comprising summary sentence reduction means for reducing the size of the summary sentence by replacing a word contained in the summary sentence with a synonym or abbreviation shorter than that word, when such a synonym or abbreviation exists.

16. The document retrieval apparatus according to claim 15, wherein, when a word having a shorter synonym or abbreviation occurs a plurality of times in the summary sentence, the summary sentence reduction means replaces every occurrence other than the first occurrence in the summary sentence with the synonym or abbreviation.

17. The document retrieval apparatus according to claim 14, wherein: when the number of past accesses to a document stored in the document storage means exceeds a first prescribed value within a first period, the original text stored in the document storage means is retrieved based on a first index, created from the original text and stored in the index storage means, and the retrieved original text is output; when the number of such accesses within the first period is equal to or less than the first prescribed value, the compressed document is retrieved based on the first index, and the retrieved compressed document is decompressed and output; when the number of past accesses to the document is equal to or less than a second prescribed value within a second period longer than the first period, the compressed document is retrieved based on a second index, created from the summary sentence and stored in the index storage means, and the retrieved compressed document is decompressed and output; and when the number of past accesses to the document is zero within a third period longer than the second period, the document stored in the document storage means is deleted and only its summary sentence is retained.
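The access-count policy of claim 17 can be expressed as a small classification function. In this hedged sketch the `Tier` names, the `classify` signature, and the use of access timestamps are hypothetical; the periods `t1 < t2 < t3` and thresholds `n1`, `n2` correspond to the first, second, and third periods and the first and second prescribed values. The claim does not say how overlapping conditions are ordered, so the sketch assumes the deletion check is applied first, then the frequently-accessed check, then the summary-index check, with the remaining case falling back to the compressed document searched through the index built from the original.

```python
from enum import Enum
from typing import Sequence


class Tier(Enum):
    ORIGINAL_FULL_INDEX = "serve original, index built from original"
    COMPRESSED_FULL_INDEX = "serve decompressed copy, index built from original"
    COMPRESSED_SUMMARY_INDEX = "serve decompressed copy, index built from summary"
    SUMMARY_ONLY = "original deleted, only summary kept"


def classify(access_times: Sequence[float], now: float,
             t1: float, t2: float, t3: float, n1: int, n2: int) -> Tier:
    """Pick a storage/index tier from past access counts (hypothetical policy)."""
    if not (t1 < t2 < t3):
        raise ValueError("periods must satisfy t1 < t2 < t3")

    def count(period: float) -> int:
        # Number of accesses within the trailing window [now - period, now].
        return sum(1 for t in access_times if now - period <= t <= now)

    if count(t3) == 0:
        return Tier.SUMMARY_ONLY              # unused for the whole third period
    if count(t1) > n1:
        return Tier.ORIGINAL_FULL_INDEX       # frequently accessed recently
    if count(t2) <= n2:
        return Tier.COMPRESSED_SUMMARY_INDEX  # rarely accessed over the second period
    return Tier.COMPRESSED_FULL_INDEX         # at most n1 accesses in the first period


if __name__ == "__main__":
    args = dict(now=100.0, t1=10.0, t2=30.0, t3=90.0, n1=3, n2=1)
    print(classify([95, 96, 97, 98], **args))  # Tier.ORIGINAL_FULL_INDEX
    print(classify([20], **args))              # Tier.COMPRESSED_SUMMARY_INDEX
```

Since each window ends at `now` and the longer windows contain the shorter ones, a document can only move toward cheaper tiers as its recent accesses dry up.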