JPH06332947A

JPH06332947A - Method and device for storing and reproducing data

Info

Publication number: JPH06332947A
Application number: JP5119728A
Authority: JP
Inventors: Katsumi Murai; 克己村井; Kenji Hashimoto; 賢治橋本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1993-05-21
Filing date: 1993-05-21
Publication date: 1994-12-02

Abstract

PURPOSE: To speed up retrieval by reducing accesses to the storage file device. An associative table of character concatenations, with each character 'i' of the recorded document data as an entry, is prepared in advance and arranged as contiguously as possible for each entry within the storage file. CONSTITUTION: Document data stored in a storage device 1 is read out, statistical appearance information on the characters 'j' that follow a character 'i' is obtained, and each character type 'j' concatenated to 'i' is recorded in a first character-concatenation initial reserved area together with position information in the storage device 1. A second character-concatenation initial reserved area that does not depend on the character type is also secured. When the first area is exceeded, it is discarded or moved, and the character type 'j' concatenated to 'i' is recorded in the second area together with the position information. Which area holds the concatenation data is determined, for example, by writing an identification code in the first area and always reading the first area first.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a search method and a search device based on full-text search, in which requested document data is retrieved from a secondary storage device holding a large amount of document data without attaching index information for retrieval.

[0002]

[Prior Art] In recent years, with the spread of word processors and personal computers, large amounts of document data have come to be circulated and used in workplaces and homes. Large-capacity databases and high-speed search machines are being researched and developed to organize this data and use it effectively. Conventional search machines, however, require index information to be attached for retrieval, and the indexing work has become very laborious as the amount of data grows. As an approach that does not require this work, search devices based on full-text search have been announced; they can find the desired document data accurately and quickly in a large body of documents without index information. For example, the logic-per-track disk proposed by Slotnick (Slotnick, D.L.) in 1970 attaches a dedicated search processor to each head of a disk, which is a kind of secondary storage device, and attempts to speed up retrieval by transferring only the information that satisfies the search conditions to the host computer; RAP at the University of Toronto is one concrete implementation. As an example of research on text search machines for full-text search, there is K. Murai et al., "Index-Free Full-Text Search Function Installed into Optical Disk Drive," Jpn. J. Appl. Phys. Vol. 31 (1992), pp. 680-687, Part 1, No. 2B, February 1992. That work uses the character concatenation information of the original documents and an associative memory in the form of a content-addressable table to select, as candidates, the documents containing the character concatenations found in the search word. The candidate document files are then searched in full at high speed by hardware to check whether the search word is actually contained, which realizes high-speed full-text search.

[0003]

[Problems to Be Solved by the Invention] However, when such an associative memory structure based on character concatenation is to be realized on actual computer files, it is not easy to manage the directory-managed document files consistently. Reading files from the storage file device is the most time-consuming step, and it has been difficult to present candidate document files efficiently, with small capacity and at high speed, while preserving sector contiguity for those reads.

[0004] Conventional devices have been inefficient and insufficient in search speed. An object of the present invention is to provide a compact, high-performance search device by means of an efficient content-addressable associative method.

[0005]

[Means for Solving the Problems] To solve the above problems, the content-addressable storage and search method of the present invention reads the document data stored in a storage device, assigns an identification name to each appropriate amount of document data, examines the content of the document data, extracts the sequences in which a character j follows a character i, and writes them as a table with a content-addressable structure. For this writing, statistical appearance information on the characters j that follow each character i is obtained in advance from specific documents, and the number of (i, j) combinations appearing per unit document is predicted. A first area is secured according to the number of concatenation types for each character i, and each character type j concatenated to the character i is recorded in it together with position information in the storage file device. A separate second character-concatenation initial reserved area, which does not depend on the character type, is also secured. If the first initial reserved area originally secured for a character i is exceeded, that area is discarded or moved, a predicted recording area for the character type i is newly secured in the second recording area, and the character types j concatenated to the character i are recorded there together with the position information in the storage file device. Whether the concatenations are written in the first character-concatenation initial reserved area or at a specific position in the second initial reserved area is determined, for example, by writing an identification code in the first initial reserved area and always reading the first initial reserved area first.

[0006] Next, a search character string (for example, a search word) and a search logical expression issued by the requester are accepted. The search character string is examined to find each leading character i and the character j that follows it. The first initial reserved area is read and, if the identification code is present, the second initial reserved area is read by following the pointer. Document files whose position information (identification name) is recorded for every concatenated character pair of the search character string (AND) are selected as search candidates, their actual recording locations in the storage file are obtained, and a full-text search is performed on them.
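The candidate selection by AND over character concatenations can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary stands in for the associative table; the names `bigrams`, `build_table`, and `candidates` are hypothetical and do not appear in the specification.

```python
from collections import defaultdict

def bigrams(text):
    """Return the set of adjacent character pairs (i, j) in text."""
    return {(text[k], text[k + 1]) for k in range(len(text) - 1)}

def build_table(docs):
    """Map each leading character i to {j: set of document ids}."""
    table = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for i, j in bigrams(text):
            table[i][j].add(doc_id)
    return table

def candidates(table, query):
    """AND together the document sets of every (i, j) pair in the query."""
    result = None
    for i, j in bigrams(query):
        docs = table.get(i, {}).get(j, set())
        result = set(docs) if result is None else result & docs
        if not result:
            return set()          # one missing pair rules out every document
    return result or set()

docs = {1: "full text search", 2: "text index", 3: "search engine"}
table = build_table(docs)
print(candidates(table, "text"))    # documents 1 and 2
```

The result is only a candidate set; the full-text check on the selected files is still required to confirm that the search string itself occurs.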

[0007] Like the prior-art text search machine for full-text search, the present invention basically adopts an associative search scheme based on content-addressable storage (described here as a table). In the present invention, however, contiguity within the disk file is taken into account when the associative table is built, and the recording area of the associative table can be secured in step with the growth of the document file bodies.

[0008]

[Operation] According to the present invention, an associative table of character concatenations whose entries are the characters i of the recorded document data is created in advance as described above, and a data memory circuit and a character search circuit are provided. On a request from the host computer, the table is consulted to check whether the document data read from the storage file device contains the same character sequence as the search character string; if so, the identification name corresponding to the document recording location is obtained, and from it the directory entry for the recording location in the storage file. The associative table of character concatenations can be arranged as contiguously as possible in the storage file, at least for each directory entry, so accesses to the storage file device are reduced and the search can be made faster.

[0009]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0010] FIG. 1 is a block diagram of the content-addressable storage and search device of the present invention. In FIG. 1, 1 is a secondary storage device that holds a large amount of document data, 2 is a data matching circuit that detects matching character strings, and 3 is a host computer.

[0011] FIG. 2 shows the data stored in the secondary storage device. Reference numeral 4 is a recording area for character-concatenation appearance frequency information, in which the appearance frequency of the characters j that follow each character i has been predicted in advance from specific documents. Reference numeral 5 is a first character-concatenation initial reserved area, in which one region per character i is initialized with a size corresponding to the number of concatenation types of (i, j) combinations appearing per unit document. Reference numeral 6 is a second character-concatenation initial reserved area, and 7 is a third initial reserved area. The documents to be searched are stored in the third initial reserved area.
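As an illustration of how the first area could be sized from the appearance statistics, the sketch below counts the distinct followers j of each character i in a sample text and lays out one region per character. The entry size, the header size, and the function names are assumptions made for this example only; the specification does not fix them.

```python
from collections import defaultdict

def follower_counts(sample_text):
    """Count the distinct characters j that follow each character i."""
    followers = defaultdict(set)
    for i, j in zip(sample_text, sample_text[1:]):
        followers[i].add(j)
    return {i: len(js) for i, js in followers.items()}

def first_area_layout(counts, entry_size=4, header_size=2):
    """Return {i: (offset, size)} for consecutive per-character regions.

    header_size models room for the flag and pointer at the head of each
    region; entry_size models one (j, position) record. Both are assumed.
    """
    layout, offset = {}, 0
    for i in sorted(counts):
        size = header_size + counts[i] * entry_size
        layout[i] = (offset, size)
        offset += size
    return layout

counts = follower_counts("abab ac")
print(first_area_layout(counts))
```

Because each region is placed directly after the previous one, the regions for all characters occupy one contiguous span, which is the property the first area is meant to provide.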

[0012] FIG. 3 is a flowchart showing the procedure for writing concatenation information. The operation of the search device configured as above is described with reference to FIGS. 1, 2, and 3. At the request of the host computer 3, document data is read from the third initial reserved area 7, and the two-character concatenations (i, j) present in the corresponding document file are examined and classified. Identical concatenations are treated as duplicates and deleted. The flag information written at the head of the first character-concatenation initial reserved area is then read. If there is judged to be room to write the concatenation (i, j) in the first character-concatenation initial reserved area 5, it is written there; otherwise the pointer information is read, and the character j and the document file number are written in the second character-concatenation initial reserved area 6. If no region for the character i has yet been secured in the second character-concatenation reserved area, a region is secured with a size calculated from the appearance frequency observed so far, the corresponding data in the first area is copied into it, the flag information is changed, and a pointer to the newly created region is written.
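The write procedure can be sketched as a two-tier structure. This is a simplified model, not the patented layout: regions in the second area are modeled as growable Python sets rather than regions sized from the observed frequency, and the class and field names (`ConcatIndex`, `moved`, `ptr`) are hypothetical.

```python
class ConcatIndex:
    """Small fixed slot per leading character (first area) plus a shared
    overflow region (second area) reached through a flag and a pointer."""

    def __init__(self, first_capacity=4):
        self.first_capacity = first_capacity  # assumed entries per slot
        self.first = {}    # i -> {"moved": flag, "ptr": index, "entries": set}
        self.second = []   # overflow regions, addressed by list position

    def _slot(self, i):
        return self.first.setdefault(
            i, {"moved": False, "ptr": None, "entries": set()})

    def add(self, i, j, doc_no):
        slot = self._slot(i)
        if slot["moved"]:                      # flag set: follow the pointer
            self.second[slot["ptr"]].add((j, doc_no))
            return
        if (j, doc_no) in slot["entries"]:     # duplicate concatenation: drop
            return
        if len(slot["entries"]) < self.first_capacity:
            slot["entries"].add((j, doc_no))   # still room in the first area
            return
        # Overflow: secure a region in the second area, copy the old data,
        # then change the flag and record the pointer.
        self.second.append(set(slot["entries"]) | {(j, doc_no)})
        slot.update(moved=True, ptr=len(self.second) - 1, entries=set())

    def entries(self, i):
        """Read the flag first, then the first or the second area."""
        slot = self.first.get(i)
        if slot is None:
            return set()
        return self.second[slot["ptr"]] if slot["moved"] else slot["entries"]
```

Reading always starts at the first area, so a lookup costs one access for characters that never overflowed and two for those that did.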

[0013] At search time, one or more character concatenations are obtained from the search character string requested by the host computer 3. For each leading character i, the flag information at the head of the corresponding first character-concatenation initial reserved area in the secondary storage device is read to decide whether to read the data in that initial reserved area as is or to read the data in the second character-concatenation initial reserved area. If the character j is contained in the data stored under the entry for the character i, the corresponding document number is read, the document file of the corresponding directory entry is read from the third area, the data matching circuit 2 checks whether the search character string is really contained, and the matching text is presented to the user.
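The reason the final check is needed can be shown with a small filter-then-verify sketch. It assumes plain Python substring matching as a stand-in for the data matching circuit 2; `bigram_set` and `search` are hypothetical names.

```python
def bigram_set(text):
    """All two-character substrings of text."""
    return {text[k:k + 2] for k in range(len(text) - 1)}

def search(documents, query):
    """Return (candidates, hits): documents that contain every character
    pair of the query, and the subset that really contains the query."""
    wanted = bigram_set(query)
    candidates = [d for d, text in documents.items()
                  if wanted <= bigram_set(text)]
    hits = [d for d in candidates if query in documents[d]]
    return candidates, hits

documents = {1: "abc", 2: "ab bc"}
print(search(documents, "abc"))   # ([1, 2], [1])
```

Document 2 contains both pairs "ab" and "bc" but not the string "abc", so it passes the table lookup and is rejected only by the full-text check.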

[0014] Another embodiment is described next with reference to the drawings. In FIG. 4 the first character-concatenation initial reserved area 5 is made very small; it is secured in the secondary storage device 1 and also copied to the main memory of the host computer 3. As a result, writes of character concatenation data mainly access the main memory, and the data is transferred to the area in the secondary storage device when the main memory overflows.
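A minimal sketch of this write buffering follows, assuming a Python list stands in for the secondary storage device 1 and a fixed entry limit models the main-memory copy filling up. `BufferedArea` and its fields are hypothetical names.

```python
class BufferedArea:
    """Collect concatenation entries in main memory and transfer them to
    secondary storage in one batch when the in-memory copy is full."""

    def __init__(self, limit, disk):
        self.limit = limit      # assumed number of entries main memory holds
        self.buffer = []        # copy of the first area held in main memory
        self.disk = disk        # list standing in for secondary storage
        self.transfers = 0      # number of batch transfers performed

    def write(self, entry):
        self.buffer.append(entry)
        if len(self.buffer) >= self.limit:
            self.flush()

    def flush(self):
        if self.buffer:
            self.disk.extend(self.buffer)
            self.buffer.clear()
            self.transfers += 1

disk = []
area = BufferedArea(limit=3, disk=disk)
for entry in range(1, 8):
    area.write(entry)
print(area.transfers, len(disk), area.buffer)   # 2 6 [7]
```

Seven writes cause only two transfers to secondary storage, which is the access reduction this embodiment aims for.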

[0015] FIG. 5 illustrates the flag information for sector usage management, which is secured in the secondary storage device 1 and copied to the main memory of the host computer 3. To secure regions contiguously in a character-concatenation recording area where no region has been reserved in advance, this management information is read to check whether a physically contiguous region is available, and the region is then secured.
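The contiguity check against the usage flags can be sketched as a first-fit scan. This assumes one boolean flag per sector and a first-fit policy; the specification states only that the management information is read to find a physically contiguous region, so the policy and the function names are assumptions.

```python
def find_free_run(used, length):
    """Return the index of the first run of `length` free sectors, or None.

    `used` is a list of booleans, one flag per sector (True = in use).
    """
    run_start, run_len = 0, 0
    for k, flag in enumerate(used):
        if flag:
            run_len = 0
            run_start = k + 1
        else:
            run_len += 1
            if run_len == length:
                return run_start
    return None

def allocate(used, length):
    """Mark a contiguous run as used and return its first sector index."""
    start = find_free_run(used, length)
    if start is not None:
        used[start:start + length] = [True] * length
    return start

used = [True, False, False, True, False, False, False, False]
print(allocate(used, 3))   # 4
```

The two free sectors at indices 1 and 2 are skipped because they cannot hold three sectors without a break.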

[0016] Another embodiment is described next with reference to FIG. 6. In FIG. 6, 8 comprises regions corresponding to the leading characters of character concatenations that statistical data predicts will occur frequently, together with a spare region that is not tied to any specific character.

[0017] Reference numeral 9 is a document data area. In the first character-concatenation initial reserved area 5, for character concatenations that statistical data predicts will occur frequently, flag information and a pointer to the corresponding region in the second character-concatenation initial reserved area are written in advance.

[0018] Another embodiment is described next with reference to FIGS. 7 and 8. FIG. 7 shows the relation between document numbers, which give the approximate position of a document recorded in secondary storage, and the storage areas of the actual document files. Document files differ in size, so a virtual document number is introduced. Suppose that one sector is 8 KB and that the file system manages files in clusters of two sectors (16 KB). Documents A, B, C, and D each fit within one cluster, so if they are placed contiguously, including their unused space, they are grouped together and one document number m is assigned per 64 KB. A large document file E is divided and assigned multiple document numbers n and n+1.
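The numbering arithmetic can be worked through in a short sketch using the sizes given above (8 KB sectors, 16 KB clusters, one document number per 64 KB). It assumes files are laid out back to back, each rounded up to whole clusters, and that numbering starts at 0; the function name and the sample file sizes are hypothetical.

```python
SECTOR = 8 * 1024
CLUSTER = 2 * SECTOR          # 16 KB, the file system allocation unit
UNIT = 4 * CLUSTER            # 64 KB covered by one document number

def assign_document_numbers(files):
    """files: list of (name, size_in_bytes), stored contiguously in order.

    Returns {name: [document numbers]}; a file spanning several 64 KB
    units receives several consecutive numbers.
    """
    result, offset = {}, 0
    for name, size in files:
        clusters = max(1, -(-size // CLUSTER))   # ceiling division
        length = clusters * CLUSTER
        first = offset // UNIT
        last = (offset + length - 1) // UNIT
        result[name] = list(range(first, last + 1))
        offset += length
    return result

files = [("A", 5000), ("B", 16000), ("C", 1), ("D", 12000),
         ("E", 100 * 1024)]
print(assign_document_numbers(files))
```

With these sizes, A to D each occupy one cluster and share number 0, while E occupies seven clusters starting at the 64 KB boundary and receives numbers 1 and 2, matching the m and n, n+1 cases in the text.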

[0019] FIG. 8 shows the management table that relates document numbers to the file system. Each document number, which indicates the approximate address of a document's recording position in secondary storage, is recorded in a fixed area together with a pointer to detailed information and a flag indicating information such as division or concatenation. Once the document number recorded with the concatenated character j in the associative table has been obtained, the starting address of the detailed information can be found by a simple calculation, and the actual document file can then be identified by referring to the file name and similar information.
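One way to read the "simple calculation" is fixed-size records indexed by document number, so that the record address is the base address plus the document number times the record size. The sketch below assumes a 1-byte flag and a 4-byte pointer per record; the record layout, the flag values, and the function names are illustrative assumptions, not taken from the specification.

```python
import struct

RECORD = struct.Struct("<BI")          # assumed: 1-byte flag, 4-byte pointer
FLAG_SPLIT, FLAG_JOINED = 0x01, 0x02   # hypothetical flag bits

def record_offset(doc_no, base=0):
    """Address of the management record for a document number."""
    return base + doc_no * RECORD.size

def write_record(table, doc_no, flag, pointer):
    RECORD.pack_into(table, record_offset(doc_no), flag, pointer)

def read_record(table, doc_no):
    """Return (flag, pointer to detailed information)."""
    return RECORD.unpack_from(table, record_offset(doc_no))

table = bytearray(RECORD.size * 8)     # fixed area for 8 document numbers
write_record(table, 3, FLAG_JOINED, 0x1234)
print(record_offset(3), read_record(table, 3))   # 15 (2, 4660)
```

Because every record has the same size, no search of the table is needed: the document number itself acts as the address.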

[0020] Another embodiment is described next with reference to FIG. 9. FIG. 9 shows a management table relating document numbers to the file system, and 10 is the file system directory number assigned to "document number 2". The value FFFFH, which cannot normally occur, is written as this directory number. If a document file were deleted through the file system, documents added later would be recorded in the resulting scattered gaps, and file contiguity could no longer be maintained. Here, therefore, the actual file is not erased; it is deleted only in the management table.
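The deletion on the management table alone can be sketched as follows, assuming a dictionary maps document numbers to directory numbers. Only the FFFFH marker value comes from the text; the function names and the sample table are hypothetical.

```python
DELETED = 0xFFFF   # directory number that cannot normally occur

def delete_document(table, doc_no):
    """Logical deletion: mark the table entry and leave the file in place,
    so that no gap opens up in the contiguous layout."""
    table[doc_no] = DELETED

def live_candidates(table, doc_numbers):
    """Drop candidates whose management-table entry is marked deleted."""
    return [n for n in doc_numbers if table.get(n, DELETED) != DELETED]

table = {1: 12, 2: 13, 3: 14}      # document number -> directory number
delete_document(table, 2)
print(live_candidates(table, [1, 2, 3]))   # [1, 3]
```

The sectors of the deleted document stay allocated, which trades some storage for preserved contiguity.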

[0021] The present invention is not limited to the above embodiments; various modifications are possible based on the gist of the invention, and these are not excluded from its scope.

[0022]

[Effects of the Invention] As described in detail above, the present invention provides the following effects.

[0023] (1) In secondary storage such as optical disks and magnetic disks, access time is the main concern, yet file systems do not usually take physical contiguity into account.

[0024] With the present invention, sector contiguity can be ensured, particularly at the point where the areas are first secured, and dedicated file management thereafter makes it possible to build the file structure best suited to high-speed search by content-addressable storage.

[0025] (2) Ordinary file system management and search management fit together naturally, enabling simple and efficient search.

[Brief description of drawings]

[FIG. 1] Block diagram of the content-addressable storage and search device in one embodiment of the present invention.

[FIG. 2] Diagram showing the data stored in the secondary storage device in one embodiment of the present invention.

[FIG. 3] Flowchart showing the procedure for writing concatenation information in one embodiment of the present invention.

[FIG. 4] Diagram showing the data stored in the secondary storage device in another embodiment of the present invention.

[FIG. 5] Diagram showing the flag information for usage management of the storage file in another embodiment of the present invention.

[FIG. 6] Diagram showing the data stored in the secondary storage device in another embodiment of the present invention.

[FIG. 7] Diagram showing the division and combination relationship between document numbers and document files in another embodiment of the present invention.

[FIG. 8] Diagram showing the management relationship between document numbers and document files in another embodiment of the present invention.

[FIG. 9] Diagram showing the management relationship between document numbers and document files in another embodiment of the present invention.

[Explanation of symbols]

1 Secondary storage device
2 Data matching circuit
3 Host computer
4 Recording area for character-concatenation appearance frequency information
5 First character-concatenation initial reserved area
6 Second character-concatenation initial reserved area
7 Third initial reserved area
8 Second character-concatenation initial reserved area
9 Document data area
10 Directory number of the file system entry marked for deletion

Claims


1. A data storage and reproduction method comprising: reading document data stored in a storage device, assigning an identification name to each appropriate amount of document data, examining the content of the document data, extracting the sequences in which a character j follows a character i, and writing them as a table with a content-addressable structure; in said writing, obtaining in advance, from specific documents, statistical appearance information on the characters j that follow the character i, predicting the number of (i, j) combinations appearing per unit document, and recording each character type j concatenated to the character i, together with position information in the storage file device, in a first area secured according to the number of concatenation types for each character i; securing a separate second character-concatenation initial reserved area that does not depend on the character type and, if the first initial reserved area originally secured for the character i is exceeded, discarding or moving that area, newly securing a predicted recording area for the character type i in the second recording area, and recording the character types j concatenated to the character i there together with the position information in the storage file device; determining whether the concatenations are written in the first character-concatenation initial reserved area or at a specific position in the second initial reserved area by, for example, writing an identification code in the first initial reserved area and reading the first initial reserved area first; and then accepting a search character string (for example, a search word) and a search logical expression issued by a requester, examining the search character string to find each leading character i and the character j that follows it, reading the first initial reserved area and, if the identification code is present, the second initial reserved area by following the pointer, selecting as search candidates the document files whose position information (identification name) is recorded for every concatenated character pair of the search character string (AND), obtaining their actual recording locations in the storage file, and performing a full-text search.

2. A data storage and reproduction device comprising: a storage device that stores data such as documents; data matching circuit means; appearance frequency prediction information, obtained in advance from specific documents, on the characters j that follow a character i; first securing means for securing in the storage device a first character-concatenation initial reserved area, in which one region per character i is initialized with a size corresponding to the number of concatenation types of (i, j) combinations appearing per unit document; second securing means for securing in the storage device a second character-concatenation initial reserved area that is not initially divided by character type; identification means for identifying whether character concatenations are written in the first character-concatenation initial reserved area or in the second initial reserved area; third securing means for securing in the storage device a third initial reserved area in which target documents are recorded; and a control device that controls the storage, search, and reproduction of data, wherein the control device examines the target documents recorded in the third initial reserved area for concatenations of characters i and j and writes each character type j concatenated to the character i, together with position information in the storage device, in the region secured by the first securing means; if the initial recording region secured for the character i by the first securing means is exceeded, discards that initial reserved region, corrects the appearance prediction information by calculation, newly secures a predicted recording area for the character type i in the area secured by the second securing means, and records the character types j concatenated to the character i there together with the position information in the storage file device; and, at search time, obtains the concatenations of the constituent characters i and j of the search word, detects by the identification means whether the data for i is in the first area or in the second area, detects from the position information recorded with the concatenated character j the approximate position of the documents to be read, and then reads and searches the target documents.

3. The data storage and reproduction device according to claim 2, wherein the control device secures the first character-concatenation initial reserved area in the storage device, copies it into another memory when the concatenation data is created, and creates the concatenation data using the data in that memory.

4. A data storage and reproduction device comprising: a storage device that stores data such as documents; first character-concatenation initial reserved area securing means for initializing in the storage device one region per character i; second character-concatenation initial reserved area securing means for securing in the storage device an area that does not depend on the character type of the concatenation; and sector usage management means for managing the usage of the second character-concatenation initial reserved area, wherein, when a character concatenation is to be recorded beyond the initial recording region secured for the character i, the initial reserved region is discarded, the usage management means is consulted to find a contiguous free region, a predicted recording area for the character type i is newly secured in the area secured by the second securing means, and the character type j concatenated to the character i is recorded together with position information in the storage device.

5. A data storage/reproduction device comprising: a storage device for storing data such as documents; data matching circuit means; a first storage table; a first storage buffer; means for dividing the first storage table according to character type i and, when a character type i is given, detecting the address of the corresponding divided area and reading or writing data there; means for writing to a fixed address in each divided area, as data occupation information, either data occupation identification information indicating whether there is room to write new data in that divided area of the first storage table, or the value of the data amount itself; means for writing, to a specific address in each divided area, identification information of high-frequency character concatenations, which is either preset as initial information or set from the occurrence frequency of characters or character concatenations extracted from the documents to be recorded in the storage device, and for reserving an area in the first storage buffer as a spare area; means for, when the identification information of high-frequency character concatenations does not indicate high frequency, reading the data occupation information and, if writing is possible, appending data at the address corresponding to the divided area, and, when the identification information indicates high frequency, reading link address information and writing, as data to the first storage buffer or the storage file device, a set consisting of the other character of the character concatenation and approximate position information of the character concatenation; means for, when the data occupation identification information indicates that there is no room to write data, moving the data of the divided area to the first storage buffer or the storage file device and writing, to a fixed address in the divided area, an indication that the data has been moved together with the link address information of the destination; means for reading document data from the storage file device, examining each character concatenation, detecting the divided area address of the first storage table from one character of the character concatenation, reading the data occupation identification information, and, if the first storage table is writable, writing the set of the other character and the approximate position information to it as data, or, if it is not writable, reading the link address information and writing the set as data to the first storage buffer or the storage file device; and means for, when a search word is input, referring to the first storage table, the first storage buffer, or the storage file device according to the data occupation identification information at the divided area address corresponding to one character of a character concatenation of the search word, reading the actual recording area holding the other characters of the character concatenations of the target documents, and checking whether any of them matches the other character of the search word, wherein the target documents are narrowed down by the approximate position information of the character concatenations thus obtained.
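
The table-plus-overflow arrangement of claim 5 can be sketched as follows. This is a simplified model under stated assumptions, not the claimed circuit: the class `DividedTable`, the constant `SLOT_CAPACITY`, the use of a document number as the "approximate position information", and the Python lists standing in for the storage buffer are all hypothetical.

```python
SLOT_CAPACITY = 2  # assumed number of entries that fit in one divided area


class DividedTable:
    """Toy first storage table, divided per character, with a linked overflow buffer."""

    def __init__(self, high_frequency=()):
        self.high_frequency = set(high_frequency)  # characters flagged as high frequency
        self.slots = {}    # character i -> {"link": buffer address or None, "data": [...]}
        self.buffer = []   # stands in for the first storage buffer / storage file device

    def _slot(self, i):
        if i not in self.slots:
            slot = {"link": None, "data": []}
            if i in self.high_frequency:
                # High-frequency characters get a spare buffer area reserved up front.
                slot["link"] = len(self.buffer)
                self.buffer.append([])
            self.slots[i] = slot
        return self.slots[i]

    def add(self, i, j, position):
        """Record the concatenation (i, j) with its approximate position."""
        slot = self._slot(i)
        if slot["link"] is None and len(slot["data"]) < SLOT_CAPACITY:
            slot["data"].append((j, position))      # room left in the divided area
            return
        if slot["link"] is None:
            # No room: move the area's data to the buffer and leave a link address.
            slot["link"] = len(self.buffer)
            self.buffer.append(slot["data"])
            slot["data"] = []
        self.buffer[slot["link"]].append((j, position))

    def index(self, doc_number, text):
        for a, b in zip(text, text[1:]):
            self.add(a, b, doc_number)

    def candidates(self, word):
        """Narrow down documents whose concatenations cover every pair in word."""
        result = None
        for a, b in zip(word, word[1:]):
            slot = self.slots.get(a)
            if slot is None:
                return set()
            entries = slot["data"] if slot["link"] is None else self.buffer[slot["link"]]
            hits = {pos for j, pos in entries if j == b}
            result = hits if result is None else result & hits
        return result or set()
```

The result of `candidates` is only a narrowed set of documents; in the claimed device the final comparison against the document text would still be left to the data matching circuit means.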

6. A data storage/reproduction device comprising: a storage device for storing data such as documents; data matching circuit means; document area management means for managing, in a directory, the document data written in the storage device in units of the minimum recording unit (cluster); management number assigning means for assigning a number to each P-cluster unit, which is a predetermined P times the size of a cluster, so that the number corresponds to an approximate recording address in the storage file device, wherein document data larger than a P-cluster unit is given a plurality of management numbers and managed in divided form, and document data smaller than a P-cluster unit is given the same management number as other documents so that a plurality of documents are managed in concatenated form; and a management table which, with the approximate number as its address, holds identification information indicating division or concatenation and a pointer to the detailed recording area, managed by the document area management means, to which the management number has been assigned, wherein the document data in the storage device is managed by approximate addresses.
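
As a rough sketch of the numbering scheme in claim 6, assuming contiguous allocation and a directory given as a plain dictionary (both assumptions of this example, as are the function and key names), management numbers can be derived by integer division of cluster numbers by P:

```python
def build_management_table(directory, p):
    """Build a toy management table.

    directory maps a document name to (first_cluster, cluster_count);
    contiguous allocation and cluster_count >= 1 are assumed.
    The result maps each management number to the documents it covers,
    plus flags saying whether division or concatenation applies.
    """
    table = {}
    for name, (first, count) in directory.items():
        # Every P-cluster unit touched by the document gets an entry.
        numbers = range(first // p, (first + count - 1) // p + 1)
        for number in numbers:
            entry = table.setdefault(number, {"documents": [], "divided": False})
            entry["documents"].append(name)   # pointer to the detailed record
            if len(numbers) > 1:
                entry["divided"] = True       # document spans several numbers
    for entry in table.values():
        # Several small documents sharing one number are managed as concatenated.
        entry["concatenated"] = len(entry["documents"]) > 1
    return table


# Example with P = 4: A spans two units, B and C share one, D fills one exactly.
table = build_management_table(
    {"A": (0, 8), "B": (8, 1), "C": (9, 2), "D": (12, 4)}, p=4
)
```

With these inputs, numbers 0 and 1 both point to document A (divided), number 2 points to B and C (concatenated), and number 3 points to D alone.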

7. A data recording/reproducing device comprising: a storage device; data matching circuit means; document area management means for managing, in a directory, the document data written in the storage device in units of the minimum recording unit (cluster); management number assigning means for assigning a number to each P-cluster unit, which is a predetermined P times the size of a cluster, so that the number corresponds to an approximate recording address in the storage file device, wherein document data larger than a P-cluster unit is given a plurality of management numbers and managed in divided form, and document data smaller than a P-cluster unit is given the same management number as other documents so that a plurality of documents are managed in concatenated form; and a management table which, with the approximate number as its address, holds identification information indicating division or concatenation and a pointer to the detailed recording area, managed by the document area management means, to which the management number has been assigned, wherein the document data in the storage device is managed by approximate addresses, and, when a specific document file of the document data is deleted, the corresponding directory number in the detailed recording area pointed to by the pointer is changed to a number that identifies the file as nonexistent.
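
The deletion step of claim 7 leaves the management table untouched and only rewrites the directory number in the detailed record. A minimal sketch, in which the marker value `DELETED`, the dictionary layout, and both function names are assumptions made for illustration:

```python
DELETED = -1  # hypothetical directory number meaning "this document no longer exists"


def delete_document(management_table, directory, doc_number):
    """Mark every detailed record of doc_number as nonexistent.

    management_table maps a management number to a list of pointers
    (indexes into directory); directory is a list of detailed records.
    Returns how many records were changed.
    """
    changed = 0
    for pointers in management_table.values():
        for idx in pointers:
            if directory[idx]["number"] == doc_number:
                directory[idx]["number"] = DELETED
                changed += 1
    return changed


def live_documents(management_table, directory, management_number):
    """List the directory numbers still present under one management number."""
    return [
        directory[idx]["number"]
        for idx in management_table.get(management_number, [])
        if directory[idx]["number"] != DELETED
    ]
```

Because the pointers stay in place, a search that reaches a deleted record through the management table can recognise the marker and skip it without the table having to be rebuilt.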