JP2012141830A - Index generation device and method

Info

Publication number: JP2012141830A
Application number: JP2010294595A
Authority: JP
Inventors: Naohito Katsumata; 尚人勝又
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2010-12-29
Filing date: 2010-12-29
Publication date: 2012-07-26
Anticipated expiration: 2030-12-29
Also published as: JP5323806B2

Abstract

PROBLEM TO BE SOLVED: To provide an index generation device and method that generate information for partially decompressing a compressed file so that a required record can be read from it. SOLUTION: An index generation device 10 successively passes the records composing an uncompressed file 51 to a file compression device 60, assigns an index to each record it passes, and stores the indexes in order in an index file 31. The index generation device 10 also monitors the compression processing state of the file compression device 60. When it detects that compression of one block has finished, it acquires the size of that compressed block from the file compression device 60, calculates the block's position from the beginning of a compressed file 61 from the acquired size, and stores the block position in the index file 31 as well.

Description

The present invention relates to an index generation device and method.

Conventionally, a file system may compress a file before storing it in order to reduce the amount of stored data. The stored compressed file is decompressed to restore the original file. Because compression processes the entire file at once, it is usually not possible to decompress only part of the compressed file.

For such compression and decompression, Patent Document 1 discloses a technique in which, when part of a file is changed, the compressed file of the original file before the change and a compressed file containing only the changed part are both saved, and the changed part is combined with the original file at decompression time.

In the technique disclosed in Patent Document 1, a control unit holds a compressed file obtained by compressing the original file in a compressed file storage area. When the original file is changed, the control unit holds a pointer indicating the changed part in a pointer storage area and holds a changed compressed file, obtained by compressing only the changed part of the original file, in a changed compressed file storage area. When the file before the change is needed, the control unit decompresses the compressed file in the compressed file storage area through a compression/decompression processing unit to restore it. When the file after the change is needed, the control unit decompresses the changed compressed file through the compression/decompression processing unit, incorporates the result into the file before the change according to the pointer in the pointer storage area, and thereby restores the file after the change.

Patent Document 1: JP 07-141234 A

However, the technique disclosed in Patent Document 1 merely retains the compressed file of the file before the change and a compressed file of only the changed part, so that the changed file can be returned to its previous state. Even with this technique, the entire compressed file must be decompressed in order to obtain the required data from it.

There is therefore a need for a device and method that generate information for partially decompressing a compressed file so that a required record can be read from it.

An object of the present invention is to provide an index generation device and method that generate information for partially decompressing a compressed file so that a required record can be read from the compressed file.

The present invention provides the following solutions.

(1) An index generation device capable of communicating with a file compression device that compresses and decompresses a file block by block, the index generation device comprising: record delivery means for sequentially delivering a plurality of records constituting a file to be compressed to the file compression device; record information acquisition means for assigning an index to each delivered record and sequentially storing the index in index storage means; compression processing state monitoring means for monitoring a compression processing state in the file compression device and detecting the end of compression processing of one block; and block information acquisition means for, in response to the compression processing state monitoring means detecting the end of the compression processing of the one block, acquiring information on the compressed one block from the file compression device, generating information for identifying the compressed one block based on the acquired information, and further storing the generated information in the index storage means.

According to configuration (1), the index generation device according to the present invention sequentially delivers the records constituting the file to be compressed to the file compression device, assigns an index to each delivered record, and sequentially stores the index in the index storage means. The index generation device then monitors the compression processing state in the file compression device and, on detecting the end of compression processing of one block, acquires information on the compressed block from the file compression device, generates information for identifying the compressed block based on the acquired information, and further stores it in the index storage means.

That is, the index generation device according to the present invention delivers the file to be compressed to the file compression device record by record, sequentially stores the indexes of the delivered records in the index storage means, acquires information on each compressed block from the file compression device, and generates and further stores information for identifying that compressed block. Because it stores both the indexes of the records delivered for compression and the information for identifying each compressed block, the index generation device can generate the information needed to partially decompress the compressed file and read a required record from it.

(2) The index generation device according to (1), wherein the record information acquisition means calculates, based on the record size of the delivered record, the offset of the delivered record from the head of the block of the pre-compression file that contains the delivered record and corresponds to the compressed block containing the delivered record, and further stores the offset, together with the record size, in the index storage means in association with the delivered record.

According to configuration (2), the index generation device of (1) additionally calculates the offset of each delivered record from the head of the block of the pre-compression file that contains the record and corresponds to the compressed block containing the record, and stores the offset, together with the record size, in the index storage means in association with the record. Because the offset from the head of the corresponding pre-compression block and the record size are stored alongside the record's index, the index generation device according to (2) can generate information for decompressing the relevant block of the compressed file only as far as the delivered record.
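As a rough illustration of configuration (2), the following Python sketch tracks each record's offset from the head of its uncompressed block together with its size. The `RecordEntry` type, the `OffsetTracker` class and its method names are assumptions made for this example only; the source does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecordEntry:
    key: str     # index assigned to the record
    offset: int  # offset from the head of the uncompressed block
    size: int    # record size in bytes


class OffsetTracker:
    """Tracks per-record offsets within the current uncompressed block."""

    def __init__(self) -> None:
        self.entries: List[RecordEntry] = []
        self._next_offset = 0

    def on_record(self, key: str, record: bytes) -> RecordEntry:
        # The record starts where the previous record in this block ended.
        entry = RecordEntry(key, self._next_offset, len(record))
        self.entries.append(entry)
        self._next_offset += len(record)
        return entry

    def on_block_completed(self) -> None:
        # The next record starts a new block, so offsets restart at 0.
        self._next_offset = 0


def extract_record(decompressed_block: bytes, entry: RecordEntry) -> bytes:
    """Cut one record out of a decompressed block using offset and size."""
    return decompressed_block[entry.offset:entry.offset + entry.size]
```

With the offset and size stored, a reader only needs the decompressed bytes up to `offset + size` to recover the record.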

(3) A method executed by an index generation device capable of communicating with a file compression device that compresses and decompresses a file block by block, the method comprising: a record delivery step of sequentially delivering a plurality of records constituting a file to be compressed to the file compression device; a record information acquisition step of assigning an index to each delivered record and sequentially storing the index in index storage means; a compression processing state monitoring step of monitoring a compression processing state in the file compression device and detecting the end of compression processing of one block; and a block information acquisition step of, in response to the compression processing state monitoring step detecting the end of the compression processing of the one block, acquiring information on the compressed one block from the file compression device, generating information for identifying the compressed one block based on the acquired information, and further storing the generated information in the index storage means.

Accordingly, like (1), the method according to the present invention sequentially stores the indexes of the records delivered for compression and further stores information for identifying each compressed block, and can therefore generate the information needed to partially decompress the compressed file and read a required record from it.

According to the present invention, it is possible to provide an index generation device that operates in cooperation with a file compression device that compresses and decompresses a file block by block, and that generates and stores information capable of identifying the blocks in the compressed file. As a result, when reading a given record from the compressed file, a user can identify the block to be decompressed.

FIG. 1 is a functional block diagram showing the functional configuration of an index generation device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram for explaining the functions of the index generation device according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of an index file according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the index generation processing of the index generation device according to Embodiment 1 of the present invention.
FIG. 5 is a flowchart showing search processing that uses the index file generated by the index generation device according to Embodiment 1 of the present invention.
FIG. 6 is a diagram showing an example of an index file according to Embodiment 2 of the present invention.
FIG. 7 is a flowchart showing the index generation processing of the index generation device according to Embodiment 2 of the present invention.
FIG. 8 is a flowchart showing search processing that uses the index file generated by the index generation device according to Embodiment 2 of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

The index generation device 10 of this embodiment is applied to a computer and its peripheral devices. Each unit in this embodiment is implemented by the hardware of the computer and its peripheral devices and by software that controls that hardware.

The hardware includes a CPU (Central Processing Unit) as a control unit, as well as a storage unit, a communication device, a display device, and an input device. Examples of the storage unit include memory (RAM: Random Access Memory, ROM: Read Only Memory, and the like), a hard disk drive (HDD), and optical disc drives (CD: Compact Disc, DVD: Digital Versatile Disc, and the like). Examples of the communication device include various wired and wireless interface devices for communicating with the file compression device 60. Examples of the display device include various displays such as liquid crystal displays and plasma displays. Examples of the input device include a keyboard and a pointing device (mouse, trackball, and the like).

The software includes computer programs and data that control the hardware. The computer programs and data are stored in the storage unit and are executed and referenced by the control unit as appropriate. They can also be distributed over a communication line, or recorded on a computer-readable medium such as a CD-ROM and distributed.

[Embodiment 1]
FIG. 1 is a functional block diagram showing the functional configuration of the index generation device 10 according to an embodiment of the present invention. FIG. 1 also shows the configuration of the file compression device 60 in order to explain the index generation device 10. FIG. 2 is an explanatory diagram for explaining the functions of the index generation device 10 according to an embodiment of the present invention. FIG. 3 is a diagram showing an example of the index file 31 according to Embodiment 1 of the present invention. FIG. 3(1) shows an example of the index file 31, and FIG. 3(2) shows an example of the pre-compression file 51, the compressed file 61, and a decompressed block 52 partially decompressed from the compressed file 61.

The index generation device 10 is communicably connected to a file compression device 60 that compresses and decompresses a file block by block. The index generation device 10 includes an index file 31 as index storage means, a record delivery unit 11 as record delivery means, a record information acquisition unit 12 as record information acquisition means, a compression processing state monitoring unit 13 as compression processing state monitoring means, and a block information acquisition unit 14 as block information acquisition means. Each unit shown in FIG. 1 is described in detail below with reference to FIGS. 2 and 3.

The record delivery unit 11 sequentially delivers the records constituting the file to be compressed (pre-compression file 51) to the file compression device 60. Specifically, the record delivery unit 11 acquires the records constituting the pre-compression file 51 one at a time and delivers them in order to the file compression device 60. For example, when the record length is variable, the record delivery unit 11 acquires the data from a record's start code to its end code as one record; when the record length is fixed, it acquires one fixed-length record. It then delivers the acquired record according to the interface with the file compression device 60.
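The record-by-record reading described above might look like the following Python sketch. It is only an illustration: the `iter_records` name is hypothetical, and the variable-length case is simplified by assuming each record is terminated by an end code (a newline by default) with no separate start code.

```python
import io
from typing import BinaryIO, Iterator, Optional


def iter_records(src: BinaryIO,
                 fixed_length: Optional[int] = None,
                 end_code: bytes = b"\n") -> Iterator[bytes]:
    """Yield one record at a time from an uncompressed file."""
    if fixed_length is not None:
        # Fixed-length records: read exactly fixed_length bytes per record.
        while True:
            record = src.read(fixed_length)
            if not record:
                return
            yield record
    # Variable-length records: each record runs up to and including end_code.
    pending = b""
    while True:
        chunk = src.read(4096)
        if not chunk:
            break
        pending += chunk
        while True:
            end = pending.find(end_code)
            if end < 0:
                break
            cut = end + len(end_code)
            yield pending[:cut]
            pending = pending[cut:]
    if pending:
        # A trailing record without an end code is still delivered.
        yield pending


log_records = list(iter_records(io.BytesIO(b"a\nbb\nccc")))
fixed_records = list(iter_records(io.BytesIO(b"abcdef"), fixed_length=2))
```

Each yielded record would then be handed to the compression side one at a time, in file order.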

In the file compression device 60, a target record receiving unit 71 sequentially stores each delivered record in a compression buffer 62. A file information update unit 72 updates a file information table 63. The file information table 63 holds in memory, for example, a block number, which is a number created each time compression completes; a buffer size, which is the remaining size of the compression buffer 62 that stores the delivered records; and a post-compression block size, which is the size of the compressed block. A buffer size monitoring unit 73 monitors the buffer size of the compression buffer 62. When the compression buffer 62 meets a predetermined condition, a block compression unit 74 compresses the data in the compression buffer 62 and generates compressed blocks one after another. A post-compression block size recording unit 75 then records the post-compression block size in the file information table 63.
That is, as shown in FIG. 2, as the records constituting the pre-compression file 51 are delivered and the compression buffer 62 meets the predetermined condition, one block of the compressed file 61 (for example, block 61i) is generated.

Here, the predetermined condition is that the buffer size of the compression buffer 62 has become 0 or negative. For example, if storing a delivered record brings the buffer size of the compression buffer 62 to 0, the file compression device 60 generates one block of compressed data. Alternatively, if storing a delivered record makes the buffer size of the compression buffer 62 negative, the file compression device 60 generates one block of compressed data and stores the last received record in the cleared compression buffer 62.
Furthermore, each time compression completes, the file compression device 60 associates a block number with the compressed block. The size of a compressed block is the length from the first data to the last data constituting the block, expressed in bytes (8 bits each), and differs from block to block.
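The buffer condition described above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the device's actual implementation: the `BlockCompressor` class and its attribute names are hypothetical, zlib stands in for the unspecified compression method, and a single record larger than the whole buffer is compressed as a block of its own, a case the source does not address.

```python
import zlib
from typing import List


class BlockCompressor:
    """Toy model of a block-wise compressor with a fixed-capacity buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.remaining = capacity         # "buffer size" = remaining bytes
        self.pending: List[bytes] = []    # records waiting in the buffer
        self.block_number = 0             # incremented when a block completes
        self.block_sizes: List[int] = []  # post-compression size of each block
        self.output = bytearray()         # the compressed file

    def accept(self, record: bytes) -> None:
        self.pending.append(record)
        self.remaining -= len(record)
        if self.remaining < 0 and len(self.pending) > 1:
            # Buffer size went negative: compress the earlier records and
            # put the last received record into the cleared buffer.
            carried = self.pending.pop()
            self._flush()
            self.pending.append(carried)
            self.remaining -= len(carried)
        if self.remaining <= 0:
            # Buffer size is exactly 0 (or one oversized record): compress.
            self._flush()

    def finish(self) -> None:
        # Compress whatever is left once no more records arrive.
        if self.pending:
            self._flush()

    def _flush(self) -> None:
        block = zlib.compress(b"".join(self.pending))
        self.output += block
        self.block_sizes.append(len(block))
        self.block_number += 1
        self.pending = []
        self.remaining = self.capacity
```

Because each block is an independent zlib stream, any block can later be decompressed on its own once its starting position is known.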

The record information acquisition unit 12 assigns an index to each record delivered by the record delivery unit 11 and sequentially stores the index in the index file 31. The index of a record may be created in advance or may be obtained using the format information of the pre-compression file 51. Specifically, the record information acquisition unit 12 acquires the index of a record, shown as "key1" and so on in FIG. 3(1), from index information created in advance in association with the records. Alternatively, the record information acquisition unit 12 reads the index (for example, the time information in a log file) out of the record it has read, using the relative position and length of the index within the record. The record information acquisition unit 12 then sequentially stores the acquired indexes in the index file 31 as shown in FIG. 3(1).
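Reading an index out of a record by relative position and length could look like the short Python sketch below. The `extract_key` name, the log-line layout, and the ASCII encoding are assumptions chosen for illustration; the source only says that the index is located by its relative position and length.

```python
def extract_key(record: bytes, key_offset: int, key_length: int) -> str:
    """Return the index stored at a fixed position inside a record."""
    return record[key_offset:key_offset + key_length].decode("ascii")


# Hypothetical log record whose first 19 bytes are a timestamp.
record = b"2010-12-29T10:15:00 GET /index.html\n"
key = extract_key(record, key_offset=0, key_length=19)
```

Here the timestamp plays the role of "key1" in FIG. 3(1).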

The compression processing state monitoring unit 13 monitors the compression processing state in the file compression device 60 and detects the end of compression processing of one block. Specifically, the compression processing state monitoring unit 13 monitors, according to the interface with the file compression device 60, the file information table 63 that indicates the compression processing state in the file compression device 60. For example, the compression processing state monitoring unit 13 detects the end of compression processing when the block number in the file information table 63 has increased (+1).

The compression processing state monitoring unit 13 may instead detect the end of compression processing from the buffer size. For example, when the file compression device 60 stores a delivered record, the buffer size of the compression buffer 62 decreases, and when the file compression device 60 clears the compression buffer 62 after compression, the buffer size increases.
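The block-number check can be sketched in Python as a small polling helper. The `CompletionMonitor` class and the `read_block_number` callable are hypothetical; the source does not define the interface through which the file information table is read.

```python
from typing import Callable


class CompletionMonitor:
    """Detects block completion by watching the block number increase."""

    def __init__(self, read_block_number: Callable[[], int]) -> None:
        self._read = read_block_number
        self._last_seen = read_block_number()

    def block_completed(self) -> bool:
        # A block has completed if the block number grew since the last poll.
        current = self._read()
        completed = current > self._last_seen
        self._last_seen = current
        return completed


# A dict stands in for the file information table in this example.
file_info_table = {"block_number": 0}
monitor = CompletionMonitor(lambda: file_info_table["block_number"])
```

Polling after every delivered record keeps the check in step with the one-record-at-a-time flow.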

In response to the compression processing state monitoring unit 13 detecting the end of compression processing of one block, the block information acquisition unit 14 acquires information on the compressed block from the file compression device 60, generates information for identifying the compressed block based on the acquired information, and further stores it in the index file 31.

That is, the block information acquisition unit 14 acquires the block size of the compressed block from the file compression device 60 and, based on the acquired block size, calculates the block position from the beginning of the compressed file 61. Specifically, the block information acquisition unit 14 acquires, according to the interface with the file compression device 60, the block size of the compressed block each time compression processing ends, and adds up the acquired block sizes to calculate the block position from the beginning of the compressed file 61. In FIG. 3(2), block_pos_1 to block_pos_n are the block positions calculated in this way.
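The running sum of block sizes can be illustrated with a few lines of Python. The `block_positions` helper is hypothetical, and the sketch assumes the first block starts at position 0 and each later block starts where the previous one ends.

```python
from typing import Iterable, List


def block_positions(block_sizes: Iterable[int]) -> List[int]:
    """Return each block's start position from the head of the compressed file."""
    positions: List[int] = []
    position = 0  # assumed initial value: the first block starts at 0
    for size in block_sizes:
        positions.append(position)
        position += size  # the next block starts right after this one
    return positions


positions = block_positions([10, 7, 12])
```

For compressed blocks of 10, 7, and 12 bytes this yields the positions 0, 10, and 17.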

Next, the block information acquisition unit 14 further stores the calculated block position from the beginning of the compressed file 61 in the index file 31. Specifically, as shown in FIG. 3(1), the block information acquisition unit 14 stores in the index file 31, in association with an index (for example, key1), the block position (for example, block_pos_i) of the block (for example, block 61i) that contains the record corresponding to that index.

As shown in FIG. 3(1), the index file 31 consists of sequentially stored entries, each made up of an index (for example, key1), a delimiter code (for example, \t), and the block position (for example, block_pos_i) of the compressed block containing the record corresponding to the index. To decompress the compressed file 61, the block position of the compressed block containing the record for the index being searched is specified (for example, block_pos_i of the compressed block 61i containing record 1, which corresponds to "key1" in FIG. 3(2)). Only the block at the specified position is then partially decompressed, producing the decompressed block 52.
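A lookup against such an index file, followed by decompression of a single block, might be sketched in Python as below. This is an assumption-laden illustration: the function names are hypothetical, the index is assumed to hold one `key<TAB>block_pos` entry per line, and each block is assumed to be an independent zlib stream, which the source does not specify.

```python
import zlib
from typing import Dict


def parse_index(text: str) -> Dict[str, int]:
    """Parse tab-delimited 'key, block position' lines into a lookup table."""
    index: Dict[str, int] = {}
    for line in text.splitlines():
        key, block_pos = line.split("\t")
        index[key] = int(block_pos)
    return index


def decompress_block_at(compressed: bytes, block_pos: int) -> bytes:
    """Decompress only the block that starts at block_pos."""
    decompressor = zlib.decompressobj()
    # The decompressor stops at the end of the first zlib stream; any
    # following blocks are left untouched in decompressor.unused_data.
    return decompressor.decompress(compressed[block_pos:])


# Build a tiny two-block compressed file and its index for demonstration.
block_1 = zlib.compress(b"key1 first record\nkey2 second record\n")
block_2 = zlib.compress(b"key3 third record\n")
compressed_file = block_1 + block_2
index_text = "key1\t0\nkey2\t0\nkey3\t{}\n".format(len(block_1))

index = parse_index(index_text)
decompressed_block = decompress_block_at(compressed_file, index["key3"])
```

Only the second block is decompressed when looking up `key3`; the first block is never touched.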

FIG. 4 is a flowchart showing the index generation processing of the index generation device 10 according to Embodiment 1 of the present invention. It is described here together with the file compression processing of the file compression device 60.

In step S100, the CPU of the index generation device 10 (hereinafter simply "the CPU") initializes the value of the block position from the beginning of the compressed file 61. The CPU then proceeds to step S101.

In step S101, the CPU (record delivery unit 11) reads one record of the pre-compression file 51. More specifically, the CPU uses a record acquisition function to read one record from the pre-compression file 51, which has been opened with a file processing function. The CPU then proceeds to step S102.

In step S102, the CPU (record delivery unit 11) transmits the record read in step S101 to the file compression device 60 according to the interface with the file compression device 60. The CPU then proceeds to step S103.
Meanwhile, in step S201, the file compression device 60 receives one record from the index generation device 10. Next, in step S202, the file compression device 60 stores the record received from the index generation device 10 in the compression buffer 62 and monitors the buffer size of the compression buffer 62.

In step S103, the CPU (record information acquisition unit 12) assigns an index to the record and stores it in the index file 31. More specifically, the CPU obtains the index of the record transmitted in step S102 using the index's relative position within the record and its length, and stores the obtained index in the index file 31 in association with the block position of the compressed block. The CPU then proceeds to step S104.

In step S104, the CPU (compression processing state monitoring unit 13) monitors the compression processing state. More specifically, the CPU monitors, according to the interface with the file compression device 60, the buffer size in the file information table 63 of the file compression device 60. The CPU then proceeds to step S105.
Meanwhile, in step S203, the file compression device 60 determines, from the monitoring of the buffer size of the compression buffer 62 in step S202, whether the buffer has overflowed (the buffer size has become 0 or less). If it determines that the buffer size has become 0 or less (YES), the file compression device 60 moves control to step S204; if it determines that it has not (NO), it moves control to step S201. Next, in step S204, the file compression device 60 starts compressing the compression buffer 62, completes the compression to create one compressed block, and increments (+1) the block number of the compressed file 61. Next, in step S205, the file compression device 60 determines whether processing should end. If it determines that processing should not end (for example, a record was received within a fixed time), it returns to step S201; if it determines that processing should end (for example, no record was received within a fixed time), it ends the processing.

In step S105, the CPU (compression processing state monitoring unit 13) determines whether compression of one block has been completed. More specifically, the CPU determines whether the block number monitored in step S104 has been incremented (+1). If the determination is YES, the CPU moves the process to step S106; if it is NO, the CPU moves the process to step S101.

In step S106, the CPU (block information acquisition unit 14) acquires the compressed block size. More specifically, the CPU acquires the compressed block size from the file compression device 60 in accordance with the interface with the file compression device 60. The CPU then moves the process to step S107.

In step S107, the CPU (block information acquisition unit 14) adds the compressed block size to the block position. The CPU then moves the process to step S108.

In step S108, the CPU (record delivery unit 11) determines whether compression is finished. More specifically, the CPU determines whether the block containing the last record of the pre-compression file 51 has been compressed (that is, whether the end of the file was reached in step S101). If the determination is YES, the CPU ends the process; if it is NO, the CPU moves the process to step S101.
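The loop of steps S101 to S108 can be sketched as follows. This is only an illustrative sketch under several assumptions that the description does not specify: the class `BlockCompressor`, the function `build_index`, the 64-byte buffer capacity, and the use of one independent zlib stream per block are all hypothetical stand-ins for the file compression device 60 and its interface. For simplicity the sketch also keeps each record whole inside one block rather than splitting a record across blocks.

```python
import zlib

BUFFER_SIZE = 64  # hypothetical capacity of the compression buffer 62, in bytes


class BlockCompressor:
    """Toy stand-in for the file compression device 60 (assumed interface)."""

    def __init__(self, buffer_size=BUFFER_SIZE):
        self.buffer_size = buffer_size
        self.buffer = b""          # compression buffer 62
        self.block_number = 0      # incremented once per completed block (S204)
        self.last_block_size = 0   # size of the most recent compressed block
        self.output = b""          # post-compression file 61

    def receive(self, record):
        # S201/S202: store the record; S203: compress once the buffer is full.
        self.buffer += record
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # S204: compress the buffer into one block and bump the block number.
        if not self.buffer:
            return
        block = zlib.compress(self.buffer)
        self.output += block
        self.last_block_size = len(block)
        self.block_number += 1
        self.buffer = b""


def build_index(records, key_of, compressor):
    """Return (key, block_position) pairs, following steps S101 to S108."""
    index = []
    block_pos = 0                                    # offset of the current block
    seen_blocks = compressor.block_number
    for record in records:                           # S101: read one record
        compressor.receive(record)                   # S102: hand it over
        index.append((key_of(record), block_pos))    # S103: store key and position
        if compressor.block_number > seen_blocks:    # S104/S105: block completed?
            seen_blocks = compressor.block_number
            block_pos += compressor.last_block_size  # S106/S107: advance position
    compressor.flush()  # end of file: compress the final, partially filled block
    return index
```

Because the block position is advanced only after a block completes, the record that triggers compression is still indexed under the block it was compressed into.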

FIG. 5 is a flowchart showing a search process that uses the index file 31 generated by the index generation device 10 according to the first embodiment of the present invention. The search process is described as being executed by a device other than the index generation device 10, for example a search device (not shown), but it may also be executed by the index generation device 10.

In step S301, the search device searches the index file 31 using the index contained in the record to be retrieved. Next, in step S302, the search device acquires the block position corresponding to the index found. Next, in step S303, the search device decompresses, among the blocks constituting the post-compression file 61, only the block that starts at the acquired block position. Next, in step S304, the search device searches the decompressed block by index and acquires the record to be retrieved. If the record to be retrieved spans two blocks, the search device also decompresses the next block to acquire the record.
In this way, the index generation device 10 can generate an index file 31 that allows the search device to retrieve a record by partially decompressing only the block, among the blocks constituting the post-compression file 61, that contains the record to be retrieved.
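A minimal sketch of the search in steps S301 to S304 is given below. It assumes, purely for illustration, that every block is an independent zlib stream, that records are newline-terminated with a tab after the key, and that no record spans two blocks. The names `find_record` and `key_of` and the dictionary form of the index are hypothetical.

```python
import zlib


def key_of(record):
    """Hypothetical key extractor: the key is the text before the first tab."""
    return record.split(b"\t")[0]


def find_record(compressed, index, key):
    """Look up `key`, decompress only its block, and scan that block."""
    block_pos = index[key]                          # S301/S302: key -> block position
    d = zlib.decompressobj()
    # S303: decompression stops at the end of the first zlib stream, so only
    # the block starting at block_pos is inflated; later blocks are untouched.
    block = d.decompress(compressed[block_pos:])
    for record in block.splitlines(keepends=True):  # S304: search inside the block
        if key_of(record) == key:
            return record
    return None


# Build a two-block compressed file and its index for demonstration.
raw_blocks = [b"a\t1\nb\t2\n", b"c\t3\n"]
compressed, index, pos = b"", {}, 0
for raw in raw_blocks:
    for rec in raw.splitlines(keepends=True):
        index[key_of(rec)] = pos
    z = zlib.compress(raw)
    compressed += z
    pos += len(z)
```

With this setup, `find_record(compressed, index, b"c")` inflates only the second block.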

[Embodiment 2]
The function of the index generation device 10 according to the second embodiment will be described with reference to FIG. 1, FIG. 2, and FIG. 6. FIG. 6 shows an example of the index file 31 according to the second embodiment of the present invention. FIG. 6(1) shows an example of the index file 31, and FIG. 6(2) shows an example of the pre-compression file 51, the post-compression file 61, and a decompressed block 53 partially decompressed from the post-compression file 61.

Based on the record size of the record delivered by the record delivery unit 11, the record information acquisition unit 12 in FIG. 1 calculates the offset of the delivered record (hereinafter, the record offset) from the head of the block of the pre-compression file 51 that contains the record, that is, the pre-compression block corresponding to the compressed block containing the record. It then additionally stores the record offset, together with the record size, in the index file 31 in association with the delivered record.

As shown in FIG. 6(1), each entry of the index file 31, stored sequentially, consists of the following in order: an index (for example, key1); a delimiter code (for example, \t); the block position of the compressed block containing the record corresponding to the index (for example, block_pos_i); a delimiter code (for example, \t); the record offset, within the pre-compression file 51, of the record corresponding to the index (for example, the record offset from the head of block 51i of the pre-compression file 51, which corresponds to the compressed block 61i containing delivered record 1, that is, offset_1 in the decompressed block 53); a delimiter code (for example, \t); and the record size, in the pre-compression file 51, of the record corresponding to the index (for example, the size of delivered record 1 in the block of the pre-compression file 51 corresponding to the compressed block 61i, that is, length_1 in the decompressed block 53).
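The entry layout described above can be illustrated with a short sketch. The tab delimiter and the field order follow FIG. 6(1); writing one entry per line and encoding the numeric fields as decimal text are assumptions made here for illustration, and the names `IndexEntry`, `format_entry`, and `parse_entry` are hypothetical.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    key: str        # index, e.g. key1
    block_pos: int  # position of the compressed block from the head of file 61
    offset: int     # record offset from the head of the uncompressed block
    length: int     # record size in the uncompressed block


def format_entry(entry):
    """Serialize one entry as key, block position, offset, length, tab-separated."""
    return f"{entry.key}\t{entry.block_pos}\t{entry.offset}\t{entry.length}"


def parse_entry(line):
    """Parse one tab-separated line back into an IndexEntry."""
    key, block_pos, offset, length = line.rstrip("\n").split("\t")
    return IndexEntry(key, int(block_pos), int(offset), int(length))
```

In the first embodiment only the first two fields (index and block position) are present; the record offset and record size are the additions of the second embodiment.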

Specifically, in FIG. 6(2), the record information acquisition unit 12 calculates the record offset from the head of the block of the pre-compression file 51 (for example, block 51i) that corresponds to the compressed block containing the delivered record (for example, block 61i), based on the record size of the delivered record. That is, each time a block is compressed, the record information acquisition unit 12 sets the initial value of the record offset to the size that overflowed from the compression buffer (for example, 0 for the first block, or, when a record is stored across blocks, the size that spilled into the next block). It then adds the record size of each delivered record to the record offset to calculate the record offset of the delivered record from the head of the block of the pre-compression file 51. As shown in FIG. 6(1), the record information acquisition unit 12 additionally stores both the record offset (for example, offset_1) and the record size (for example, length_1) in the index file 31 in association with the index of the delivered record (for example, key1).
Using the record offset stored in this way, a search device (not shown) can generate a decompressed block 53 by decompressing the compressed block 61i from its head up to the portion containing the record to be retrieved, and search within it. Moreover, having generated the decompressed block 53 by decompressing the compressed block (for example, block 61i) from its head up to the portion containing the record to be retrieved, the search device can, without searching inside the decompressed block 53, simply extract a record-size run of bytes from the end of the decompressed block 53 as the search result.
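The record-offset bookkeeping can be sketched as below, under simplifying assumptions that are not taken from the description: uncompressed blocks have a fixed capacity `block_capacity`, a record that does not fit is split with the remainder spilling into the next block, and blocks are identified by a running number rather than by their compressed byte position. The function name `record_offsets` is hypothetical.

```python
def record_offsets(record_sizes, block_capacity):
    """Return (block_number, record_offset, record_size) for each record.

    The offset is stored first and then advanced by the record size. When a
    block fills up, the offset is reset to the number of bytes that spilled
    over into the next block (0 if the record ended exactly on the boundary).
    """
    entries = []
    block_no = 0
    offset = 0
    for size in record_sizes:
        entries.append((block_no, offset, size))  # store, then advance
        offset += size
        while offset >= block_capacity:           # block filled: start a new one
            offset -= block_capacity              # spilled bytes = new initial offset
            block_no += 1
    return entries


# Records of 30, 50 and 40 bytes with 64-byte blocks: the second record spills
# 16 bytes into block 1, so the third record starts at offset 16 of block 1.
example = record_offsets([30, 50, 40], 64)
```

Here `example` evaluates to `[(0, 0, 30), (0, 30, 50), (1, 16, 40)]`.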

FIG. 7 is a flowchart showing the index generation process of the index generation device 10 according to the second embodiment of the present invention. It is described in relation to the file compression process of the file compression device 60.

Step S110 is the same as step S100 of the first embodiment, so its description is omitted.

In step S111, the CPU (record delivery unit 11) reads one record of the pre-compression file 51. More specifically, the CPU reads one record, using a record acquisition function, from the pre-compression file 51 opened by a file processing function, and acquires the record size. The CPU then moves the process to step S112.

In step S112, the CPU (record delivery unit 11) transmits the one record read in step S111 to the file compression device 60 in accordance with the interface with the file compression device 60. The CPU then moves the process to step S113.

In step S113, the CPU (record information acquisition unit 12) stores the index of the record, the block position, the record offset, and the record size in the index file 31. More specifically, the CPU obtains the index of the record transmitted in step S112 from the index's relative position within the record and the index length. Next, the CPU stores the block position, the record offset, and the record size in the index file 31 in association with the index of the record. Next, the CPU adds the record size acquired in step S111 to the record offset and stores the result. The CPU then moves the process to step S114.

Steps S114 to S116 are the same as steps S104 to S106 of the first embodiment, so their description is omitted. In step S117, which corresponds to step S107 of the first embodiment, the CPU adds the compressed block size to the block position and sets the initial value of the record offset. Step S118 is the same as step S108 of the first embodiment, so its description is omitted.

FIG. 8 is a flowchart showing a search process that uses the index file 31 generated by the index generation device 10 according to the second embodiment of the present invention. The search process is described as being executed by a device other than the index generation device 10, for example a search device (not shown), but it may also be executed by the index generation device 10.

In step S311, the search device searches the index file 31 using the index contained in the record to be retrieved. Next, in step S312, the search device acquires the block position, record offset, and record size corresponding to the index found. Next, in step S313, the search device uses the record offset and record size to decompress the compressed block at the acquired block position, among the blocks constituting the post-compression file 61, only up to the portion containing the record to be retrieved. Next, in step S314, the search device searches the decompressed block by index and acquires the record to be retrieved. Alternatively, in step S314 the search device may take the last record of the decompressed block as the record to be retrieved. If the record to be retrieved spans two blocks, the search device uses the record size to decompress the next block as well.
In this way, the index generation device 10 can generate an index file 31 that allows the search device to decompress only the block, among the blocks constituting the post-compression file 61, that contains the record to be retrieved, and only up to the portion containing that record.
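Steps S313 and S314 can be sketched with Python's `zlib` module, whose decompression objects accept a `max_length` argument that caps the number of bytes produced. This is an illustrative sketch only: it assumes each block is an independent zlib stream and that the record lies entirely within one block. The function name `read_record` is hypothetical.

```python
import zlib


def read_record(compressed, block_pos, offset, length):
    """Decompress one block only as far as the end of the wanted record."""
    d = zlib.decompressobj()
    # S313: inflate at most offset + length bytes of the block at block_pos.
    head = d.decompress(compressed[block_pos:], offset + length)
    if len(head) < offset + length:
        # The block ended early, so the record continues in the next block;
        # handling that case is outside this sketch.
        raise ValueError("record spans into the next block")
    # S314: the wanted record is the last `length` bytes of what was inflated,
    # so no search inside the decompressed data is needed.
    return head[-length:]


# Two blocks, each compressed independently and concatenated.
block0 = b"a\t1\nb\t22\n"   # records at offsets 0 (4 bytes) and 4 (5 bytes)
block1 = b"c\t333\n"        # one record at offset 0 (6 bytes)
z0, z1 = zlib.compress(block0), zlib.compress(block1)
compressed = z0 + z1
```

For instance, `read_record(compressed, 0, 4, 5)` inflates only the first nine bytes of the first block and returns its last five.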

According to the first embodiment, the index generation device 10 sequentially delivers the plurality of records constituting the pre-compression file 51 to the file compression device 60, assigns an index to each delivered record, and sequentially stores the index in the index file 31. The index generation device 10 then monitors the compression processing state in the file compression device 60 and, on detecting the end of compression processing of one block, acquires the block size of the compressed block from the file compression device 60, calculates the block position of the compressed block from the head of the post-compression file 61 based on the acquired block size, and additionally stores the block position in the index file 31. Thus, so that a required record can be read from the post-compression file 61, the index generation device 10 sequentially stores the indexes of the records delivered for compression and additionally stores the block position of each compressed block from the head of the post-compression file 61, thereby generating information for partially decompressing the post-compression file 61. As a result, the index generation device 10 can provide information that identifies the block to be decompressed when a given record is read from the post-compression file 61.

According to the second embodiment, the index generation device 10 calculates the offset of the delivered record from the head of the block of the pre-compression file 51 that contains the record, that is, the pre-compression block corresponding to the compressed block containing the record, and additionally stores the offset, together with the record size, in the index file 31 in association with the delivered record. Thus, so that a required record can be read from the post-compression file 61, the index generation device 10 sequentially stores the indexes of the records delivered for compression, stores the block position of each compressed block from the head of the post-compression file 61, and additionally stores the record offset, from the head of the block of the pre-compression file 51, and the record size of each delivered record, thereby generating information for decompressing the relevant block of the post-compression file 61 only as far as the required record. As a result, the index generation device 10 can provide information that identifies both the block to be decompressed and the amount of data to be decompressed when a given record is read from the post-compression file 61.

Although the present embodiment is configured as the index generation device 10, the configuration is not limited to this. For example, it may be configured as a computer program that is incorporated into the file compression device 60 and realizes the same functions as the index generation device 10 in accordance with an interface with the program of the file compression device 60.

Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments. The effects described in the embodiments merely list the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

DESCRIPTION OF SYMBOLS
10 Index generation device
11 Record delivery unit
12 Record information acquisition unit
13 Compression processing state monitoring unit
14 Block information acquisition unit
31 Index file
51 Pre-compression file
61 Post-compression file

Claims

An index generation device capable of communicating with a file compression device that compresses and decompresses a file block by block, the index generation device comprising:
record delivery means for sequentially delivering, to the file compression device, a plurality of records constituting a file to be compressed;
record information acquisition means for assigning an index to each record to be delivered and sequentially storing the index in index storage means;
compression processing state monitoring means for monitoring the compression processing state in the file compression device and detecting the end of compression processing of one block; and
block information acquisition means for, in response to the compression processing state monitoring means detecting the end of compression processing of the one block, acquiring information on the compressed one block from the file compression device, generating information for identifying the compressed one block based on the acquired information, and further storing the generated information in the index storage means.

The index generation device according to claim 1, wherein the record information acquisition means calculates, based on the record size of the record to be delivered, an offset of the record to be delivered from the head of the block of the pre-compression file that contains the record to be delivered and corresponds to the compressed one block containing that record, and further stores the offset, together with the record size, in the index storage means in association with the record to be delivered.

A method executed by an index generation device capable of communicating with a file compression device that compresses and decompresses a file block by block, the method comprising:
a record delivery step of sequentially delivering, to the file compression device, a plurality of records constituting a file to be compressed;
a record information acquisition step of assigning an index to each record to be delivered and sequentially storing the index in index storage means;
a compression processing state monitoring step of monitoring the compression processing state in the file compression device and detecting the end of compression processing of one block; and
a block information acquisition step of, in response to the end of compression processing of the one block being detected in the compression processing state monitoring step, acquiring information on the compressed one block from the file compression device, generating information for identifying the compressed one block based on the acquired information, and further storing the generated information in the index storage means.