JPH06236302A

JPH06236302A - File compressing method

Info

Publication number: JPH06236302A
Application number: JP5021151A
Authority: JP
Inventors: Fumio Kida; 文夫木田
Original assignee: DAIMARU JOHO CENTER KK
Current assignee: DAIMARU JOHO CENTER KK
Priority date: 1993-02-09
Filing date: 1993-02-09
Publication date: 1994-08-23
Anticipated expiration: 2012-08-13
Also published as: JP2639776B2

Abstract

PURPOSE: To enable random access while compressing a file efficiently. CONSTITUTION: A reference record is generated in advance in which the data part, excluding the record key, is set to the values predicted to appear most frequently in the target file. Each record to be compressed is compared with the reference record, and the data parts L1 and L2 whose values match are replaced by compression codes. The resulting compression result record, which consists of the data that differs from the reference record together with a compression identification code and the compression codes, is shorter than the original record.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a file compression method that compresses data items having the same value in a plurality of records while still allowing random access, for example in a VSAM data set.

[0002]

[Prior Art] The main conventional file compression methods are as follows.
(1) A file compression method for sequential data sets that exploits the similarity between records in a file and outputs the difference from the previous record as the compressed record.
(2) A file compression method allowing random access that compresses identical characters and repeated patterns within a record.
(3) A file compression method that combines the two methods above by compressing identical characters and repeated patterns contained in the difference from the previous record.

[0003]

[Problems to Be Solved by the Invention] As described above, conventional methods (1) and (3) permit only sequential access from the beginning of the file, so random access is impossible in principle, while method (2) generally yields poor compression efficiency relative to the large amount of CPU time it consumes.

[0004] The present invention has been made to solve these problems. Its object is to provide a file compression method that achieves high compression efficiency and allows random access by creating a reference record set to the data predicted to have the same value in a plurality of records to be compressed, and outputting the difference between the reference record and each record to be compressed as the compressed record.

[0005]

[Means for Solving the Problems] The file compression method according to the present invention compresses data portions whose values are identical across a plurality of records. It is characterized in that a reference record is created in advance in which the data portion of the record to be compressed, excluding the key item, is set to the data predicted to have the same value in a plurality of records; the record to be compressed is compared with the reference record; and those data portions of the record to be compressed whose values match the reference record are compressed.

[0006] The file compression method according to the present invention is further characterized in that the reference record itself is compressed.

[0007] The file compression method according to the present invention is further characterized in that a code indicating compression is added to each compressed record.

[0008]

[Operation] In the file compression method according to the present invention, a reference record set to the data that appears most frequently in the file to be compressed is created in advance, and the difference between each record to be compressed and the reference record is output as the compression result record. This makes random access possible, for example in a VSAM data set.

[0009]

[Embodiment] VCP (VSAM COMPRESSION PACKAGE), which implements the file compression method of the present invention (hereinafter, the method of the invention), is described in detail below with reference to the drawings. FIG. 1 is a record format diagram outlining the compression method used in VCP. In VCP, the record key item is treated as a compression-prohibited area. A reference record is created in advance in which the items following the record key are set to the values expected to match across a plurality of records. Each record to be compressed is compared with the reference record, and the portions whose values match the reference record (hatched in the figure) are compressed.

[0010] During compression, when a matching portion is at least 3 bytes and at most 252 bytes long, it is replaced with a 2-byte compression code consisting of the identification code X'FD' and the length of the matching portion (L2). When it is longer than 252 bytes and at most 32760 bytes, it is replaced with a 3-byte compression code consisting of the identification code X'FE' and the length of the matching portion (L1). When a run of matching bytes is 2 bytes or shorter, the compression code would not be shorter than the run itself, so no compression benefit is obtained and the run is not replaced.
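The length thresholds in [0010] can be illustrated with a short Python sketch. It is not taken from the patent: the function name `encode_run` is hypothetical, and the high-order-byte-first layout of the 2-byte length is assumed from the flowchart description of steps S53 and S54.

```python
ID_SHORT = 0xFD   # identification code of the 2-byte compression code
ID_LONG = 0xFE    # identification code of the 3-byte compression code


def encode_run(length: int) -> bytes:
    """Return the compression code for `length` consecutive matching bytes.

    Hypothetical helper; big-endian length is an assumption based on S53/S54.
    """
    if length < 3:
        # Runs of 1 or 2 bytes are left as ordinary data, per [0010].
        raise ValueError("runs shorter than 3 bytes are not replaced")
    if length <= 252:
        return bytes([ID_SHORT, length])
    if length <= 32760:
        return bytes([ID_LONG]) + length.to_bytes(2, "big")
    raise ValueError("run exceeds the 32760-byte limit")
```

For example, `encode_run(40)` yields X'FD 28' and `encode_run(300)` yields X'FE 01 2C'. Capping the short form at 252 keeps the length byte from ever being X'FD', X'FE', or X'FF', which is presumably why the limit is not 255.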

[0011] VCP uses X'FD' and X'FE' as the identification characters of compression codes, but a record to be compressed may itself contain X'FD' or X'FE' as ordinary data. When such a data byte occurs, X'FD' or X'FE' is output twice in succession so that it can be distinguished from a compression code.
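The doubling rule of [0011] amounts to a simple byte-stuffing step. The following is a minimal sketch under that reading; the names `emit_literal` and `escape_literals` are hypothetical and do not appear in the patent.

```python
RESERVED = (0xFD, 0xFE)   # bytes that also serve as compression identification codes


def emit_literal(byte: int) -> bytes:
    """Return the output bytes for one ordinary data byte.

    A data byte equal to X'FD' or X'FE' is written twice so that a decoder
    can tell it apart from the start of a compression code.
    """
    return bytes([byte, byte]) if byte in RESERVED else bytes([byte])


def escape_literals(data: bytes) -> bytes:
    """Apply emit_literal to every byte of `data` (illustrative helper)."""
    return b"".join(emit_literal(b) for b in data)
```

The scheme works because a compression code never has its identification byte followed by the same value: the length byte after X'FD' is at most 252, and the high-order length byte after X'FE' is at most X'7F' for lengths up to 32760.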

[0012] When the record length after compression would exceed the original record length, compression is bypassed and the original record is output unchanged. Uncompressed and compressed records can therefore coexist in the same data set, and to tell them apart, each compressed record carries a 5-byte compression identification area, consisting of X'FE FF FF' and the record length (RL), immediately after the compression-prohibited area.
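The 5-byte compression identification area of [0012] could be built and recognized as sketched below. The function names are hypothetical, and two details are assumptions not stated in the text: that RL is a 2-byte big-endian value, and that it holds the length of the original (uncompressed) record.

```python
from typing import Optional

COMPRESSION_MARK = b"\xfe\xff\xff"   # first 3 bytes of the identification area
ID_AREA_LEN = 5                      # mark plus 2-byte record length (RL)


def make_id_area(record_length: int) -> bytes:
    """Build the identification area; RL as big-endian is an assumption."""
    return COMPRESSION_MARK + record_length.to_bytes(2, "big")


def parse_id_area(stored: bytes, key_end: int) -> Optional[int]:
    """Return RL if `stored` is a compressed record, otherwise None.

    `key_end` is the number of bytes from the start of the record through
    the end of the key, i.e. the size of the compression-prohibited area.
    """
    area = stored[key_end:key_end + ID_AREA_LEN]
    if len(area) < ID_AREA_LEN or area[:3] != COMPRESSION_MARK:
        return None
    return int.from_bytes(area[3:], "big")
```

A reader of the data set would call `parse_id_area` on every record it fetches and restore only those for which the result is not `None`.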

[0013] FIG. 2 is a format diagram of a VSAM data set created by VCP. In this embodiment, a reference record whose key is ALL LOW-VALUE (X'0000...') is held in the VSAM data set so that records can be compressed and restored at access time, and each data record is stored as a variable-length record compressed by VCP against this reference record. In VCP, the reference record is created in the target VSAM file when the file is loaded by the supplied utility, and the access routines supplied with VCP use it to restore and compress data. The user therefore does not need to supply the reference record from outside on each access.

[0014] The VCP access routines can also handle ordinary (non-VCP) VSAM files. To distinguish a record with an ALL LOW-VALUE key in an ordinary VSAM file from a VCP reference record, the reference record itself is stored in VCP-compressed form, compressed against a record consisting entirely of spaces (X'40'). To allow for a reference record that cannot be compressed, the load step ensures that X'FE FF FF' always immediately follows the compression-prohibited area.

[0015] Besides registering the reference record in the file to be compressed as data with a special key such as ALL LOW-VALUE, as described above, the reference record can also be created as a load module with a name chosen for each file to be compressed and managed in a library. Even when the reference record is held as a load module, it is brought into memory on the first access to the file to be compressed, just as when it is registered in the file, and subsequent accesses use the in-memory copy. Wasteful accesses such as loading the module are therefore not repeated for every compression.

[0016] The specific processing procedure of VCP is described below with reference to the flowcharts of FIGS. 3 to 11.

(A) Creation of a compressed VSAM file (see FIGS. 3 to 5)

First, the sample record that will serve as the reference record is compressed. The VSAM file to be compressed and the sample file in which the sample record is registered are opened (S1, S2), and the sample file is read (S3). The key length, key position, and maximum record length are obtained from the VSAM file (S4), the key end position is calculated from the key length and key position (S5), and X'00' is set from the beginning of the sample record through the key end position (S6). Next, the part of the sample record after the key end is compressed as described later, using a record consisting entirely of spaces (X'40') as the comparison sample (S7).
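Steps S5 and S6 can be pictured with the small sketch below. The function name is hypothetical, and the key position is assumed to be a zero-based byte offset (as with a VSAM relative key position); the patent does not say how the position is counted.

```python
from typing import Tuple


def prepare_reference_record(sample: bytes, key_offset: int,
                             key_length: int) -> Tuple[bytes, int]:
    """Return (record with a zeroed key area, key end position).

    key_offset is assumed to be zero-based, so the key end position is the
    number of bytes from the start of the record through the end of the key.
    """
    key_end = key_offset + key_length              # S5
    zeroed = bytes(key_end) + sample[key_end:]     # S6: X'00' up to the key end
    return zeroed, key_end
```

Everything before `key_end` is the compression-prohibited area; only `zeroed[key_end:]` would be handed to the compression step (S7).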

[0017] After compression, the WRITE record length is set to [length up to the key end + length of the compression identification code + length of the compressed sample-record data] (S8) and compared with the VSAM maximum record length (S9). If the WRITE record length does not exceed the VSAM maximum record length, the VSAM record is set to [beginning of the sample record through the key end + compression identification code + compressed sample-record data] (S10).

[0018] If the comparison in step S9 shows that the WRITE record length exceeds the VSAM maximum record length, the compression identification code is placed immediately after the key end of the sample record (S11), the sample record is used as the VSAM record (S12), and the sample record length is used as the WRITE record length (S13). The compressed or uncompressed VSAM record obtained in this way, that is, the sample record, is written to the VSAM file (S14), and the sample file is closed (S15).

[0019] Next, the data to be compressed is compressed on the basis of the sample record. The input data file is opened (S16) and read (S17). The part of the input record after the key end is compressed as described later, using the sample record as the comparison sample (S18). The WRITE record length is set to [length up to the key end + length of the compression identification code + length of the compressed input-record data] (S19) and compared with the VSAM maximum record length (S20).

[0020] If the WRITE record length does not exceed the VSAM maximum record length, the VSAM record is set to [beginning of the input record through the key end + compression identification code + compressed input-record data] (S21). If the comparison in step S20 shows that the WRITE record length exceeds the VSAM maximum record length, the input record is used as the VSAM record (S22) and the input record length is used as the WRITE record length (S23). The compressed or uncompressed VSAM record obtained in this way is written (S24), and the process returns to step S17 to repeat the same processing for every data record. When the input data file reaches end of file, the VSAM file and the input data file are closed (S25, S26).
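The load loop of steps S17 to S24 might look as follows in Python. This is only a sketch: a dictionary stands in for the keyed VSAM data set, the key is assumed to start at offset 0, RL is assumed to be the original record length in big-endian form, and the compression routine is passed in as the hypothetical parameter `compress_fn`.

```python
from typing import Callable, Dict, Iterable

COMPRESSION_MARK = b"\xfe\xff\xff"
ID_AREA_LEN = 5   # X'FE FF FF' plus a 2-byte record length (assumed layout)


def to_vsam_record(record: bytes, key_end: int, payload: bytes,
                   max_record_length: int) -> bytes:
    """Choose between the compressed and the uncompressed form of a record."""
    write_length = key_end + ID_AREA_LEN + len(payload)             # S19
    if write_length <= max_record_length:                           # S20
        rl = len(record).to_bytes(2, "big")   # assumption: RL = original length
        return record[:key_end] + COMPRESSION_MARK + rl + payload   # S21
    return record                                                   # S22, S23


def load(records: Iterable[bytes], key_end: int, max_record_length: int,
         reference_data: bytes,
         compress_fn: Callable[[bytes, bytes], bytes]) -> Dict[bytes, bytes]:
    """Build a keyed data set (a dict here) from uncompressed input records."""
    data_set: Dict[bytes, bytes] = {}
    for record in records:                                          # S17
        payload = compress_fn(record[key_end:], reference_data)     # S18
        data_set[record[:key_end]] = to_vsam_record(
            record, key_end, payload, max_record_length)            # S24
    return data_set
```

Because the fallback stores the record untouched, a reader only needs to look for X'FE FF FF' after the key to know which form it has fetched.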

[0021] (B) Compression processing (see FIGS. 6 and 7)

The variables I, J, and K are each set to 1 (S41), and the value of I is saved in the register SV-I (S42). Byte I of the input record to be compressed (the sample record, when the sample record is being compressed) is compared with byte J of the sample record (the all-spaces record, when the sample record is being compressed) (S43). If they match, I and J are each incremented by 1 (S44) and I is compared with the input record length (S45). If I does not exceed the input record length, so that the whole record has not yet been processed, the process returns to step S43.

[0022] When I exceeds the input record length, or when byte I of the input record does not match byte J of the sample record, the difference I minus SV-I is set in the variable L (2-byte binary), which holds the length of the matching portion for the compression code (S46). L is compared with 2 (S47) and, if it exceeds 2, with 252 (S48). If L is at least 3 and at most 252, X'FD' is output to byte K of the output record and the low-order byte of L to byte K+1 (S49, S50), K is incremented by 2 (S51), and the process proceeds to step S62.

[0023] If the comparison in step S48 shows that L exceeds 252, X'FE' is output to byte K of the output record, the high-order byte of L to byte K+1, and the low-order byte of L to byte K+2 (S52, S53, S54), K is incremented by 3 (S55), and the process proceeds to step S62.

[0024] If, on the other hand, the comparison in step S47 shows that L is 2 or less, that is, the input did not match the sample record or the matching portion is 2 bytes or shorter and offers no compression benefit, L is compared with 0 (S56). If L is 0, that is, there was no match with the sample record, the process proceeds to step S62; after it is determined whether byte I of the input record is X'FD' or X'FE' (S64), byte I of the input record is output to byte K of the output record (S67), I, J, and K are each incremented by 1 (S68, S69), and the process returns to step S62. Note that bytes that matched the sample record need no check against X'FD' and X'FE', because the sample record is assumed not to contain X'FD' or X'FE'; a byte that matches the sample record therefore cannot be X'FD' or X'FE'.

[0025] If the comparison in step S56 shows that L is not 0, L is compared with 1 (S57). If L is 1, byte I-1 of the input record is output to byte K of the output record (S60), K is incremented by 1 (S61), and the process proceeds to step S62.

[0026] If the comparisons in steps S56 and S57 show that L is 2, byte I-2 of the input record is output to byte K of the output record (S58) and K is incremented by 1 (S59); byte I-1 of the input record is then output to byte K of the output record (S60), K is incremented by 1 (S61), and the process proceeds to step S62.

[0027] Next, I is compared with the input record length (S62). If I does not exceed the input record length, byte I of the input record is compared with byte J of the sample record (S63). If they do not match, byte I of the input record is compared with X'FE' and X'FD' (S64). If it is either X'FE' or X'FD', then, to distinguish it from a compression code, byte I of the input record is output to byte K of the output record (S65), K is incremented by 1 (S66), byte I of the input record (the second X'FD' or X'FE') is output to byte K of the output record (S67), and I, J, and K are each incremented by 1 (S68, S69).

[0028] If the comparison in step S64 shows that byte I of the input record is neither X'FE' nor X'FD', byte I of the input record is output to byte K of the output record (S67), I, J, and K are each incremented by 1 (S68, S69), and the process returns to step S62. If the comparison in step S62 shows that I exceeds the input record length, the value of K is taken as the output record length (S70) and compression of the data record ends. If the comparison in step S63 shows that byte I of the input record matches byte J of the sample record, the process returns to step S42.
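The compression procedure of section (B) can be condensed into the following Python sketch. It is an interpretation, not the patented flowchart itself: it uses zero-based indexing and a single index in place of I and J (which always advance together during compression), it escapes every literal X'FD' or X'FE' rather than relying on the sample record being free of them, it treats input beyond the end of the reference as non-matching, and it caps a run at 32760 bytes. All names are hypothetical.

```python
ID_SHORT = 0xFD     # 2-byte compression code: X'FD' + 1-byte length
ID_LONG = 0xFE      # 3-byte compression code: X'FE' + 2-byte length
MAX_SHORT = 252
MAX_LONG = 32760


def _literal(byte: int) -> bytes:
    """Ordinary data byte; X'FD' and X'FE' are doubled (S64 to S67)."""
    return bytes([byte, byte]) if byte in (ID_SHORT, ID_LONG) else bytes([byte])


def compress(data: bytes, reference: bytes) -> bytes:
    """Compress `data` (the part after the key) against `reference`."""
    out = bytearray()
    i = 0
    while i < len(data):
        start = i                                   # S42: remember run start
        while (i < len(data) and i < len(reference)
               and data[i] == reference[i] and i - start < MAX_LONG):
            i += 1                                  # S43 to S45: extend the run
        run = i - start                             # S46
        if run > MAX_SHORT:                         # S48, S52 to S55
            out.append(ID_LONG)
            out += run.to_bytes(2, "big")
        elif run >= 3:                              # S49 to S51
            out += bytes([ID_SHORT, run])
        elif run > 0:                               # S57 to S61: copy 1 or 2 bytes
            for byte in data[start:i]:
                out += _literal(byte)
        else:                                       # S63 to S69: non-matching byte
            out += _literal(data[i])
            i += 1
    return bytes(out)
```

With a reference of ten spaces, the input `b"AB" + b" " * 8` becomes `b"AB\xfd\x08"`: two literal bytes followed by one 2-byte code for the eight matching bytes.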

[0029] (C) Compressed VSAM access routine (see FIGS. 8 to 10)

It is determined whether the initial-setting switch (SW), which indicates that the sample record has been prepared, is 1 (S81). If it is 1 (prepared), the process proceeds to step S93. If it is not 1, the initial-setting SW is set to 1 (S82), the VSAM file is opened (S83), the key length, key position, and maximum record length are obtained from the VSAM file (S84), and the key end position is calculated from the key length and key position (S85).

[0030] X'00' is set from the beginning of the record through the key end position (S86), and a random (keyed) READ of the VSAM file is issued to look up the sample record whose key is X'00' (S87). If the VSAM file contains no record with the key X'00', the compression SW is set to 0 (S89).

[0031] If the VSAM file contains a record with the key X'00', it is determined whether the compression identification code (X'FE FF FF') immediately follows the key end position (S88). If it does, the part of the record after the key end is restored as described later, using the all-spaces record (X'40') as the comparison sample (S90), the restored result is taken as the sample record (S91), and the compression SW is set to 1 (S92). If the determination in step S88 shows that the compression identification code does not immediately follow the key end position, the compression SW is set to 0 (S89).
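The one-time initialization of steps S81 to S92 can be sketched as below. A dictionary stands in for the VSAM file, the key is assumed to start at offset 0, the all-spaces record is assumed to be as long as the maximum record length, and the restoration routine is supplied as the hypothetical parameter `restore`. The class and attribute names are illustrative only.

```python
from typing import Callable, Dict, Optional

COMPRESSION_MARK = b"\xfe\xff\xff"
ID_AREA_LEN = 5    # mark plus a 2-byte record length (assumed layout)
SPACE = 0x40       # space character as given in the text


class CompressedDataSet:
    """Holds the switches and the in-memory reference record."""

    def __init__(self, store: Dict[bytes, bytes], key_length: int,
                 max_record_length: int,
                 restore: Callable[[bytes, bytes], bytes]) -> None:
        self.store = store
        self.key_length = key_length
        self.max_record_length = max_record_length
        self.restore = restore
        self.initialized = False                  # initial-setting SW
        self.compressed = False                   # compression SW
        self.reference: Optional[bytes] = None    # restored sample record

    def ensure_initialized(self) -> None:
        if self.initialized:                                    # S81
            return
        self.initialized = True                                 # S82
        key_end = self.key_length                               # S85 (key at offset 0)
        stored = self.store.get(bytes(self.key_length))         # S86, S87
        if stored is None:
            return                                              # S89
        if stored[key_end:key_end + 3] != COMPRESSION_MARK:     # S88
            return                                              # S89
        spaces = bytes([SPACE]) * self.max_record_length
        self.reference = self.restore(
            stored[key_end + ID_AREA_LEN:], spaces)             # S90, S91
        self.compressed = True                                  # S92
```

Calling `ensure_initialized()` at the top of every access keeps the reference record in memory after the first call, so later accesses skip the lookup entirely.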

[0032] It is determined whether the compression SW is 1 (S93). If it is 0 (not a compressed file), normal VSAM access is performed (S109). If it is 1 (compressed file), it is determined whether the request is a read (reference) operation or an update operation (S94, S95). For a read operation, VSAM is accessed and the VSAM file is searched by the requested access method (S96); if the VSAM file contains no record matching the request, the read ends.

[0033] If the VSAM file contains the requested record, it is determined whether the code immediately after the key is the compression identification code (S97). If it is, the part of the VSAM record after the key end is restored as described later, using the sample record as the comparison sample (S98), the restored record is returned as the read result (S99), and the read ends. If the determination in step S97 shows that the compression identification code does not follow the key, restoration is bypassed, the VSAM record is returned unchanged as the read result (S100), and the read ends.
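The read path of steps S96 to S100 reduces to a marker check followed by an optional restoration. The sketch below assumes a 5-byte identification area and assumes that the read result is the key area followed by the restored data; the restoration routine is passed in as the hypothetical parameter `restore`, and `read_record` is an illustrative name.

```python
from typing import Callable, Optional

COMPRESSION_MARK = b"\xfe\xff\xff"
ID_AREA_LEN = 5   # mark plus a 2-byte record length (assumed layout)


def read_record(stored: Optional[bytes], key_end: int, reference_data: bytes,
                restore: Callable[[bytes, bytes], bytes]) -> Optional[bytes]:
    """Return the record as the caller should see it, or None if not found."""
    if stored is None:                                       # S96: no such record
        return None
    if stored[key_end:key_end + 3] != COMPRESSION_MARK:      # S97
        return stored                                        # S100: stored uncompressed
    payload = stored[key_end + ID_AREA_LEN:]
    return stored[:key_end] + restore(payload, reference_data)   # S98, S99
```

Only the one record that was fetched is restored, which is what makes keyed random access practical with this format.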

[0034] For an update operation, the part after the key end of the input record to be newly registered in the VSAM file is compressed as described above, using the sample record as the comparison sample (S101). The WRITE record length is set to [length up to the key end + length of the compression identification code + length of the compressed input-record data] (S102) and compared with the VSAM maximum record length (S103).

[0035] If the WRITE record length does not exceed the VSAM maximum record length, the VSAM record is set to [beginning of the input record through the key end + compression identification code + compressed input-record data] and the VSAM file is accessed (S104). If the comparison in step S103 shows that the WRITE record length exceeds the VSAM maximum record length, the uncompressed input record is used unchanged as the VSAM record (S105), the input record length is used as the WRITE record length (S106), and the VSAM file is accessed (S107).

[0036] (D) Restoration processing (see FIG. 11)

The variables I, J, and K are each set to 1 (S121), and it is determined whether byte I of the input record (the sample record, when the sample record is being restored) is X'FD' or X'FE' (S122). If it is neither, byte I of the input record is output to byte K of the output record (S123), I, J, and K are each incremented by 1 (S124, S125), and the process proceeds to step S138.

[0037] If the determination in step S122 shows that byte I of the input record is X'FD' or X'FE', byte I is compared with byte I+1 of the input record (S126). If they match (ordinary data rather than a compression code), I is incremented by 1 (S127), byte I of the input record is output to byte K of the output record (S123), I, J, and K are each incremented by 1 (S124, S125), and the process proceeds to step S138.

[0038] If the comparison in step S126 shows that byte I and byte I+1 of the input record do not match (a compression code), it is determined whether byte I of the input record is X'FE' (S128). If it is X'FE', byte I+1 of the input record is set in the high-order byte of L (S129), byte I+2 is set in the low-order byte of L (S130), and I is incremented by 3 (S131).

[0039] If the comparison in step S128 shows that byte I of the input record is X'FD', L is first cleared to 0 (S132), byte I+1 of the input record is set in the low-order byte of L (S133), and I is incremented by 2 (S134).

[0040] Once the length from the compression code has been set in L, L bytes of data starting at byte J of the sample record (the all-spaces record, when the sample record is being restored) are output to the L bytes starting at byte K of the output record (S135), and J and K are each incremented by L (S136, S137). This restoration processing is repeated until I exceeds the input record length (S138); at that point the value of K is taken as the output record length (S139) and restoration of the record ends.
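The restoration procedure of section (D) can be sketched in Python as follows. The sketch uses zero-based indices `i` (input) and `j` (reference) in place of the 1-based I and J, builds the output by appending instead of tracking K, and raises an error on a truncated code, which the flowchart does not address. The function name is hypothetical.

```python
ID_SHORT = 0xFD   # X'FD' + 1-byte length
ID_LONG = 0xFE    # X'FE' + 2-byte length, high-order byte first


def decompress(packed: bytes, reference: bytes) -> bytes:
    """Restore the data part of a record from its compressed form."""
    out = bytearray()
    i = 0   # position in the compressed input
    j = 0   # position in the reference record
    while i < len(packed):                                # S138
        byte = packed[i]
        if byte not in (ID_SHORT, ID_LONG):               # S122
            out.append(byte)                              # S123
            i += 1
            j += 1
            continue
        if i + 1 >= len(packed):
            raise ValueError("truncated compression code")
        if packed[i + 1] == byte:                         # S126: doubled data byte
            out.append(byte)                              # S127, S123
            i += 2
            j += 1
            continue
        if byte == ID_LONG:                               # S128 to S131
            if i + 2 >= len(packed):
                raise ValueError("truncated compression code")
            length = int.from_bytes(packed[i + 1:i + 3], "big")
            i += 3
        else:                                             # S132 to S134
            length = packed[i + 1]
            i += 2
        out += reference[j:j + length]                    # S135
        j += length                                       # S136
    return bytes(out)
```

Because `j` advances by one for every literal byte and by the run length for every code, it always points at the reference byte that corresponds to the next output position, so no state from any other record is needed.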

[0041] This embodiment has been described for the case where the difference data relative to the reference record is output as the compression result without further processing, but identical characters and repeated patterns in the difference data can also be compressed further.

[0042]

[Effects of the Invention] As described above, the method of the invention can restore a compressed record starting from any record in the file, so random access is possible. In batch processing as well, provided the items used to select the target data are uncompressed, the target data can first be extracted from a large volume of data and only that data then restored, which has the excellent effect of greatly reducing CPU occupancy time.

[0043] Furthermore, the method of the invention does not require the records in a file to be in any particular order. For example, by using the first data item as the sample and compressing the parts other than the sort key in the second and subsequent items of the data to be sorted, the volume of sort work data is reduced, which has the excellent effect of greatly shortening the sort time through reduced I/O and reduced paging.

[Brief description of drawings]

FIG. 1 is a record format diagram outlining the method of the invention.

FIG. 2 is a format diagram of a VSAM data set in the method of the invention.

FIG. 3 is a flowchart of compressed VSAM creation in the method of the invention.

FIG. 4 is a flowchart of compressed VSAM creation in the method of the invention.

FIG. 5 is a flowchart of compressed VSAM creation in the method of the invention.

FIG. 6 is a flowchart of compression processing in the method of the invention.

FIG. 7 is a flowchart of compression processing in the method of the invention.

FIG. 8 is a flowchart of the compressed VSAM access routine in the method of the invention.

FIG. 9 is a flowchart of the compressed VSAM access routine in the method of the invention.

FIG. 10 is a flowchart of the compressed VSAM access routine in the method of the invention.

FIG. 11 is a flowchart of restoration processing in the method of the invention.

Claims

[Claims]

1. A file compression method for compressing data portions having the same value in a plurality of records, characterized in that a reference record is created in advance in which the data portion of a record to be compressed, excluding the key item, is set to data predicted to have the same value in a plurality of records, the record to be compressed is compared with the reference record, and the data portions of the record to be compressed whose values match those of the reference record are compressed.

2. The file compression method according to claim 1, wherein the reference record is compressed.

3. The file compression method according to claim 1, wherein a code indicating compression is added to the compressed record.