JP2639776B2

JP2639776B2 - File compression method

Info

Publication number: JP2639776B2
Application number: JP5021151A
Authority: JP
Inventors: 文夫木田
Original assignee: DAIMARU JOHO SENTAA KK
Current assignee: DAIMARU JOHO SENTAA KK
Priority date: 1993-02-09
Filing date: 1993-02-09
Publication date: 1997-08-13
Anticipated expiration: 2012-08-13
Also published as: JPH06236302A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a file compression method that compresses data items having the same value in a plurality of records while still permitting random access to a VSAM data set or the like.

[0002]

2. Description of the Related Art
The main conventional file compression methods are as follows.
(1) A file compression method for sequential data sets that exploits the similarity between records in a file and outputs the difference from the preceding record as the compressed record.
(2) A file compression method that permits random access and compresses runs of identical characters and repeated patterns within a record.
(3) A file compression method combining the above two, which compresses runs of identical characters and repeated patterns contained in the difference from the preceding record.

[0003]

[Problems to be Solved by the Invention]
As described above, conventional methods (1) and (3) allow only sequential access from the beginning of a file and in principle cannot support random access, while method (2) uses a large amount of CPU time yet generally achieves poor compression efficiency.

[0004] The present invention has been made to solve these problems. Its object is to provide a file compression method that achieves good compression efficiency while permitting random access, by creating a reference record containing data predicted to have the same value in a plurality of records to be compressed, and outputting the difference between the reference record and each record to be compressed as the compressed record.

[0005]

[Means for Solving the Problems]
The file compression method according to the present invention compresses data portions that have the same value in a plurality of records. A reference record is created in advance in which, for the data portion excluding the key item of the records to be compressed, data predicted to have the same value in a plurality of records is set. Each record to be compressed is compared with the reference record, and those data portions of the record whose values match the values in the reference record are compressed.

[0006] Further, the file compression method according to the present invention is characterized in that the reference record itself is compressed.

[0007] Further, the file compression method according to the present invention is characterized in that a code indicating that the record has been compressed is added to each compressed record.

[0008]

[Operation]
In the file compression method of the present invention, a reference record containing data that appears frequently in the file to be compressed is created in advance, and the difference between each record to be compressed and the reference record is output as the compressed record. This permits random access to VSAM data sets and the like.

[0009]

[Embodiment]
VCP (VSAM COMPRESSION PACKAGE), which implements the file compression method of the present invention (hereinafter "the present method"), will now be described in detail with reference to the drawings. FIG. 1 is a record format diagram outlining the compression method used in VCP. In VCP, a reference record is created in advance in which the items following the record key item (treated as a compression-prohibited area) are set to values expected to match across multiple records. Each record to be compressed is compared with the reference record, and the portions whose values match the reference record (hatched in the figure) are compressed.

[0010] During compression, a matching run of 3 to 252 bytes is replaced by a 2-byte compression code consisting of the identification code X'FD' followed by the length of the matching run (L2). A matching run of more than 252 and up to 32,760 bytes is replaced by a 3-byte compression code consisting of the identification code X'FE' followed by the length of the matching run (L1). A matching run of 2 bytes or fewer is not replaced, because replacing it would not reduce the byte count (a 1-byte run would even grow), so no compression benefit would be obtained.

[0011] VCP uses X'FD' and X'FE' as the identifying characters of compression codes. Because a record to be compressed may also contain X'FD' or X'FE' bytes that do not denote a compression code, each such byte is output twice in succession so that it can be distinguished from a compression code.

[0012] If the compressed result would be longer than the original record, compression is bypassed and the original record is output unchanged. To allow uncompressed and compressed records to coexist in the same data set, each compressed record carries a 5-byte compression identification area, consisting of X'FE FF FF' followed by the record length (RL), immediately after the compression-prohibited area.
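The record layout of [0012] can be sketched as follows. Two details are assumptions not fixed by the text: RL is taken to be the original record length stored as a 2-byte big-endian integer, and the key area is taken to be `record[:key_end]`. The sketch bypasses compression both when the result would be longer than the original ([0012]) and when it would exceed the VSAM maximum record length (the check made in the flowcharts). The function names and the compressed tail in the usage lines are hypothetical.

```python
ID_CODE = b"\xfe\xff\xff"  # compression identification code from [0012]


def build_stored_record(record: bytes, key_end: int, compressed_tail: bytes,
                        max_len: int) -> bytes:
    """Assemble the record actually written to the data set.

    Assumed: the key area is record[:key_end]; RL is the original record
    length as a 2-byte big-endian integer.
    """
    rl = len(record).to_bytes(2, "big")
    candidate = record[:key_end] + ID_CODE + rl + compressed_tail
    # Bypass compression if it does not pay off ([0012]) or does not fit
    # the VSAM maximum record length (flowchart steps S9, S20, S103).
    if len(candidate) > len(record) or len(candidate) > max_len:
        return record
    return candidate


def is_compressed(stored: bytes, key_end: int) -> bool:
    """True if the identification code immediately follows the key area."""
    return stored[key_end:key_end + len(ID_CODE)] == ID_CODE


rec = b"K1" + b"A" * 20
stored = build_stored_record(rec, 2, b"\xfd\x14", max_len=100)
print(stored.hex(), is_compressed(stored, 2))  # 4b31feffff0016fd14 True
```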

[0013] FIG. 2 is a format diagram of a VSAM data set created by VCP. In this embodiment, a reference record whose key is ALL LOW-VALUE (X'0000…') is held in the VSAM data set for compressing and decompressing records at access time, and each data record is stored as a variable-length record VCP-compressed against this reference record. In VCP, the reference record is created in the target VSAM file when the file is loaded by the supplied utility, and the access routines supplied with VCP use this reference record to decompress and compress data. The user therefore does not need to supply a reference record externally on each access.

[0014] The VCP access routines can also handle ordinary (non-VCP) VSAM files. To distinguish a record with an ALL LOW-VALUE key in an ordinary VSAM file from a VCP reference record, the reference record itself is stored in VCP-compressed form, compressed against a record consisting entirely of spaces (X'40'). In case the reference record cannot be compressed, the load process ensures that the bytes immediately following the compression-prohibited area are always X'FE FF FF'.

[0015] Instead of registering the reference record in the file to be compressed as data with a special key such as ALL LOW-VALUE, as described above, the reference record may also be created as a load module whose name is determined for each file to be compressed, and managed in a library. Even when the reference record is held as a load module, it is read into memory on the first access to the file to be compressed, just as when it is registered in the file, and the in-memory copy is used for all subsequent accesses. There is therefore no need to perform wasteful operations such as loading the module each time compression is performed.
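The load-once behaviour in [0015] amounts to a per-file cache. In this Python sketch, `ReferenceRecordCache` and the `loader` callable are hypothetical stand-ins for reading the ALL LOW-VALUE record or loading the named load module; the file name in the usage lines is illustrative only.

```python
from typing import Callable, Dict


class ReferenceRecordCache:
    """Keeps each file's reference record in memory after the first access.

    `loader` is a hypothetical callable standing in for reading the
    ALL LOW-VALUE-key record or loading the per-file load module.
    """

    def __init__(self, loader: Callable[[str], bytes]) -> None:
        self._loader = loader
        self._records: Dict[str, bytes] = {}

    def get(self, file_name: str) -> bytes:
        if file_name not in self._records:
            # First access to this file: fetch the reference record once.
            self._records[file_name] = self._loader(file_name)
        return self._records[file_name]


calls = []


def fake_loader(name: str) -> bytes:
    calls.append(name)  # record every real load
    return b"REF:" + name.encode()


cache = ReferenceRecordCache(fake_loader)
cache.get("CUSTOMER.MASTER")
cache.get("CUSTOMER.MASTER")
print(calls)  # ['CUSTOMER.MASTER']
```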

[0016] The specific processing procedure of VCP will now be described with reference to the flowcharts of FIGS. 3 to 11.
(A) Creating a compressed VSAM file (see FIGS. 3 to 5)
First, the sample record that will serve as the reference record is compressed. The VSAM file to be compressed and the sample file in which the sample record is registered are opened (S1, S2), and the sample file is read (S3). The key length, key position, and maximum record length are obtained from the VSAM file (S4), the key end position is calculated from the key length and key position (S5), and X'00' is set from the beginning of the sample record up to the key end position (S6). Next, the portion of the sample record following the key end is compressed, as described later, using a record consisting entirely of spaces (X'40') as the sample (S7).

[0017] After compression, [length up to the key end + length of the compression identification code + length of the compressed sample record data] is taken as the WRITE record length (S8), and the WRITE record length is compared with the VSAM maximum record length (S9). If the WRITE record length does not exceed the VSAM maximum record length, [sample record from its beginning to the key end + compression identification code + compressed sample record data] becomes the VSAM record (S10).

[0018] If the comparison in step S9 shows that the WRITE record length exceeds the VSAM maximum record length, the compression identification code is placed immediately after the key end of the sample record (S11), the sample record becomes the VSAM record (S12), and the sample record length becomes the WRITE record length (S13). The compressed or uncompressed VSAM record obtained in this way, that is, the sample record, is written to the VSAM file (S14), and the sample file is closed (S15).

[0019] Next, the data to be compressed is compressed on the basis of the sample record. The input data file is opened (S16) and read (S17). The portion of the input record following the key end is compressed, as described later, using the sample record as the sample (S18). [Length up to the key end + length of the compression identification code + length of the compressed input record data] is taken as the WRITE record length (S19), and the WRITE record length is compared with the VSAM maximum record length (S20).

[0020] If the WRITE record length does not exceed the VSAM maximum record length, [input record from its beginning to the key end + compression identification code + compressed input record data] becomes the VSAM record (S21). If, on the other hand, the comparison in step S20 shows that the WRITE record length exceeds the VSAM maximum record length, the input record becomes the VSAM record (S22) and the input record length becomes the WRITE record length (S23). The compressed or uncompressed VSAM record obtained in this way is written (S24), and the process returns to step S17 to repeat the same processing for every data record. When the input data file reaches end of file, the VSAM file and the input data file are closed (S25, S26).
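Flow (A) can be modelled in Python with a dict standing in for the VSAM key-sequenced file. This is a sketch under several assumptions: the dict is keyed by the record key, RL is the original record length as a 2-byte big-endian integer, the all-space sample uses EBCDIC X'40', and neither the uncompressed fallback for the reference record itself (S11 to S13) nor a length check on the reference record is modelled. `compress` is a condensed form of compression processing (B); `load_dataset` is a hypothetical name.

```python
from typing import Dict, List

FD, FE = 0xFD, 0xFE
ID_CODE = b"\xfe\xff\xff"  # compression identification code
SPACE = 0x40               # EBCDIC space, X'40'


def compress(data: bytes, sample: bytes) -> bytes:
    """Condensed flow (B): positional diff of `data` against `sample`."""
    out, i, n = bytearray(), 0, len(data)
    while i < n:
        start = i
        while i < n and i < len(sample) and data[i] == sample[i]:
            i += 1
        run = i - start
        if run > 252:
            out += bytes([FE]) + run.to_bytes(2, "big")
        elif run >= 3:
            out += bytes([FD, run])
        else:
            out += data[start:i]       # 0-2 matched bytes copied as-is
        if i < n:                      # a byte that differs from the sample
            if data[i] in (FD, FE):
                out.append(data[i])    # double X'FD'/X'FE' data bytes
            out.append(data[i])
            i += 1
    return bytes(out)


def load_dataset(sample_record: bytes, records: List[bytes], key_pos: int,
                 key_len: int, max_len: int) -> Dict[bytes, bytes]:
    """Build an in-memory stand-in for the compressed VSAM file (flow (A))."""
    key_end = key_pos + key_len                                   # S5
    sample = bytes(key_end) + sample_record[key_end:]             # S6: X'00' up to key end
    tail = sample[key_end:]
    ref_body = compress(tail, bytes([SPACE]) * len(tail))         # S7
    ds: Dict[bytes, bytes] = {}
    ds[sample[key_pos:key_end]] = (sample[:key_end] + ID_CODE
                                   + len(sample).to_bytes(2, "big") + ref_body)  # S8-S10, S14
    for rec in records:                                           # S17, repeated until EOF
        body = compress(rec[key_end:], tail)                      # S18
        stored = rec[:key_end] + ID_CODE + len(rec).to_bytes(2, "big") + body  # S19, S21
        if len(stored) > max_len:                                 # S20
            stored = rec                                          # S22-S23
        ds[rec[key_pos:key_end]] = stored                         # S24
    return ds


ds = load_dataset(b"??ABCDEFGHIJ", [b"K1ABCxEFGHIJ"], key_pos=0, key_len=2, max_len=100)
print(ds[b"K1"].hex())  # 4b31feffff000cfd0378fd06
```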

[0021] (B) Compression processing (see FIGS. 6 and 7)
Variables I, J, and K are each set to 1 (S41), and the value of variable I is saved in register SV-I (S42). The I-th byte of the input record to be compressed (the sample record, when the sample record itself is being compressed) is compared with the J-th byte of the sample record (the all-space record, when the sample record is being compressed) (S43). If they match, variables I and J are each incremented by 1 (S44) and I is compared with the input record length (S45); if I does not exceed the input record length, so that the whole record has not yet been processed, the process returns to step S43.

[0022] If I exceeds the input record length, or if the I-th byte of the input record does not match the J-th byte of the sample record, (I - SV-I) is set in variable L (2-byte binary), which holds the length of the matching run for the compression code (S46). L is compared with 2 (S47); if it exceeds 2, it is compared with 252 (S48). If L is at least 3 and at most 252, X'FD' is output to the K-th byte of the output record and the low-order byte of L to the (K+1)-th byte (S49, S50); K is then incremented by 2 and the process moves to step S62 (S51).

[0023] If the comparison in step S48 shows that L exceeds 252, X'FE' is output to the K-th byte of the output record, the high-order byte of L to the (K+1)-th byte, and the low-order byte of L to the (K+2)-th byte (S52, S53, S54); K is then incremented by 3 and the process moves to step S62 (S55).

[0024] If, on the other hand, the comparison in step S47 shows that L is 2 or less, that is, either nothing matched the sample record or the matching run is 2 bytes or shorter and yields no compression benefit, L is compared with 0 (S56). If L is 0, that is, nothing matched the sample record, the process moves to step S62; after it is determined whether the I-th byte of the input record is X'FD' or X'FE' (S64), the I-th byte of the input record is output to the K-th byte of the output record (S67), I, J, and K are each incremented by 1 (S68, S69), and the process returns to step S62. Bytes that matched the sample record need no check against X'FD' and X'FE', because the sample record is assumed not to contain X'FD' or X'FE'; a byte that matches the sample record therefore cannot be X'FD' or X'FE'.

[0025] If the comparison in step S56 shows that L is not 0, L is compared with 1 (S57). If L is 1, the (I-1)-th byte of the input record is output to the K-th byte of the output record (S60), K is incremented by 1 (S61), and the process moves to step S62.

[0026] If the comparisons in steps S56 and S57 show that L is 2, the (I-2)-th byte of the input record is output to the K-th byte of the output record (S58) and K is incremented by 1 (S59); the (I-1)-th byte of the input record is then output to the K-th byte of the output record (S60), K is incremented by 1 (S61), and the process moves to step S62.

[0027] Next, I is compared with the input record length (S62). If I does not exceed the input record length, the I-th byte of the input record is compared with the J-th byte of the sample record (S63). If they do not match, the I-th byte of the input record is compared with X'FE' and X'FD' (S64). If it is either X'FE' or X'FD', then, to distinguish it from a compression code, the I-th byte of the input record is output to the K-th byte of the output record (S65) and K is incremented by 1 (S66); the I-th byte of the input record (the second X'FD' or X'FE') is then output to the K-th byte of the output record (S67), and I, J, and K are each incremented by 1 (S68, S69).

[0028] If the comparison in step S64 shows that the I-th byte of the input record is neither X'FE' nor X'FD', the I-th byte of the input record is output to the K-th byte of the output record (S67), I, J, and K are each incremented by 1 (S68, S69), and the process returns to step S62. If the comparison in step S62 shows that I exceeds the input record length, the value of K is taken as the output record length (S70), and compression of that data record ends. If the comparison in step S63 shows that the I-th byte of the input record matches the J-th byte of the sample record, the process returns to step S42.
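Compression processing (B) translates into the following Python sketch, using 0-based indices instead of the 1-based I, J, K of the flowchart. Because I and J always advance together during compression, one index serves for both, and K is simply the current output length. As in [0024], the sample is assumed to contain no X'FD' or X'FE' bytes; the sketch checks that assumption explicitly. The function name `vcp_compress` is hypothetical.

```python
FD, FE = 0xFD, 0xFE


def vcp_compress(data: bytes, sample: bytes) -> bytes:
    """Compress `data` (the part after the key) against `sample`.

    0-based sketch of flowchart (B); a single index i plays the role of both I and J.
    """
    if len(data) > 32760:
        raise ValueError("record longer than 32760 bytes")
    if FD in sample or FE in sample:
        raise ValueError("sample must not contain X'FD' or X'FE' (see [0024])")
    out = bytearray()                   # K corresponds to len(out)
    i = 0
    while i < len(data):
        sv_i = i                        # S42: save I in SV-I
        while i < len(data) and i < len(sample) and data[i] == sample[i]:
            i += 1                      # S43-S45: extend the matching run
        run = i - sv_i                  # S46: L = I - SV-I
        if run > 252:                   # S48, S52-S55: X'FE' + 2-byte length
            out += bytes([FE]) + run.to_bytes(2, "big")
        elif run >= 3:                  # S49-S51: X'FD' + 1-byte length
            out += bytes([FD, run])
        else:                           # S56-S61: copy 0-2 matched bytes as-is
            out += data[sv_i:i]
        # S62-S69: copy differing bytes until the next match (then back to S42)
        while i < len(data) and not (i < len(sample) and data[i] == sample[i]):
            if data[i] in (FD, FE):
                out.append(data[i])     # S65-S66: first copy of a doubled byte
            out.append(data[i])         # S67
            i += 1
    return bytes(out)                   # S70: output length is len(out)


print(vcp_compress(b"ABCxEFGHIJ", b"ABCDEFGHIJ").hex())  # fd0378fd06
```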

[0029] (C) Compressed VSAM access routine (see FIGS. 8 to 10)
It is determined whether the initialization switch, which indicates that the sample record is ready, is 1 (S81). If it is 1 (ready), the process moves to step S93. Otherwise, the initialization switch is set to 1 (S82), the VSAM file is opened (S83), the key length, key position, and maximum record length are obtained from the VSAM file (S84), and the key end position is calculated from the key length and key position (S85).

[0030] X'00' is set from the beginning of the record up to the key end position (S86), and the VSAM file is read randomly to search for the sample record whose key is X'00' (S87). If the VSAM file contains no record with a key of X'00', the compression-processing switch is set to 0 (S89).

[0031] If the VSAM file contains a record with a key of X'00', it is determined whether the compression identification code (X'FE FF FF') immediately follows the key end position (S88). If so, the portion of the record following the key end is decompressed, as described later, using a record consisting entirely of spaces (X'40') as the sample (S90); the decompressed result becomes the sample record (S91), and the compression-processing switch is set to 1 (S92). If, on the other hand, step S88 determines that the compression identification code does not immediately follow the key end position, the compression-processing switch is set to 0 (S89).

[0032] It is determined whether the compression-processing switch is 1 (S93). If it is 0 (not a compressed file), ordinary VSAM access is performed (S109). If it is 1 (compressed file), it is determined whether the request is a read (reference) operation or an update operation (S94, S95). For a read operation, VSAM is accessed and the VSAM file is searched using the requested access method (S96); if the requested record does not exist in the VSAM file, the read ends.

[0033] If the requested record exists in the VSAM file, it is determined whether the code immediately following the key is the compression identification code (S97). If so, the portion of the VSAM record following the key end is decompressed, as described later, using the sample record as the sample (S98), the decompressed record becomes the read result (S99), and the read ends. If step S97 determines that the compression identification code does not immediately follow the key, decompression is bypassed, the VSAM record itself becomes the read result (S100), and the read ends.

[0034] For an update operation, on the other hand, the portion following the key end of the input record to be newly registered in the VSAM file is compressed as described above, using the sample record as the sample (S101). [Length up to the key end + length of the compression identification code + length of the compressed input record data] is taken as the WRITE record length (S102), and the WRITE record length is compared with the VSAM maximum record length (S103).

[0035] If the WRITE record length does not exceed the VSAM maximum record length, [input record from its beginning to the key end + compression identification code + compressed input record data] becomes the VSAM record, and the VSAM file is accessed (S104). If the comparison in step S103 shows that the WRITE record length exceeds the VSAM maximum record length, the uncompressed input record itself becomes the VSAM record (S105), the input record length becomes the WRITE record length (S106), and the VSAM file is accessed (S107).
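The access routine (C) can be sketched as a small Python class over a dict that stands in for the VSAM file. `VcpFile` and its methods are hypothetical names; the key is assumed to occupy `record[:key_end]`, RL is assumed to be a 2-byte big-endian original length, and `compress`/`decompress` are condensed versions of flows (B) and (D). Following [0012], the write path also stores the record uncompressed when compression would not make it shorter. The reference record in the usage lines is hand-built for illustration.

```python
from typing import Dict, Optional

FD, FE = 0xFD, 0xFE
ID_CODE = b"\xfe\xff\xff"  # compression identification code
SPACE = 0x40               # EBCDIC space


def compress(data: bytes, sample: bytes) -> bytes:
    out, i, n = bytearray(), 0, len(data)
    while i < n:
        start = i
        while i < n and i < len(sample) and data[i] == sample[i]:
            i += 1
        run = i - start
        if run > 252:
            out += bytes([FE]) + run.to_bytes(2, "big")
        elif run >= 3:
            out += bytes([FD, run])
        else:
            out += data[start:i]       # 0-2 matched bytes copied as-is
        if i < n:                      # a byte that differs from the sample
            if data[i] in (FD, FE):
                out.append(data[i])    # double X'FD'/X'FE' data bytes
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress(enc: bytes, sample: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(enc):
        b = enc[i]
        if b not in (FD, FE):
            out.append(b)
            i += 1
        elif i + 1 < len(enc) and enc[i + 1] == b:
            out.append(b)              # doubled marker = one data byte
            i += 2
        else:
            run = int.from_bytes(enc[i + 1:i + 3], "big") if b == FE else enc[i + 1]
            i += 3 if b == FE else 2
            out += sample[len(out):len(out) + run]  # copy the matched bytes
    return bytes(out)


class VcpFile:
    """Toy access routine (flow (C)); the key is assumed to be record[:key_end]."""

    def __init__(self, ds: Dict[bytes, bytes], key_end: int, max_len: int) -> None:
        self.ds, self.key_end, self.max_len = ds, key_end, max_len
        self.initialised = False  # initialization switch
        self.compressed = False   # compression-processing switch
        self.sample = b""

    def _has_id(self, rec: bytes) -> bool:
        return rec[self.key_end:self.key_end + 3] == ID_CODE

    def _init(self) -> None:
        if self.initialised:                                    # S81
            return
        self.initialised = True                                 # S82
        ref = self.ds.get(bytes(self.key_end))                  # S86-S87: all-X'00' key
        if ref is None or not self._has_id(ref):                # S88
            return                                              # S89: ordinary VSAM
        spaces = bytes([SPACE]) * self.max_len
        tail = decompress(ref[self.key_end + 5:], spaces)       # S90
        self.sample = ref[:self.key_end] + tail                 # S91
        self.compressed = True                                  # S92

    def read(self, key: bytes) -> Optional[bytes]:
        self._init()
        rec = self.ds.get(key)                                  # S96 (or S109)
        if rec is None or not self.compressed or not self._has_id(rec):
            return rec                                          # missing, S109, or S100
        body = decompress(rec[self.key_end + 5:], self.sample[self.key_end:])  # S98
        return rec[:self.key_end] + body                        # S99

    def write(self, record: bytes) -> None:
        self._init()
        key, stored = record[:self.key_end], record             # S109 / S105-S106
        if self.compressed:
            body = compress(record[self.key_end:], self.sample[self.key_end:])  # S101
            cand = key + ID_CODE + len(record).to_bytes(2, "big") + body        # S102
            if len(cand) <= min(len(record), self.max_len):     # S103 and [0012]
                stored = cand                                   # S104
        self.ds[key] = stored


ref = b"\x00\x00" + ID_CODE + (12).to_bytes(2, "big") + b"ABCDEFGHIJ"
ds = {b"\x00\x00": ref}
f = VcpFile(ds, key_end=2, max_len=100)
f.write(b"K1ABCxEFGHIJ")
f.write(b"K2" + b"Z" * 10)  # no saving, so stored as-is
print(ds[b"K1"].hex())      # 4b31feffff000cfd0378fd06
print(f.read(b"K1"))        # b'K1ABCxEFGHIJ'
```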

[0036] (D) Decompression processing (see FIG. 11)
Variables I, J, and K are each set to 1 (S121), and it is determined whether the I-th byte of the input record (the sample record, when the sample record itself is being decompressed) is X'FD' or X'FE' (S122). If it is neither, the I-th byte of the input record is output to the K-th byte of the output record (S123), I, J, and K are each incremented by 1 (S124, S125), and the process moves to step S138.

[0037] If step S122 determines that the I-th byte of the input record is X'FD' or X'FE', the I-th byte is compared with the (I+1)-th byte (S126). If they match (the byte is data, not a compression code), I is incremented by 1 (S127), the I-th byte of the input record is output to the K-th byte of the output record (S123), I, J, and K are each incremented by 1 (S124, S125), and the process moves to step S138.

[0038] If the comparison in step S126 shows that the I-th and (I+1)-th bytes of the input record do not match (a compression code), it is determined whether the I-th byte of the input record is X'FE' (S128). If it is X'FE', the (I+1)-th byte of the input record is set in the high-order byte of variable L (S129), the (I+2)-th byte in the low-order byte of L (S130), and I is incremented by 3 (S131).

[0039] If the comparison in step S128 shows that the I-th byte of the input record is X'FD', variable L is first cleared to 0 (S132), the (I+1)-th byte of the input record is set in the low-order byte of L (S133), and I is incremented by 2 (S134).

[0040] After the length of the compressed run has been set in variable L, L bytes of the sample record (the all-space record, when the sample record itself is being decompressed) starting at its J-th byte are output to the output record starting at its K-th byte (S135), and J and K are each incremented by L (S136, S137). This decompression processing is repeated until I exceeds the input record length (S138); at that point the value of K is taken as the output record length (S139), and decompression of the record ends.
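Decompression processing (D) as a 0-based Python sketch. During decompression J always equals K, so both are represented by the current output length. The function name `vcp_decompress` is hypothetical, and the bounds check on the sample is an addition for safety rather than a step in the flowchart.

```python
FD, FE = 0xFD, 0xFE


def vcp_decompress(enc: bytes, sample: bytes) -> bytes:
    """0-based sketch of flowchart (D); J and K both correspond to len(out)."""
    out = bytearray()
    i = 0
    while i < len(enc):                                  # S138
        b = enc[i]
        if b not in (FD, FE):                            # S122
            out.append(b)                                # S123
            i += 1                                       # S124-S125
        elif i + 1 < len(enc) and enc[i + 1] == b:       # S126: doubled byte is data
            out.append(b)                                # S127, S123
            i += 2
        else:
            if b == FE:                                  # S128-S131: 2-byte length
                run = int.from_bytes(enc[i + 1:i + 3], "big")
                i += 3
            else:                                        # S132-S134: 1-byte length
                run = enc[i + 1]
                i += 2
            j = len(out)
            if j + run > len(sample):
                raise ValueError("compression code runs past the end of the sample")
            out += sample[j:j + run]                     # S135-S137
    return bytes(out)                                    # S139: output length is len(out)


print(vcp_decompress(b"\xfd\x03x\xfd\x06", b"ABCDEFGHIJ"))  # b'ABCxEFGHIJ'
```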

[0041] Although this embodiment outputs the difference data from the reference record as the compression result without further processing, runs of identical characters and repeated patterns in the difference data could also be compressed further.

[0042]

[Effects of the Invention]
As described above, the present method can decompress any record in the file on its own, which makes random access possible. In batch processing, too, if the fields used to select the target data are left uncompressed, the desired data can first be extracted from a large volume of data and only that data then decompressed, which greatly reduces CPU time. Compression requires only a comparison with the reference record, so it is fast. Furthermore, even if any of the compressed records is deleted, the remaining data can still be decompressed without any problem.

[0043] Furthermore, because the present method does not require the records in a file to be in any particular order, it can, for example, use the first record as the sample and compress every portion other than the sort key of the second and subsequent records to be sorted. This reduces the volume of sort work data, and the resulting reduction in I/O and paging greatly shortens sort time.

[Brief description of the drawings]

FIG. 1 is a record format diagram outlining the method of the present invention.

FIG. 2 is a format diagram of a VSAM data set in the method of the present invention.

FIG. 3 is a flowchart of compressed VSAM creation in the method of the present invention.

FIG. 4 is a flowchart of compressed VSAM creation in the method of the present invention.

FIG. 5 is a flowchart of compressed VSAM creation in the method of the present invention.

FIG. 6 is a flowchart of the compression processing in the method of the present invention.

FIG. 7 is a flowchart of the compression processing in the method of the present invention.

FIG. 8 is a flowchart of the compressed VSAM access routine in the method of the present invention.

FIG. 9 is a flowchart of the compressed VSAM access routine in the method of the present invention.

FIG. 10 is a flowchart of the compressed VSAM access routine in the method of the present invention.

FIG. 11 is a flowchart of the decompression processing in the method of the present invention.

Claims

(57) [Claims]

1. A file compression method for compressing data portions that have the same value in a plurality of records, characterized in that: a reference record is created in advance in which, within the data portion excluding the key item of the records to be compressed, data predicted to have the same value in a plurality of records is set; each record to be compressed is compared with the reference record; and those data portions of the record to be compressed whose values match the values of the reference record are compressed.

2. The file compression method according to claim 1, wherein the reference record is compressed.

3. The file compression method according to claim 1, wherein a code indicating compression is added to the compressed record.