JP2604492B2

JP2604492B2 - Data compression processing method for sequential files

Info

Publication number: JP2604492B2
Application number: JP2236683A
Authority: JP
Inventors: 惠田沢; 祐義中島
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-09-06
Filing date: 1990-09-06
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH04116738A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention relates to an efficient data compression processing method for sequential files in which each record contains character strings identical to those of the preceding record and also contains data in which the same character is repeated consecutively.

[Conventional technology]

Conventionally, a sequential file has been compressed using either a means for compressing character strings that are identical between the preceding and following records or a means for compressing runs of the same character, or using both means together, and the resulting compressed sequential file was retained.

[Problems to be solved by the invention]

The conventional technique described above has the drawback that, in order to select a compression means, either the data characteristics of each sequential file must be investigated or each sequential file must be processed with both compression means and the compression efficiency examined. In addition, to use both compression means, the compressed sequential file created by one compression means had to be compressed again by the other compression means.

[Means for solving the problem]

The present invention is a data compression processing method for a sequential file containing runs of the same character and character string data identical to that of the preceding record, comprising: data input means for taking data out of the sequential file one record at a time and storing it in a work area prepared in advance; record dividing means for dividing the stored record, by comparison with the record input immediately before, into type A field portions, which indicate that the portion has the same character string data, and type B field portions, which indicate anything other than a type A field, and for issuing a compression instruction for each field in order; preceding/following identical character string compression means, activated when the field accompanying the compression instruction is of type A, for converting the type A field into compression information by a predetermined compression method and outputting it to an output area prepared in advance; identical consecutive character compression means, activated when the field accompanying the compression instruction is of type B, for converting, when the type B field contains a run of the same character, the run into compression information and a compressed character by a predetermined compression method, converting character strings that are not runs into predetermined non-compression information and an uncompressed character string, and outputting them to the output area; information adding means for adding predetermined dummy information, when the compression/non-compression information and the compressed character or uncompressed character string are output to the output area, so that the compression information for identical character strings falls at an odd-numbered position among the compression information in the output record and the compression information for consecutive identical characters falls at an even-numbered position among the compression information; and data output means, activated when the compression processing of the fields is finished, for outputting the compressed record in the output area to a sequential file.

[Example]

Next, the present invention will be described with reference to the drawings.

FIG. 1 shows the configuration of one embodiment of the present invention. The compression processing method for sequential files comprises data input means (101), record dividing means (102), preceding/following identical character string compression means (103), identical consecutive character compression means (104), information adding means (105), and data output means (106).

First, the data input means (101) reads one record from the input sequential file and stores it in the work area. Next, the record dividing means (102) divides the record into fields called type A, which are character strings identical to those in the preceding record, and fields called type B, which are everything else. Each type A field is converted into compression information by the preceding/following identical character string compression means (103) and set in the output area. For each type B field, the identical consecutive character compression means (104) converts runs of the same character into compression information and a compressed character and sets them in the output area; portions that are not runs are converted into non-compression information and an uncompressed character string and set in the output area. When compression information is set in the output area, the information adding means (105) adds dummy information so that compression information for identical character strings falls at an odd-numbered position among the compression information in the output record, and compression information for consecutive identical characters falls at an even-numbered position. The compressed record built in the output area is output to the output sequential file by the data output means (106). Because the two kinds of compression information can be distinguished by odd and even position in this way, no identification code is needed to tell identical-string compression information from consecutive-identical-character compression information, and compression efficiency can be improved.
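The effect of the odd/even rule is easiest to see from the decoding side. The following Python sketch is not taken from the patent, which leaves the concrete formats to "predetermined" methods and does not describe a decoder. It assumes a hypothetical record layout: a positive integer is compression information, `0` is dummy information, a negative integer is non-compression information followed by its literal string, only compression and dummy entries count toward the position, and a type A field sits at the same offset as in the preceding record. The name `decode_record` is likewise illustrative.

```python
DUMMY = 0  # hypothetical dummy-information value (assumption)


def decode_record(prev: str, stream: list) -> str:
    """Rebuild one record from a compressed entry stream (assumed layout)."""
    cur = ""
    pos = 0  # 1-based position among compression-information entries
    it = iter(stream)
    for item in it:
        if item < 0:
            # Non-compression information: the literal string follows.
            cur += next(it)
            continue
        pos += 1
        if item == DUMMY:
            continue  # padding only; it exists to shift the parity
        if pos % 2 == 1:
            # Odd position: copy `item` characters from the previous record.
            cur += prev[len(cur):len(cur) + item]
        else:
            # Even position: the next entry is the character to repeat.
            cur += next(it) * item
    return cur


previous = "NEC-TOKYO-00012"
print(decode_record(previous, [10, 5, "9"]))
```

In the sample stream the entries `10` and `5` have exactly the same shape; the decoder interprets the first as "copy 10 characters from the preceding record" and the second as "repeat the next character 5 times" purely from their positions, which is the saving the paragraph above describes.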

FIGS. 2 to 7 are flowcharts of the processing in the embodiment corresponding, respectively, to the data input means (101), record dividing means (102), preceding/following identical character string compression means (103), identical consecutive character compression means (104), information adding means (105), and data output means (106) of FIG. 1.

Each flowchart is described in detail below. FIG. 2 is a flowchart of the data input means (101). One record is read from the sequential file and stored in the work area (201), after which control is passed to the record dividing means.

FIG. 3 is a flowchart of the record dividing means (102). The record is compared with the preceding record (301); where a character string identical to the preceding record exists, that string is extracted and made a type A field (304 to 305). Where the character string is not identical, it is extracted and made a type B field (302 to 303). Steps 301 to 305 are repeated until field division of all data in the record is finished (306).

For a record whose field division is finished, the processing of FIGS. 4 and 5 is performed on each field (307), and when the compression processing of every field is finished, the data output means (106) is activated (308).
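The field division of FIG. 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not say how matching strings are located, so the sketch compares the two records character by character at the same offset, and the `min_match` threshold (which keeps short coincidental matches inside a type B field) and the name `split_fields` are hypothetical.

```python
def split_fields(prev: str, cur: str, min_match: int = 3) -> list:
    """Divide `cur` into ("A", text) and ("B", text) fields.

    Type A: text equal to the previous record at the same offset (assumed).
    Type B: everything else.
    """
    fields = []
    i = 0
    b_start = 0  # start of the type B field currently being collected
    n = len(cur)
    while i < n:
        # Extend a same-offset match against the previous record.
        j = i
        while j < n and j < len(prev) and cur[j] == prev[j]:
            j += 1
        if j - i >= min_match:
            if b_start < i:
                fields.append(("B", cur[b_start:i]))
            fields.append(("A", cur[i:j]))
            i = j
            b_start = j
        else:
            # Too short to be worth a type A field; keep it in type B.
            i = max(j, i + 1)
    if b_start < n:
        fields.append(("B", cur[b_start:]))
    return fields


print(split_fields("ABC0001XXXX", "ABC0002XXXX"))
```

Concatenating the field texts in order always reproduces the input record, so the division itself loses no information. For the first record of a file there is no preceding record, and passing an empty string for `prev` yields a single type B field.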

FIG. 4 is a flowchart of the preceding/following identical character string compression means (103). The type of the received field is checked (401); if it is type A, the information adding means (105) is called and then the compression information is set in the output area (402 to 403). Reference numerals 404 and 405 show an example before and after compression.

FIG. 5 is a flowchart of the identical consecutive character compression means (104). The type of the received field is checked (501); if it is type A, processing ends. If it is type B, the field is checked for consecutive identical characters (502). When identical characters are consecutive, the information adding means is called, and the compression information and a single instance of the repeated character are set in the output area (503 to 505). When identical characters are not consecutive, the non-consecutive character string is extracted and converted into an uncompressed character string, and the non-compression information and the uncompressed character string are set in the output area (506 to 508). Steps 502 to 508 are repeated until all compression processing in the field is finished (509). Reference numerals 510 and 511 show an example before and after compression.
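The loop of steps 502 to 508 amounts to a run-length pass over one type B field. The Python sketch below illustrates that pass only; the tuple representation, the `min_run` threshold below which a repeat is left as literal text, and the name `compress_type_b` are assumptions, and the parity padding performed by the information adding means is deliberately left out.

```python
from itertools import groupby


def compress_type_b(field: str, min_run: int = 3) -> list:
    """Split a type B field into run items and literal items.

    ("run", count, char)  -> compression information plus one compressed character
    ("literal", text)     -> non-compression information plus uncompressed string
    """
    items = []
    pending = []  # characters waiting to be flushed as one literal item
    for ch, group in groupby(field):
        count = len(list(group))
        if count >= min_run:
            if pending:
                items.append(("literal", "".join(pending)))
                pending = []
            items.append(("run", count, ch))
        else:
            pending.append(ch * count)
    if pending:
        items.append(("literal", "".join(pending)))
    return items


print(compress_type_b("AB0000CD"))
```

Adjacent non-repeating characters are gathered into a single literal item so that only one piece of non-compression information is spent on them.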

FIG. 6 is a flowchart of the information adding means (105). The type of the field currently being processed is checked (601). For type B, if the position as compression information in the output area is odd, dummy information is added to the output area (602 to 603) to make the position even. For type A, if the position as compression information in the output area is even, dummy information is added to the output area (604 to 605) to make the position odd.
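The parity check of FIG. 6 can be sketched in a few lines of Python. The sketch assumes that the compression-information entries already placed in the output record are held in a list, that the position is counted from 1, and that dummy information is the value `0`; none of these details is given in the patent, and `add_dummy_if_needed` is a hypothetical name.

```python
DUMMY = 0  # hypothetical dummy-information value (assumption)


def add_dummy_if_needed(control_entries: list, field_type: str) -> int:
    """Pad so the next entry lands on the parity required by `field_type`.

    Type A entries must sit at odd positions, type B entries at even ones.
    Returns the 1-based position at which the caller should place its entry.
    """
    next_pos = len(control_entries) + 1
    wants_odd = field_type == "A"
    if (next_pos % 2 == 1) != wants_odd:
        control_entries.append(DUMMY)
        next_pos += 1
    return next_pos


entries = []
for field_type, info in [("A", 10), ("A", 4), ("B", 7), ("B", 3)]:
    add_dummy_if_needed(entries, field_type)
    entries.append(info)
print(entries)
```

A dummy entry is spent only when two fields of the same kind follow each other; when type A and type B compression information alternate, as in the second and third fields of the sample, no padding is needed.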

FIG. 7 is a flowchart of the data output means (106). The compressed record in the output area is output to the output sequential file (701).

[Effect of the invention]

As described above, by using both the compression means for character strings identical between preceding and following records of a sequential file and the compression means for consecutive identical characters, the present invention allows the user to perform compression without considering the data characteristics of the file, and using both compression means together also has the effect of improving compression efficiency.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of the present invention, FIG. 2 is a flowchart of the data input means, FIG. 3 is a flowchart of the record dividing means, FIG. 4 is a flowchart of the preceding/following identical character string compression means, FIG. 5 is a flowchart of the identical consecutive character compression means, FIG. 6 is a flowchart of the information adding means, and FIG. 7 is a flowchart of the data output means.

101: data input means; 102: record dividing means; 103: preceding/following identical character string compression means; 104: identical consecutive character compression means; 105: information adding means; 106: data output means.

Claims

(57) [Claims]

A data compression processing method for a sequential file containing runs of the same character and character string data identical to that of the preceding record, comprising:
data input means for taking data out of the sequential file one record at a time and storing it in a work area prepared in advance;
record dividing means for dividing the stored record, by comparison with the record input immediately before, into type A field portions, which indicate that the portion has the same character string data, and type B field portions, which indicate anything other than a type A field, and for issuing a compression instruction for each field in order;
preceding/following identical character string compression means, activated when the field accompanying the compression instruction is of type A, for converting the type A field into compression information by a predetermined compression method and outputting it to an output area prepared in advance;
identical consecutive character compression means, activated when the field accompanying the compression instruction is of type B, for converting, when the type B field contains a run of the same character, the run into compression information and a compressed character by a predetermined compression method, converting character strings that are not runs into predetermined non-compression information and an uncompressed character string, and outputting them to the output area;
information adding means for adding predetermined dummy information, when the compression/non-compression information and the compressed character or uncompressed character string are output to the output area, so that the compression information for identical character strings falls at an odd-numbered position among the compression information in the output record and the compression information for consecutive identical characters falls at an even-numbered position among the compression information; and
data output means, activated when the compression processing of the fields is finished, for outputting the compressed record in the output area to a sequential file.