JPH04360246A

JPH04360246A - Device for compressing file

Info

Publication number: JPH04360246A
Application number: JP3134694A
Authority: JP
Inventors: Hanae Nozaki; 野崎　華恵; Satoshi Ito; 聡伊藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-06-06
Filing date: 1991-06-06
Publication date: 1992-12-14

Abstract

PURPOSE: To attain efficient file compression by storing a common part and the non-common parts separately when a plurality of files include common contents. CONSTITUTION: When a user inputs a file compression command and the names of n existing files to be compressed, the operating system (OS) compares the content data of the n files with one another in a comparing processing part 11 and judges whether any part of their contents coincides completely. When a coincident part is found, it is regarded as a common part; a common part processing part 12 adds a common part control word indicating that it is a common part, and the common part and its control word are stored in a secondary storage device 30. Each compared file is then processed to specify that it has the specific common part at a specific position, and the processed file is stored in the device 30. This file processing is repeated for all n files. Efficient file compression is thus attained.

Description

[Detailed description of the invention]

[0001]

[Industrial Field of Application] The present invention relates to a file management scheme performed by an operating system in a computer system, and more particularly to a file compression device for efficiently compressing a plurality of files that have common parts.

[0002]

[Background Art] In computer systems, data compression techniques have been proposed and put into practical use in order to use the secondary storage device, which serves as the storage area, efficiently, thereby reducing data storage costs and economizing on transfer costs. Data compression means suppressing the redundancy in data by transforming the data so that it is expressed concisely with a shorter data length.

[0003] One data compression method in wide use today is Huffman's optimal coding. It statistically examines how often each pattern appears in the data and assigns shorter codes to patterns that appear more frequently; increasing the number of patterns makes it possible to shorten the average code length. The Ziv-Lempel data compression method, in turn, does not need to assume any statistical properties or stationarity of the data and can be applied directly to an arbitrary symbol string. Because this method compresses long symbol strings efficiently, it is well suited to compressing the various files created on computer systems.

[0004] However, data compression techniques of this kind have the following problem. When a plurality of files contain identical data, the same data is stored many times over in the storage area (the secondary storage device). The space wasted in the secondary storage device grows with the size of the part whose contents are common and with the number of files that contain it, and this hinders efficient use of the secondary storage device.

[0005] Meanwhile, as described above, data compression is executed on each data file individually, and each data file is converted into a smaller data file. Conventional data compression therefore cannot eliminate the waste in secondary storage usage that arises when common contents are present in a plurality of files.

[0006] Solving this problem requires integrated processing that takes the contents of a plurality of files into account, but no such processing method has been implemented so far, and none has yet been put into practical use.

[0007]

[Problems to be Solved by the Invention] As described above, conventional data compression is effective for each individual data file, but it cannot eliminate the waste in secondary storage usage that arises when common contents are present in a plurality of files.

[0008] The present invention has been made to solve the above problem. Its object is to provide a file compression device that avoids inconveniences such as storing contents common to a plurality of files redundantly in a secondary storage device, and thereby enables more efficient use of the secondary storage device.

[0009]

[Means for Solving the Problems] The gist of the present invention is that, when a plurality of files have common contents, the common part and the non-common parts that remain after the common part is removed are stored independently.

[0010] That is, the present invention provides a file compression device for efficiently compressing a plurality of files, comprising: means for comparing the contents of a plurality of files stored in a first storage unit; means for extracting the part whose contents match according to the comparison result; means for storing the extracted common part as a common file in the first storage unit or a second storage unit; and means for storing the file that remains after the common part has been extracted as a unique file in the first storage unit or the second storage unit.

[0011] In the present invention, the comparison of the contents of a plurality of files may identify more than one common part within a single file. Moreover, the comparison need not be performed across all the files specified by the user; it may instead be performed on any subset of two or more of the specified files, and the common part among those files determined.

[0012]

[Operation] According to the present invention, very efficient file compression can be achieved in situations such as the following.

[0013] For example, suppose that a computer simulation is being run to study the time evolution of a system. In such a simulation, time is discretized and the state of the system is calculated at each discretized time step to follow the time evolution. Because visualizing the simulation results graphically is a very effective aid to understanding how the phenomenon changes from one calculated time step to the next, graphics data is usually output at some time-step interval to create a data file. A moving picture is then made from that data file by joining together the still images for the individual time steps.

[0014] In general, the amount of data output for graphics is large even for a single time step, and the data file as a whole becomes enormous. One characteristic of simulation is the freedom it gives: the state of the system can be traced under whatever initial values and conditions the person running it wishes. A typical use is therefore to run the calculation many times from the same initial values under different conditions and to examine how the time evolution differs with the conditions.

[0015] When simulations are run repeatedly in this way with the same initial values under different conditions and data files are created to visualize the results, the initial data output to the data files (the data for one time step, which forms the first frame of the moving picture) is common to all the data files. As noted above, however, graphics data is very large even for a single time step, so having many data files hold the same initial data creates wasted space in the secondary storage device and hinders its efficient use.

[0016] In such a case, applying the file compression processing of the present invention leaves only a single stored copy of the initial data, so the secondary storage device can be used efficiently, and the user can still treat each compressed data file as if it held its own initial data. Needless to say, when the initial data is common to the files it is not strictly necessary to output it to every data file; however, in order to grasp the time evolution of the system visually and accurately, and to better understand the physical and chemical phenomena appearing in it, it is highly desirable that each graphics data file include the initial data.

[0017] It is also possible to split the initial data from the data of the subsequent time steps and output them to separate files, but dividing the data files in this way makes the subsequent graphics processing and the management of the data files considerably more cumbersome. File compression processing is therefore effective not only for making good use of the secondary storage device but also for letting the user work with the computer system efficiently.

[0018] Another case in which file compression processing is effective is the execution of a program that calculates the value of x at each time t and outputs the values of t and x. If this program is run many times with the same time increment under different conditions, the sequence of time values t written to the data file is identical in every data file created. The file compression processing is therefore also effective for these data files, which share the values of t as common contents. Likewise, when a new or different function that can be realized with minor changes is to be added to an executable program, the source file is copied, modified, and used. Several files with common contents are created in this case as well, so performing file compression processing contributes to effective use of the secondary storage device.

[0019] Furthermore, by performing the comparison not only across all the files but also on any subset of two or more of the specified files, the processing can be arranged so that compression efficiency is maximized. For example, suppose that three files exist, that the second file has much in common with the first, and that the third file has very little in common with the first. If all three files are compared, the common part is limited to what the first and third files share, and compression efficiency is very low. If, on the other hand, only the first and second files are specified for comparison, the common part holds more data, and compression efficiency can be improved.
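
The trade-off in this example can be made concrete with a rough storage count. The following is an illustrative estimate and not part of the disclosure: it assumes that c_{12} is the size of the part common to files 1 and 2, that c_{123} is the size of the part common to all three files, and that the overhead of control words and indexes is negligible. Replacing m stored copies of a common part of size c with a single copy saves (m - 1)c, so the savings S compare as follows.

```latex
\begin{align*}
S_{\text{all three}} &= (3 - 1)\,c_{123} = 2\,c_{123}, \\
S_{\text{files 1 and 2}} &= (2 - 1)\,c_{12} = c_{12}, \\
S_{\text{files 1 and 2}} > S_{\text{all three}} &\iff c_{12} > 2\,c_{123}.
\end{align*}
```

Under these assumptions, comparing only files 1 and 2 is the better choice whenever their common part is more than twice as large as the part shared by all three files.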

[0020]

[Embodiments] Embodiments of the present invention will now be described with reference to the drawings.

[0021] FIG. 1 is a block diagram showing the schematic configuration of a file compression device according to an embodiment of the present invention. In the figure, reference numeral 10 denotes a file compression processing unit according to the present invention, which comprises a comparison processing unit 11 that compares a plurality of files, a common part processing unit 12 that extracts the common part of the compared files, a file processing unit 13 that divides each file into common and non-common parts, and so on. Reference numeral 20 denotes a secondary storage device such as a magnetic disk (the first storage unit), and 30 likewise denotes a secondary storage device such as a magnetic disk (the second storage unit). The secondary storage devices 20 and 30 need not be separate; they may be one and the same device. The file compression operation of this device is described below with reference to the flowcharts and file structures shown in FIGS. 2 to 13.

[0022] First, the basic file compression procedure is described following the flowchart of FIG. 2. When the user inputs the file compression command and the names of the n existing files to be compressed (step S1), the operating system compares the content data of the n specified files (step S2) and determines whether any part of their contents matches exactly (step S3). If no part matches, the file compression processing ends.

[0023] If, on the other hand, a matching part is found, that part is regarded as a common part, a common part control word indicating that it is a common part is attached to it, and it is saved in the secondary storage device (this is called common part processing) (step S4). Each file that was compared is then processed so as to specify that it owns a specific common part at a specific position, and is saved in the secondary storage device (this is called post-comparison file processing) (step S5). When post-comparison file processing has been performed on all n files, the file compression processing ends. A file newly created by the file compression processing, together with its common part, is called a compressed file.
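
The flow of steps S1 to S5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: files are modeled as lists of lines held in memory, a plain dictionary stands in for the secondary storage device, only a single common part is handled, and the names `common_run`, `find_block`, and `compress` are hypothetical.

```python
from typing import Dict, List, Optional


def common_run(a: List[str], b: List[str]) -> List[str]:
    """First run of identical consecutive lines shared by a and b (empty if none)."""
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k:
                return a[i:i + k]
    return []


def find_block(lines: List[str], block: List[str]) -> int:
    """Index at which block occurs in lines, or -1 if it does not occur."""
    for i in range(len(lines) - len(block) + 1):
        if lines[i:i + len(block)] == block:
            return i
    return -1


def compress(files: Dict[str, List[str]]) -> Optional[dict]:
    names = list(files)                          # S1: the n files named by the user
    common = files[names[0]]
    for name in names[1:]:                       # S2: compare the contents
        common = common_run(common, files[name])
    if not common:                               # S3: nothing matches, so stop
        return None
    # S4: store the common part once, with the shared file count as control word
    result = {"common": {"shared_count": len(names), "lines": common}, "files": {}}
    for name in names:                           # S5: post-comparison file processing
        lines = files[name]
        at = find_block(lines, common)
        result["files"][name] = {"before": lines[:at],
                                 "after": lines[at + len(common):]}
    return result
```

Each original file can be recovered as `before + common lines + after`, which is what lets the user keep treating a compressed file as a complete file.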

[0024] The method of comparing content data between files to determine the common part is now described briefly, taking as an example the determination of the common part of file 1 and file 2. File 1 is first regarded as divided into units of a suitable data length (one line in this example), and every line of file 1 is compared with every line of file 2, one line at a time, starting from the first lines. If the two lines of a pair are not identical, the next pair is compared; if they are identical, the following processing is performed. Suppose, for example, that line i1 of file 1 and line i2 of file 2 are identical. Line (i1 + 1) of file 1 is then compared with line (i2 + 1) of file 2. If these two lines are also identical, line (i1 + 2) is compared with line (i2 + 2).

[0025] In this way, the next lines of file 1 and file 2 are compared for as long as the two lines are identical. If line (i1 + k) of file 1 and line (i2 + k) of file 2 are then found not to be identical, the range from line i1 to line (i1 + k - 1) of file 1 and the range from line i2 to line (i2 + k - 1) of file 2 are determined to be the common part. To determine the common part of three files, the common part of file 1 and file 2 is compared with file 3 in the same manner.
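
The line-by-line comparison just described can be sketched as follows. This is a hedged illustration only: the function name `find_common_run` and the `min_len` parameter are assumptions introduced here, indices are 0-based rather than the 1-based line numbers of the text, and the sketch stops at the first matching run rather than searching for the best one.

```python
from typing import Optional, Sequence, Tuple


def find_common_run(
    file1: Sequence[str], file2: Sequence[str], min_len: int = 1
) -> Optional[Tuple[int, int, int]]:
    """Return (i1, i2, k): k identical lines starting at file1[i1] and file2[i2].

    Returns None when no run of at least min_len identical lines exists.
    """
    for i1 in range(len(file1)):
        for i2 in range(len(file2)):
            if file1[i1] != file2[i2]:
                continue  # this pair differs: move on to the next pair
            k = 1
            # Keep comparing the following lines while they stay identical.
            while (
                i1 + k < len(file1)
                and i2 + k < len(file2)
                and file1[i1 + k] == file2[i2 + k]
            ):
                k += 1
            if k >= min_len:
                return i1, i2, k
    return None
```

For three files, the run found for files 1 and 2 (`file1[i1:i1 + k]`) can be passed back in as the first argument together with file 3, mirroring the pairwise procedure in the text.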

[0026] Next, common part processing and post-comparison file processing are described more concretely. In the common part processing of step S4 in FIG. 2, a common part control word is attached to the common part as described above, and the result is saved in the secondary storage device as a file of the form shown in FIG. 3 (called a common file). The common part control word records the number of compressed files that share the common part (called the shared file count). The common part control word is provided so that compressed files can be updated, deleted, and so on, as described later.
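
A common file can be pictured as a small header followed by the shared bytes. The sketch below is an assumption for illustration: the text does not specify the layout of FIG. 3, so a 4-byte big-endian shared file count is used here as a hypothetical control word, and the function names are likewise hypothetical.

```python
import struct
from typing import Tuple

# Hypothetical control word layout: one unsigned 32-bit shared file count.
CONTROL_WORD = struct.Struct(">I")


def make_common_file(common_part: bytes, shared_count: int) -> bytes:
    """Prefix the common part with its control word."""
    return CONTROL_WORD.pack(shared_count) + common_part


def read_common_file(blob: bytes) -> Tuple[int, bytes]:
    """Split a common file into (shared file count, common part)."""
    (shared_count,) = CONTROL_WORD.unpack_from(blob, 0)
    return shared_count, blob[CONTROL_WORD.size:]
```

Keeping the count next to the data means that deletion and update only need to touch the common file itself to decide whether the common part is still in use.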

[0027] An index sequential file scheme can be used for the post-comparison file processing of step S5 in FIG. 2. The conventional index sequential file is described first. An index sequential file has a structure consisting of one index and several records (the pieces into which the file is divided); the index stores, in order, the values that designate each record (the start and end positions in the secondary storage device at which the record is saved). The operating system processes an index sequential file by reading the information about the records from the index and accessing the records in order. Records can also be updated, inserted, and deleted in this scheme.

[0028] In the post-comparison file processing of the present invention, the operating system applies this index sequential file scheme and performs processing according to the flowchart of FIG. 4. First, the file that was compared is divided into the common part and the remaining parts (called unique parts), and each is regarded as a record (step S6). That is, a record is either a common part or a unique part; when it is a common part, the common file, including its common part control word, is regarded as the record. Next, a value indicating the start and end positions of each record and a value distinguishing whether the record is a common part or a unique part are recorded in a newly created index (step S7). Finally, the index and the records that are unique parts are saved in the secondary storage device (step S8), and the post-comparison file processing ends.
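
Steps S6 to S8 can be sketched with a toy storage model. Everything named here is an assumption made for illustration: `Storage` is an append-only byte array standing in for the secondary storage device, `IndexEntry` is a hypothetical index row, the control word is assumed to be 4 bytes, and the index is kept as a Python list instead of being written to storage.

```python
from dataclasses import dataclass
from typing import List, Tuple

CONTROL_WORD_SIZE = 4  # assumed size of the common part control word, in bytes


@dataclass
class IndexEntry:
    start: int       # offset of the record's first byte in storage
    end: int         # offset one past the record's last byte
    is_common: bool  # True if the record is a common file


class Storage:
    """Toy stand-in for the secondary storage device: an append-only byte array."""

    def __init__(self) -> None:
        self.data = bytearray()

    def append(self, blob: bytes) -> Tuple[int, int]:
        start = len(self.data)
        self.data += blob
        return start, len(self.data)


def post_compare(content: bytes, span: Tuple[int, int],
                 common_loc: Tuple[int, int], storage: Storage) -> List[IndexEntry]:
    """Split content around the common span and build its index (S6 to S8)."""
    cs, ce = span
    index: List[IndexEntry] = []
    if cs > 0:  # unique part before the common part
        index.append(IndexEntry(*storage.append(content[:cs]), False))
    index.append(IndexEntry(*common_loc, True))  # refers to the shared common file
    if ce < len(content):  # unique part after the common part
        index.append(IndexEntry(*storage.append(content[ce:]), False))
    return index


def read_file(index: List[IndexEntry], storage: Storage) -> bytes:
    """Read records in index order, skipping the control word of common files."""
    out = bytearray()
    for entry in index:
        record = bytes(storage.data[entry.start:entry.end])
        out += record[CONTROL_WORD_SIZE:] if entry.is_common else record
    return bytes(out)
```

In this model the common file is appended to storage once, and every compressed file that shares it simply carries an index entry pointing at the same location.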

[0029] Accordingly, when file compression is performed with the index sequential file scheme, the compressed file has the structure shown in FIG. 5. A compressed file created in this way can be processed (updated, deleted, and so on) in the same way as an ordinary index sequential file. When a record is a common file, however, a common part control word is attached at the head of the record, so the common part control word is skipped and only the common part is processed.

[0030] The post-comparison file processing can also be carried out with a processing method called the marking scheme. In this scheme the operating system performs processing according to the flowchart of FIG. 6. First, the common part is removed from the file that was compared, and a marking indicating that a common part exists is placed at that position (step S9). Next, a header is attached to the file, and information about the common part removed from the file is recorded in it (step S10). A file processed by steps S9 and S10 is called a unique file; its structure is shown in FIG. 7. Finally, the unique file is saved in the secondary storage device (step S11), and the post-comparison file processing ends.

[0031] As a concrete marking method, a special character (called the common part designation character) is recorded at the position from which the common part was extracted, followed by a value that designates the extracted common part (the common file). This value is, for example, a value indicating the start and end positions in the secondary storage device at which the common file is saved. The common part designation character together with the value that follows it is called a common part designation word. A value indicating the start and end positions at which the common file is saved is also recorded in the header. A compressed file created by file compression with the marking scheme thus consists of one unique file and the common file it designates. When the operating system updates a compressed file created by the marking scheme, for example, it can handle the compressed file just like an ordinary file by performing the following processing.

[0032] To update a compressed file, the unique file is transferred from the secondary storage device to a work area, namely main memory or a secondary virtual address space. During the transfer, each bit is examined to determine whether it is a common part designation character; when a common part designation character is found, the common part designation word is replaced with the common part it designates. That is, the common part designation word is removed and the common part is read in at that position. Through this processing, the unique file transferred to the work area is converted into a file identical to the one that existed before file compression. The header of the unique file is not transferred.
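
The marking scheme and the expansion performed during transfer can be sketched as below. The concrete encoding is an assumption for illustration only: the byte `0x1f` is used as a hypothetical common part designation character and is assumed never to occur in file data, locations are two 32-bit offsets, the control word is assumed to be 4 bytes, the scan works byte by byte, and only one common part per file is handled.

```python
import struct
from typing import Tuple

MARK = b"\x1f"              # hypothetical designation character, assumed absent from data
LOC = struct.Struct(">II")  # hypothetical location value: start and end offsets
CONTROL_WORD_SIZE = 4       # assumed size of the common part control word, in bytes


def make_unique_file(content: bytes, span: Tuple[int, int],
                     common_loc: Tuple[int, int]) -> bytes:
    """Steps S9 and S10: cut out the common span, mark its position, add a header."""
    cs, ce = span
    word = MARK + LOC.pack(*common_loc)  # common part designation word
    header = LOC.pack(*common_loc)       # header also records the common file location
    return header + content[:cs] + word + content[ce:]


def expand(unique_file: bytes, storage: bytes) -> bytes:
    """Rebuild the original file while copying the unique file to a work area."""
    body = unique_file[LOC.size:]        # the header is not transferred
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i:i + 1] == MARK:
            start, end = LOC.unpack_from(body, i + 1)
            out += storage[start + CONTROL_WORD_SIZE:end]  # skip the control word
            i += 1 + LOC.size
        else:
            out.append(body[i])
            i += 1
    return bytes(out)
```

A real encoding would need an escaping rule or a length-prefixed format so that the designation character can never be confused with ordinary data; that detail is outside what the text specifies.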

[0033] The file compression processing of the present invention can be realized by the methods described above. Next, the processing for deleting or updating a compressed file created by these methods, and the concrete procedure for applying file compression again to a compressed file, are described.

[0034] When a compressed file created by the index sequential file scheme is deleted, the operating system performs processing according to the flowchart of FIG. 8. When the user inputs the delete command and the name of the compressed file to be deleted (step S12), the operating system accesses the index of the specified compressed file (step S13) and determines whether each record is a common part or a unique part (step S14). If the record is a unique part, the operating system deletes that record (step S15).

[0035] If the record is a common part, the operating system accesses the common part control word of that common file and rewrites the shared file count (step S16); in this case the shared file count is reduced by one. It then determines whether the shared file count is zero and, if so, deletes the common part (this is called zero-check deletion) (step S17). When steps S13 to S17 have been completed for all the records owned by the compressed file specified by the user, the index is finally deleted (step S18), and deletion of the compressed file is complete.
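
The deletion procedure amounts to reference counting on the common file. The sketch below is a minimal model under assumptions: `Store`, `CommonFile`, and `delete_compressed` are hypothetical names, records are looked up by string keys rather than by storage positions, and dictionaries stand in for the secondary storage device.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CommonFile:
    shared_count: int  # the common part control word
    data: bytes        # the common part itself


@dataclass
class Store:
    common: Dict[str, CommonFile] = field(default_factory=dict)
    unique: Dict[str, bytes] = field(default_factory=dict)
    # file name -> ordered index of (kind, key), kind being "common" or "unique"
    indexes: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)


def delete_compressed(store: Store, name: str) -> None:
    for kind, key in store.indexes[name]:   # S13: walk the index
        if kind == "unique":                # S14, S15: unique parts are simply deleted
            del store.unique[key]
        else:
            common_file = store.common[key]
            common_file.shared_count -= 1   # S16: one fewer file shares this part
            if common_file.shared_count == 0:
                del store.common[key]       # S17: zero-check deletion
    del store.indexes[name]                 # S18: finally remove the index
```

The common part survives for as long as at least one compressed file still refers to it, and disappears with the last one.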

[0036] When a compressed file created by the marking scheme is deleted, processing is performed according to the flowchart of FIG. 9. When the user inputs the delete command and the name of the compressed file to be deleted (step S19), the operating system accesses the header of the unique file of the specified compressed file and reads the information about the common file (step S20). It then reduces the shared file count of the common file by one (step S21) and performs zero-check deletion (step S22). Finally, the unique file is deleted (step S23), and deletion of the compressed file ends.

[0037] The procedure for updating a compressed file created by the index sequential file scheme is described following the flowchart of FIG. 10. The user inputs the command for editing a compressed file and the name of the compressed file (step S24) and edits the compressed file (step S25). When the edited compressed file is saved, the operating system checks, record by record, whether any change was made within the record (step S26). If no change was made within a record, nothing is done to that record. If a change was made, the operating system determines whether the record is a common part or a unique part (step S27) and, if it is a unique part, updates it (step S28).

[0038] If the changed record is a common part, the record is updated as a unique part (step S29); that is, it is newly saved with the common part control word removed. The information for that record in the index is then rewritten to a value designating the newly saved record (step S30). For the common file as it was before the change, the shared file count in the common part control word is reduced by one (step S31) and zero-check deletion is performed (step S32). When steps S26 to S32 have been performed for all the records owned by the compressed file, the update of the compressed file is complete.
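
The update procedure behaves like copy-on-write: an edited common record is detached from the shared common file and saved as a unique record of the edited file. The following sketch assumes a dictionary-based store and a hypothetical `save_edited` function, with the changed records supplied as a mapping from record position to new contents; none of these names come from the text.

```python
from typing import Dict


def save_edited(store: dict, name: str, edited: Dict[int, bytes]) -> None:
    """Apply edits to a compressed file; edited maps record position to new bytes."""
    index = store["indexes"][name]
    for pos, new_data in edited.items():   # S26: only changed records are processed
        kind, key = index[pos]
        if kind == "unique":               # S27, S28: update the unique part in place
            store["unique"][key] = new_data
            continue
        new_key = f"{name}#{pos}"          # hypothetical key for the new record
        store["unique"][new_key] = new_data  # S29: save as a unique part, no control word
        index[pos] = ("unique", new_key)     # S30: point the index at the new record
        entry = store["common"][key]
        entry["shared_count"] -= 1           # S31: the old common file loses one sharer
        if entry["shared_count"] == 0:
            del store["common"][key]         # S32: zero-check deletion
```

Other compressed files that share the original common part are unaffected, because the common file itself is never modified.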

[0039] When a compressed file created by the marking scheme is updated, the changed file is saved as an ordinary file. The shared file count of the common file owned by the compressed file before the change is then reduced by one, and zero-check deletion is performed. Thus, once a compressed file created by the marking scheme has been updated, it is no longer a compressed file but reverts to an ordinary file.

【００４０】File compression processing on an already compressed file is performed after the compressed file has first been restored to a normal file. The restoration proceeds as follows. For a compressed file created by the index sequential file method, the records are joined back into a single file in the order given by the index; where a record is a common file, only the common part, without its common part control word, is joined. For a compressed file created by the marking method, each common part specification word is replaced with the common part it designates. The number of shared files of each common file that the compressed file owned is then decremented by one, and zero determination deletion is performed. Finally, the index is erased in the case of the index sequential file method, and the header is erased in the case of the marking method. This procedure returns the compressed file to a normal file, on which file compression processing is then performed anew.
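As an illustration of the restoration just described for the marking method, the following sketch expands each reference to a common part and then adjusts the share counts. The segment model (raw bytes for unique text, a string naming a common file in place of a common part specification word), the dictionary layout and the name `restore_marked_file` are assumptions made for the example, not structures defined in the specification.

```python
from typing import Dict, List, Union

# A unique-file body is modeled as a list of segments: raw bytes, or a str
# naming a common file (a stand-in for the common part specification word).
Segment = Union[bytes, str]


def restore_marked_file(body: List[Segment],
                        common_files: Dict[str, dict]) -> bytes:
    """Return the plain contents and release the file's share of each common file."""
    out = bytearray()
    referenced: List[str] = []
    for seg in body:
        if isinstance(seg, str):
            out += common_files[seg]["data"]   # replace the word by the common part
            if seg not in referenced:
                referenced.append(seg)
        else:
            out += seg                         # unique text is copied as is
    for name in referenced:
        common_files[name]["shared"] -= 1      # one fewer sharing file
        if common_files[name]["shared"] == 0:  # zero determination deletion
            del common_files[name]
    return bytes(out)                          # the header is simply not carried over
```

The result is an ordinary byte string, so it can be passed to file compression processing again as the paragraph requires.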

【００４１】So far, file compression processing has been explained on the assumption that only one common part is processed, but nothing prevents the processing from being carried out when there are two or more common parts. The processing method for the case where a plurality of common parts are determined is therefore described next.

【００４２】When the n files contain, for example, k common parts, file compression processing basically follows the same procedure as the flowchart of FIG. 2, with some differences. If k common parts are determined as a result of comparing the n files in step S2 of FIG. 2, step S4 is repeated k times so that common part processing is performed for each of the k common parts. Post-comparison file processing (step S5) is then performed on the n files. In the index sequential file method, k of the records are common parts; in the marking method, k common part specification words are recorded in the unique file. The index and the header record information on all k common parts. The post-comparison file processing is repeated n times, and file compression processing ends when all n files have been processed.

【００４３】The comparison processing need not be limited to a comparison among all n files specified by the user; it may also be performed among any subset of them. To this end, when file compression processing is performed, the number i of files to be compared is varied, and comparison processing and common part processing are executed in stages. That is, at stage i, i files (i≦n) are compared, the common part of those i files (common part i) is determined, and common part processing is performed. The value of i is decreased from n down to m (n≧m≧2).

【００４４】The specific procedure is explained with reference to the flowchart of FIG. 11. When the user inputs the file compression command and n file names (step S33), the operating system first performs comparison processing on the n files (step S34, i = n), determines the part common to all files (common part n) (step S35), and, if common part n exists, performs common part processing (step S36). Next, the contents of an arbitrary set of n-1 files are compared over the range excluding common part n (step S34, i = n-1), and if a common part n-1 exists, common part processing is executed (step S36). Steps S34 to S36 are repeated nCi times, where nCi is the number of combinations of i files chosen from n. That is, when i is n-1, the number of repetitions is nCn-1 = n, so comparison processing and common part processing are performed for every combination of n-1 files.
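The staged loop of steps S34 to S36 can be sketched as follows. This is a simplified illustration only: each file is modeled as a list of lines, and a "common part" is taken to be the set of lines present in every file of a group, which is an assumption made for brevity rather than the comparison method of the specification. The function name `staged_common_parts` and the `claimed` bookkeeping are likewise hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def staged_common_parts(files: Dict[str, List[str]], m: int = 2
                        ) -> Dict[FrozenSet[str], Set[str]]:
    """Map each group of file names to the lines found common at its stage."""
    # Lines already assigned to a common part at an earlier stage, per file.
    claimed: Dict[str, Set[str]] = {name: set() for name in files}
    result: Dict[FrozenSet[str], Set[str]] = {}
    for i in range(len(files), m - 1, -1):            # i = n, n-1, ..., m
        for group in combinations(sorted(files), i):  # nCi groups at stage i
            # S34: compare contents, excluding parts found at earlier stages
            remaining = [set(files[f]) - claimed[f] for f in group]
            common = set.intersection(*remaining)
            if common:                                # S35: a common part exists
                result[frozenset(group)] = common     # S36: common part processing
                for f in group:
                    claimed[f] |= common
    return result
```

For three files and m = 2, the loop runs once for the single group of three files and then three times for the pairs, matching the nCi repetition counts given in the text.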

【００４５】In this way, processing proceeds stage by stage until comparison processing and common part processing have finally been performed among m files (n≧m≧2). Lastly, post-comparison file processing is performed on each of the n input files for all the common parts determined in that file (step S37), and file compression processing ends. Supplementary matters concerning file compression processing are described below.

【００４６】(1) If the common part determined by comparing files is very small, file compression processing yields little benefit, so a lower limit on the common part size needs to be specified. For example, the lower limit is set to the size of the common part control word.

【００４７】(2) In common part processing, the information recorded in the common part control word of a common file is the number of shared files; in addition, the names and storage locations of the compressed files that share the common file may also be recorded.

【００４８】(3) In post-comparison file processing by the index sequential file method, a very large record may be further divided into several records.

【００４９】(4) In post-comparison file processing by the marking method, the value recorded after the common part designation character in a common part specification word of the unique file may be any value that can designate the common file. For example, values indicating the start and end positions at which the common file is stored may be recorded, or the name of the common file may be recorded. If the name of the common file is recorded in the common part specification word, however, the header must record the name of the common file in addition to the values indicating its start and end positions.

【００５０】(5) The operating system may notify the user of the processing status before ending file compression processing. For example, when the user is operating online, the processing status is displayed as a message on the terminal screen. That is, if common parts were determined, the name or number of each common part and a list of the names of the compressed files sharing it may be displayed; if no common part was determined, a message to that effect may be displayed.

【００５１】(6) In the file compression processing in which common part determination and common part processing are performed in stages while the number of files to be compared is reduced (FIG. 11), the number m of files compared at the final stage may be specified implicitly, or the user may be allowed to specify it anew.

【００５２】(7) One method of determining the common part n of n files is, for example, as follows. The n files are numbered, and the first and second files are compared to extract their common part. The extracted common part is then compared with the third file, and a common part is again extracted. Files are compared one at a time in this order until, finally, the common part of the first n-1 files is compared with the n-th file to determine common part n. Thus, by the time common part n has been determined, comparison has already been performed and a common part extracted among the first through i-th files (2≦i≦n-1). The common parts extracted in the course of determining common part n are called temporary common parts. In the staged file compression processing (FIG. 11), the number of files to be compared is decreased, so at the stage where common part i is determined, common part n through common part i+1 have already been determined. Accordingly, when common part n has been determined by the above method, several temporary common parts have already been extracted, and these may be reused for determining common part i in order to reduce the number of steps in file compression processing.
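The one-file-at-a-time comparison in item (7) amounts to a running intersection whose intermediate results are the temporary common parts. The sketch below illustrates this under the simplifying assumption that each file is a set of lines and that the common part of two inputs is their set intersection; the function name `temporary_common_parts` is hypothetical.

```python
from itertools import accumulate
from typing import List, Set


def temporary_common_parts(files: List[Set[str]]) -> List[Set[str]]:
    """Return [common(1..2), common(1..3), ..., common(1..n)] for n >= 2 files."""
    # accumulate keeps every intermediate intersection, not just the last one.
    running = list(accumulate(files, lambda acc, f: acc & f))
    return running[1:]   # drop the first entry, which is just file 1 itself
```

The last element is common part n; the earlier elements are the temporary common parts that a later stage could reuse instead of comparing the same files again.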

【００５３】(8) If a single file contains several common parts within itself, file compression processing may be applied to that file alone. To achieve this, the user may input the file compression command with a single file name so that contents are compared within that one file to determine common part 1, or the number m of files compared at the final stage of the file compression processing of FIG. 11 may be set to 1.

【００５４】(9) The file compression method of the present invention is entirely different in nature from conventional data compression techniques (Huffman's optimal coding, Ziv-Lempel data compression, and the like), so there is no problem in applying conventional data compression to a compressed file after file compression processing has been performed. Several modifications of file compression processing are described next.

【００５５】Updating a common part can be considered as a special case of update processing on compressed files. This processing rewrites the common part itself, so the contents of the common part change simultaneously in every compressed file that includes it. To update a common part, the update is simply carried out without changing the common part control word. Thus, when a part shared by several files containing the same contents is to be changed at once, it is very convenient to perform a process that combines file compression processing with updating of the common part (this is called batch file modification).

【００５６】Batch file modification is performed according to the flowchart of FIG. 12. First, the user inputs the batch file modification command and the names of the n files to be modified together (step S38). The operating system performs file compression processing on those files (step S39) and displays the name or number of each common part together with a list of the names of the compressed files sharing it (step S40). Next, the user edits the common part to be updated (step S41), and the operating system updates the modified common part (step S42). This editing and updating of common parts can be repeated. The processing ends when no common part remains to be updated.
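The effect of batch file modification can be shown with a small sketch: because each compressed file only refers to the common part, rewriting the common part once changes what every sharing file reads back. The segment model (bytes for unique parts, a string naming a common part) and the names `read_file` and `update_common_part` are assumptions for illustration.

```python
from typing import Dict, List, Union

Segment = Union[bytes, str]   # bytes: unique part; str: name of a common part


def read_file(body: List[Segment], commons: Dict[str, bytes]) -> bytes:
    """Resolve every reference to a common part and return the full contents."""
    return b"".join(commons[s] if isinstance(s, str) else s for s in body)


def update_common_part(commons: Dict[str, bytes], name: str, new: bytes) -> None:
    """Step S42: rewrite the common part itself; no control word is touched."""
    commons[name] = new
```

For example, after `update_common_part(commons, "c1", b"v2")`, every file whose body refers to `"c1"` reads back the new contents without being edited itself.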

【００５７】File compression processing can also be used as copy processing. The procedure is explained with reference to the flowchart of FIG. 13. First, the user inputs the command for copying by file compression processing, together with the copy source and copy destination file names (step S43). The operating system regards the entire file to be copied as a common part (step S44) and, in common part processing (step S45), records the number of shared files as 2. Post-comparison file processing (step S46) is then performed twice to create the compressed files of the copy source and the copy destination.

【００５８】In post-comparison file processing by the index sequential file method, the processing of step S6 in FIG. 4 is unnecessary, and the index records information on only a single record (the common file) (step S7). In step S8, only the index is saved. In post-comparison file processing by the marking method, processing follows the flowchart of FIG. 6, except that the unique file created consists only of a header and a common part specification word. As a result of this processing, exactly one common part is stored in the secondary storage device and is shared by the compressed files of the copy source and the copy destination.
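Copying by file compression processing, in its index sequential form, can be sketched as below: the whole source becomes a single common file with a share count of 2, and each of the two compressed files is just an index with one entry. The `CommonFile` class, the `commons` and `indexes` containers and the name `copy_by_compression` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CommonFile:
    data: bytes
    shared: int   # number of shared files, as held in the common part control word


def copy_by_compression(source_data: bytes, src: str, dst: str,
                        commons: Dict[int, CommonFile],
                        indexes: Dict[str, List[int]]) -> None:
    """Store the contents once and let source and destination share them."""
    common_id = max(commons, default=0) + 1
    # S44, S45: the entire file is one common part, shared by two files.
    commons[common_id] = CommonFile(source_data, shared=2)
    # S46, performed twice: each compressed file is an index with one record.
    for name in (src, dst):
        indexes[name] = [common_id]
```

Only one copy of the data exists after the call, which is the storage saving the paragraph describes.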

【００５９】The present invention is not limited to the embodiment described above. Although a magnetic disk is used in the embodiment as the storage unit for storing files, another secondary storage device (external storage device) such as a magnetic tape or an optical disk may be used instead. The file compression processing section shown in FIG. 1 may be implemented in hardware or in software. Various other modifications can be made without departing from the gist of the present invention.

【００６０】[0060]

[Effects of the Invention] As described in detail above, according to the present invention, when there are a plurality of files that include a part with common contents, each file is divided into the common part and the non-common part that remains when the common part is excluded, and these are stored independently in a storage unit such as a secondary storage device. This avoids the inconvenience of storing contents common to several files redundantly in the secondary storage device, and so realizes a file compression device that makes more efficient use of the secondary storage device.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the schematic configuration of a file compression device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the basic procedure of file compression processing.

FIG. 3 is a schematic diagram showing the structure of a common file.

FIG. 4 is a flowchart showing post-comparison file processing by the index sequential file method.

FIG. 5 is a schematic diagram showing the structure of a compressed file created by file compression processing using the index sequential file method.

FIG. 6 is a flowchart showing post-comparison file processing by the marking method.

FIG. 7 is a schematic diagram showing the structure of the unique file of a compressed file created by file compression processing using the marking method.

FIG. 8 is a flowchart showing the procedure for deleting a compressed file processed by the index sequential file method.

FIG. 9 is a flowchart showing the procedure for deleting a compressed file processed by the marking method.

FIG. 10 is a flowchart showing the procedure for updating a compressed file processed by the index sequential file method.

FIG. 11 is a flowchart showing the procedure of file compression processing in which the number of files to be compared is varied and comparison processing and common part processing are performed in stages.

FIG. 12 is a flowchart showing the procedure for batch file modification.

FIG. 13 is a flowchart showing the procedure for performing file compression processing as copy processing.

[Explanation of symbols]

10…file compression processing section, 11…comparison processing section, 12…common part processing section, 13…file processing section, 20…secondary storage device (first storage unit), 30…secondary storage device (second storage unit).

Claims

[Claims]

Claim 1: A file compression device comprising: means for comparing the contents of a plurality of files stored in a first storage unit; means for extracting, as a common part, a portion whose contents match as a result of the comparison by said means; means for storing the common part as a common file in the first storage unit or a second storage unit; and means for storing the file remaining after the common part has been extracted as a unique file in the first storage unit or the second storage unit.