JP2012181796A

JP2012181796A - Information processing system, duplicate file removal method for information processing system, information processor, and control method and control program for information processor

Info

Publication number: JP2012181796A
Application number: JP2011045934A
Authority: JP
Inventors: Takemi Yoshida; 武未吉田
Original assignee: NEC Fielding Ltd
Current assignee: NEC Fielding Ltd
Priority date: 2011-03-03
Filing date: 2011-03-03
Publication date: 2012-09-20
Anticipated expiration: 2031-03-03
Also published as: JP5473010B2

Abstract

PROBLEM TO BE SOLVED: To use resources effectively by removing duplicate data and centrally managing retained data in an information processing system in which plural computers are connected with each other. SOLUTION: An information processor includes: a search unit for searching whether a folder including the same data file as a data file to be processed is in data retention means for retaining data; a storage unit for, when the folder including the same data file is not in the data retention means, storing the folder including the data file in the data retention means; a path setting unit for setting a path from a shortcut file in each folder of all information processors having the data file to the data file in the data retention means; and a removal unit for removing all the data files retained in the folders of the information processors.

Description

The present invention relates to a technique for eliminating duplicate files in an information processing system.

In this technical field, Patent Document 1 describes detecting duplicate data within a storage unit (HDU) by comparing hash codes, and then handling the duplicates by discarding one copy, replacing it with a link, or copying only the non-duplicate data. Patent Document 2 describes a storage system of multiple disk drives that keeps a table of hash values covering the deduplication range, so that deduplication can be performed even while a disk drive is powered off. Patent Document 3 describes storing data in a storage array of multiple disks, where a virtual tape library (VTL) identifies duplicate data by the same anchor point in a data set together with the deltas before and after it, and compresses the stored data by replacing the duplicate data with a storage indicator.

Patent Document 1: JP 2009-251725 A
Patent Document 2: JP 2009-080788 A
Patent Document 3: JP 2009-535704 A (published Japanese translation of a PCT application)

However, all of the above conventional techniques eliminate duplicate data within an individual storage medium such as a disk. The problem can therefore be solved in a self-contained manner by keeping one copy of the duplicate data in the storage medium and replacing the other with a pointer. In a computer system in which many computers and peripheral devices are connected via a network, however, duplicate data must also be removed across storage media, across computers, and between computers and storage media.

For example, the data on a client PC used by an individual user can be managed freely by that user, and duplicate data on it can be eliminated by applying the conventional techniques above. However, when the target is all the disks of all the servers and client PCs that make up a computer system, it is extremely likely that multiple copies of the same data exist. This is because sharing information among multiple client PCs requires uploading shared files to the Web or to a file server, after which each client PC downloads the same file. Even if each client PC holds only a single copy of some data, files that will never be used again but are not deleted remain in the system as duplicate data. A further cause, rooted in user awareness, is users treating company-issued PCs as personal property. Solving these problems requires a new approach different from eliminating duplicate data within individual apparatuses.

An object of the present invention is to provide a technique that solves the above problems.

In order to achieve the above object, an apparatus according to the present invention comprises:
search means for searching whether a folder containing a data file identical to the data file to be processed exists in data holding means that holds data;
storage means for storing, when no folder containing the identical data file exists in the data holding means, the folder containing the data file in the data holding means;
path setting means for setting, for the folders of all information processing apparatuses that hold the data file, a path from a shortcut file in each such folder to the data file in the data holding means; and
deletion means for deleting all the data files held in the folders of the information processing apparatuses.

In order to achieve the above object, a method according to the present invention comprises:
a search step of searching whether a folder containing a data file identical to the data file to be processed exists in data holding means that holds data;
a storage step of storing, when no folder containing the identical data file exists in the data holding means, the folder containing the data file in the data holding means;
a path setting step of setting, for the folders of all information processing apparatuses that hold the data file, a path from a shortcut file in each such folder to the data file in the data holding means; and
a deletion step of deleting all the data files held in the folders of the information processing apparatuses.

In order to achieve the above object, a program according to the present invention causes a computer to execute:
a search step of searching whether a folder containing a data file identical to the data file to be processed exists in data holding means that holds data;
a storage step of storing, when no folder containing the identical data file exists in the data holding means, the folder containing the data file in the data holding means;
a path setting step of setting, for the folders of all information processing apparatuses that hold the data file, a path from a shortcut file in each such folder to the data file in the data holding means; and
a deletion step of deleting all the data files held in the folders of the information processing apparatuses.

In order to achieve the above object, a system according to the present invention is an information processing system capable of holding the same data file in different folders created by a plurality of clients, the system comprising:
holding means for holding the same data file in one folder;
path setting means for setting a path from the shortcut files in all the different folders, created by the plurality of clients, that hold the same data file to the one folder held by the holding means; and
deletion means for deleting all copies of the same data file held in all the different folders.

In order to achieve the above object, a method according to the present invention is a duplicate file elimination method in an information processing system capable of holding the same data file in different folders created by a plurality of clients, the method comprising:
a holding step of holding the same data file in one folder;
a path setting step of setting a path from the shortcut files in all the different folders, created by the plurality of clients, that hold the same data file to the one folder held in the holding step; and
a deletion step of deleting all copies of the same data file held in all the different folders.

According to the present invention, eliminating duplicate data and centrally managing retained data in an information processing system in which a plurality of computers are connected allows resources to be used effectively.

Block diagram showing the configuration of the information processing apparatus according to the first embodiment of the present invention.
Block diagram showing the configuration of the information processing system according to the second embodiment of the present invention.
Block diagram outlining the functional configuration of each apparatus in the information processing system according to the second embodiment, and its operation procedure.
Block diagram showing the hardware configuration of the duplicate search server according to the second embodiment.
Diagram showing the structure of the shortcut data according to the second embodiment.
Diagram showing the structure of the backup DB and the data reference DB according to the second embodiment.
Flowchart showing the processing procedure of the duplicate search server according to the second embodiment.
Flowchart showing a processing procedure including the determination of whether backup data exists, according to the second embodiment.
Flowchart showing a processing procedure including the determination of duplicate data, according to the second embodiment.
Flowchart showing a processing procedure including the creation of shortcuts for client PCs when duplicate data exists, according to the second embodiment.
Flowchart showing a processing procedure including the storage of backup data in the data reference server when no duplicate data exists, according to the second embodiment.
Flowchart showing a processing procedure including the creation of shortcuts for client PCs when no duplicate data exists, according to the second embodiment.
Flowchart showing a processing procedure including the determination of whether unprocessed backup data exists, according to the second embodiment.
Flowchart showing a processing procedure including post-processing, according to the second embodiment.
Block diagram showing the hardware configuration of the client PC according to the second embodiment.
Diagram showing the structure of the shortcut table according to the second embodiment.
Flowchart showing the processing procedure of the client PC according to the second embodiment.
Diagram showing the data in each unit at the starting point before startup, in the processing of a specific example according to the second embodiment.
Diagram showing the data in each unit at initialization (S701), in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit when the number of backup data items is detected (S707), in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit while determining whether duplicate data exists for the first backup data item (determination in S809), in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit when preparing to write to the data reference server with no duplicate data (S1007), in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit when writing to the data reference server (S1105) and when setting the shortcut path (S1109) with no duplicate data, in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit at the first processed-state determination (S1201/S1203), in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit when duplicate data is determined to exist for the second backup data item (S809) and when setting the shortcut path (S911), in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit at the second processed-state determination (S1201/S1203), in the processing of the specific example according to the second embodiment.
Diagram showing the data in each unit at the end of duplicate data processing (S1307/S1309), in the processing of the specific example according to the second embodiment.

Exemplary embodiments of the present invention are described in detail below with reference to the drawings. The components described in the following embodiments are merely examples and are not intended to limit the technical scope of the present invention to them.

[First Embodiment]
An information processing apparatus 100 according to a first embodiment of the present invention is described with reference to FIG. 1. As illustrated in FIG. 1, the information processing apparatus 100 includes a search unit 120, a storage unit 130, a path setting unit 140, and a deletion unit 150. The search unit 120 searches whether a folder containing a data file identical to the data file to be processed exists in a data holding unit 110 that holds data. When no folder containing an identical copy of data file B exists in the data holding unit 110, the storage unit 130 stores a folder YY containing data file B in the data holding unit 110.

For the folders AA and CC of all the information processing apparatuses 160 and 180 that hold data file A, the path setting unit 140 sets a path 190 from the shortcut files A and C in those folders to data file A in the data holding unit 110. Likewise, for the folder BB of the information processing apparatus 170 that holds data file B, the path setting unit 140 sets a path 190 from the shortcut file B in that folder to data file B in the data holding unit 110. The deletion unit 150 deletes all copies of data file A held in the folders AA and CC of the information processing apparatuses 160 and 180, and deletes data file B held in the folder BB of the information processing apparatus 170.
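The behaviour of the four units can be illustrated with a minimal in-memory sketch. It is not the patent's implementation: the dictionary model of apparatuses and folders, the SHA-256 content digest used to decide that two files are "the same", the folder name XX for data file A, and the `.lnk` shortcut naming are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical model: apparatus -> folder -> {file name: bytes}.
# A shortcut is modelled as a str "->" + the path it points to in the holding unit.
Devices = Dict[str, Dict[str, Dict[str, object]]]


@dataclass
class DataHoldingUnit:
    """Stand-in for data holding unit 110: keeps one copy of each distinct file."""
    folders: Dict[str, Dict[str, bytes]] = field(default_factory=dict)
    index: Dict[str, str] = field(default_factory=dict)  # digest -> "folder/file"


def digest(content: bytes) -> str:
    # Assumption: file identity is decided by a SHA-256 content digest.
    return hashlib.sha256(content).hexdigest()


def deduplicate(devices: Devices, holder: DataHoldingUnit,
                target: bytes, new_folder: str, name: str) -> str:
    key = digest(target)
    # Search unit 120: is an identical file already in the holding unit?
    path = holder.index.get(key)
    if path is None:
        # Storage unit 130: store a folder containing the file.
        holder.folders[new_folder] = {name: target}
        path = f"{new_folder}/{name}"
        holder.index[key] = path
    for folders in devices.values():
        for files in folders.values():
            for fname, content in list(files.items()):
                if isinstance(content, bytes) and digest(content) == key:
                    # Path setting unit 140: shortcut in the same folder.
                    files[fname + ".lnk"] = "->" + path
                    # Deletion unit 150: remove the local copy.
                    del files[fname]
    return path
```

Running it with data file A in folders AA (apparatus 160) and CC (apparatus 180) and data file B in folder BB (apparatus 170) leaves one stored copy of each file and only shortcuts in the three folders.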

According to the present embodiment, eliminating duplicate data and centrally managing retained data in an information processing system in which a plurality of computers are connected allows resources to be used effectively.

[Second Embodiment]
According to the second embodiment of the present invention, when the processing installed on the duplicate search server is executed from the management terminal PC, the data on all the client PCs is automatically replaced with shortcut files pointing to the folders on the data reference server that store data with the same hash values, and is then deleted. If the data reference server holds no data with the same hash value, the data is newly added to the data reference server, and the path to it is added to the database.

The processing procedure of the duplicate search server according to the present embodiment is as follows. First, the data of all the client PCs is backed up in an integrated manner to the duplicate search server, and a hash value is calculated for every data item and written to a database. Next, the hash value of each target data item, obtained by referring to this backup database, is compared one by one with the hash values held in the data reference database, which lists the data on the data reference server where all unique data accumulated so far is stored. This comparison determines whether each target data item in the backup database is duplicate or non-duplicate data. If the target backup data is non-duplicate, it is newly copied to the data reference server, and information on the new data is added to the data reference database. If the target backup data is duplicate, it is not copied to the data reference server. At this point the data exists on the data reference server whether the target backup data was duplicate or non-duplicate. Finally, a shortcut to the identical data file on the data reference server is created in the folder on the client PC where the target backup data was stored, and the target backup data on the client PC is deleted. This processing procedure eliminates duplicate files across the entire information processing system and migrates the data on all client PCs connected to the system to the server in one batch, so that storage media such as disks are used effectively across the system as a whole.
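The procedure can be sketched end to end on a local file system. This is a hedged illustration, not the patented implementation: plain directories stand in for the client folders, the storage destination folders, and the data reference server; SHA-256 is assumed as the hash; the backup DB is a list of tuples; and a symbolic link replaces a Windows shortcut file. All function and folder names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List, Tuple

# Backup DB row: (backup_id, file_name, stored_path, hash_value, source_folder)
BackupRow = Tuple[int, str, Path, str, Path]


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def eliminate_duplicates(client_folders: List[Path], staging: Path,
                         reference: Path) -> Dict[str, Path]:
    """Return the data reference DB as {hash value: file on the reference server}."""
    # Step 1: integrated backup to the duplicate search server, filling the backup DB.
    backup_db: List[BackupRow] = []
    for client_no, folder in enumerate(client_folders):
        dest = staging / f"client{client_no}"
        dest.mkdir(parents=True, exist_ok=True)
        for src in sorted(folder.iterdir()):
            if src.is_symlink() or not src.is_file():
                continue  # skip existing shortcuts and sub-folders
            stored = dest / src.name
            shutil.copy2(src, stored)
            backup_db.append((len(backup_db) + 1, src.name, stored,
                              sha256_of(stored), folder))

    # Step 2: compare each backup item's hash with the data reference DB.
    reference_db: Dict[str, Path] = {}
    for _, name, stored, digest, source_folder in backup_db:
        if digest not in reference_db:
            # Non-duplicate: copy it into a new data reference folder and register it.
            target_dir = reference / f"F{len(reference_db) + 1}"
            target_dir.mkdir(parents=True, exist_ok=True)
            target = target_dir / name
            shutil.copy2(stored, target)
            reference_db[digest] = target.resolve()
        # Step 3: the data now exists on the reference server either way, so the
        # client copy is deleted and replaced by a shortcut (here a symlink).
        original = source_folder / name
        original.unlink()
        original.symlink_to(reference_db[digest])
        stored.unlink()  # the staged backup copy is no longer needed
    return reference_db
```

Creating symbolic links on Windows may require extra privileges, so a real deployment on client PCs would more likely write genuine shortcut files.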

In this embodiment, eliminating duplicate backup data files in the information processing system is described as a representative example, but the embodiment can easily be applied to eliminating duplication of any data file or any part of data. Furthermore, the data may be a program; "data" in this embodiment is a concept covering all digital data processed by the client PCs.

<< Configuration of the Information Processing System of This Embodiment >>
FIG. 2 is a block diagram illustrating the configuration of the information processing system 200 according to the present embodiment.

Referring to FIG. 2, the information processing system 200 of the present embodiment includes client devices comprising client PCs N100 to N10X and a client server N11X. The client PCs N100 to N10X may include desktop computers N100 to N10K, a portable terminal N10M, and notebook computers N10N to N10X. The information processing system 200 also includes a duplicate search server N200, a data reference server N300, and a management terminal PC N400. The client PCs N100 to N10X, the duplicate search server N200, the data reference server N300, and the management terminal PC N400 are connected to one another via a network N500. The connections may be wired or wireless.

The management terminal PC N400 may instead be connected directly to the duplicate search server N200 (shown by a broken line in FIG. 2). Because the data reference server N300 holds the single copy of each piece of data referenced by the information processing system, it preferably has a parallel structure as shown in FIG. 2.

(Outline of the functional configuration and operation procedure of each apparatus in the information processing system)
FIG. 3 is a block diagram outlining the functional configuration of each apparatus in the information processing system 200 and its operation procedure. Centered on the duplicate search server N200, FIG. 3 shows the functional components, their data and signal connections, and the step numbers of the operation procedure (these step numbers correspond to those in FIGS. 7 to 13). Because the detailed operation procedure is described later with reference to FIGS. 7 to 13, this section mainly describes the functions and operation of the functional components.

The folders in the client PCs N100 to N10X are the backup data source folders F110 to F11X.

The processing units in the duplicate search server N200 are a central processing unit SW10, a backup control unit SW20, a comparison calculation unit SW30, and a data creation unit SW40. In the present embodiment, these processing units are realized by the CPU of the duplicate search server N200 executing the corresponding processing module programs, and signals between the processing units are passed as arguments between those programs. Alternatively, some or all of the processing units may each have their own CPU, with signals between them exchanged by computer communication.

The databases in the duplicate search server N200 are a backup DB D10, which manages the backup data source folders of the client PCs processed by the duplicate search server N200, and a data reference DB D20, which manages the data reference destination folders of the data reference server. The registers in the duplicate search server N200 are a BUID index register R10 and a BUID total register R20 for managing the backup DB D10, and an FDID index register R30 and an FDID total register R40 for managing the data reference DB D20. The folders in the duplicate search server N200 are storage destination folders F210 to F21X, which store the backup data transferred from the client PCs. The counter in the duplicate search server N200 is a counter C10 indicating the number of backup data items to be processed.

The folders in the data reference server N300 are data reference destination folders F310 to F31X, which hold the referenced backup data.

The data processed in this embodiment are backup data F10 to F1X and shortcut files F30 to F3X. The backup data F10 to F1X are kept in the data reference folders of the data reference server and are deleted from the backup source folders of the client PCs and from the storage destination folders, so the backup data to be deleted is drawn with broken lines in FIG. 3. In the backup source folders of the client PCs, the backup data F10 to F1X is deleted and replaced with shortcut files F30 to F3X.

The operation of each functional component is outlined in more detail below.

The client PCs N100 to N10X are the individual client PCs from which the information processing system eliminates duplicate data. The duplicate search server N200 is the processing server of the information processing system and processes the data. The data reference server N300 stores all non-duplicate data and is the server referenced by the shortcut files kept on the client PCs. The management terminal PC N400 is the terminal that starts the information processing system by starting the central processing unit SW10 of the duplicate search server.

The backup data source folders F110 to F11X of the client PCs N100 to N10X are the folders on the client PCs that store the backup data. The path information for these folders is written to the backup DB.
The central processing unit SW10 of the duplicate search server N200 is started by a start command from the management terminal PC N400; its main role is to act as a control tower that starts each of the other processing units. When processing ends, it also notifies the management terminal PC N400 of completion. The backup control unit SW20 is started by a start command from the central processing unit SW10; its main role is to back up all client PCs. The comparison calculation unit SW30 is started by a start command from the central processing unit SW10; its main roles are comparing data between the backup DB and the data reference DB and comparing register values. The data creation unit SW40 is started by a start command from the central processing unit SW10; its main roles are creating and deleting data and folders and creating shortcut files.
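The division of labour among SW10 to SW40, with signals passed as arguments between module programs, can be sketched as plain functions. This is a simplified, hypothetical illustration: the backup is reduced to a dictionary mapping client paths to hash values, reference locations are invented strings such as `ref/1`, and the completion notice to the management terminal is a callback.

```python
from typing import Callable, Dict, List, Tuple


def backup_control(clients: Dict[str, str]) -> List[Tuple[str, str]]:
    """SW20 (simplified): 'back up' every client file as a (path, hash) pair."""
    return sorted(clients.items())


def comparison_calculation(digest: str, reference_db: Dict[str, str]) -> bool:
    """SW30 (simplified): is the hash already registered in the data reference DB?"""
    return digest in reference_db


def data_creation(digest: str, reference_db: Dict[str, str], is_duplicate: bool) -> str:
    """SW40 (simplified): register non-duplicate data; return the shortcut target."""
    if not is_duplicate:
        reference_db[digest] = f"ref/{len(reference_db) + 1}"
    return reference_db[digest]


def central_processing(clients: Dict[str, str],
                       notify_terminal: Callable[[str], None]) -> Dict[str, str]:
    """SW10: started by the management terminal; starts the other units in turn,
    passing results between them as function arguments."""
    reference_db: Dict[str, str] = {}
    shortcuts: Dict[str, str] = {}
    for path, digest in backup_control(clients):
        duplicate = comparison_calculation(digest, reference_db)
        shortcuts[path] = data_creation(digest, reference_db, duplicate)
    notify_terminal(f"done: {len(shortcuts)} files, {len(reference_db)} unique")
    return shortcuts
```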

The backup DB D10 is the database of backup data and serves as a path holding unit that holds the paths to the client PC folders, assigning an ascending ID to each data item in response to a backup command from the backup control unit SW20. The backup DB D10 holds five items: a backup ID and, associated with it, a file name, a backup storage destination folder path, a hash value, and a backup source client folder path (see FIG. 5B).

The data reference DB D20 is the database that manages the data in the data reference server N300. It serves as a path storage unit that accumulates the paths to folders on the data reference server. An ascending ID is assigned to each data item in response to a command from the data creation unit SW40 to create non-duplicate data. It holds four items: a folder ID and, associated with it, a file name, a data reference folder path, and a hash value (see FIG. 5B).

The BUID index register R10 holds the ID value that serves as the primary key of the backup DB. The BUID total register R20 holds the total number of data items in the backup DB and directly reflects the count produced by the counter of the backup control unit SW20. The FDID index register R30 holds the ID value that serves as the primary key of the data reference DB. The FDID total register R40 holds the total number of data items in the data reference DB.
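The four registers and the counter amount to a small piece of mutable state. The following is a minimal Python sketch of that state, not the patent's implementation. The class and function names are hypothetical, and a list of file names stands in for the files backed up in one run.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    """Hypothetical model of registers R10-R40 (RAM areas 441-444)."""
    buid_index: int = 1  # R10: backup ID (primary key of the backup DB) being processed
    buid_total: int = 0  # R20: number of rows in the backup DB
    fdid_index: int = 1  # R30: folder ID (primary key of the data reference DB) being compared
    fdid_total: int = 0  # R40: number of rows in the data reference DB


def count_backups(regs: Registers, backed_up_files: list) -> int:
    """Increment a counter (like C10) once per backed-up file, then copy it into R20."""
    counter_c10 = 0
    for _ in backed_up_files:
        counter_c10 += 1
    if counter_c10 > 0:  # R20 is only updated when backup data exists (steps S709/S711)
        regs.buid_total = counter_c10
    return counter_c10
```

For example, backing up three files leaves `buid_total` at 3. Both index registers start at 1, matching the initialization in step S701.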

The storage destination folders F210 to F21X temporarily hold the backups of all client PCs made by the backup control unit SW20. Based on the storage destination folders F210 to F21X, the backup control unit SW20 writes the path information to these folders into the backup DB.

The data reference destination folders F310 to F31X are where data is kept on the data reference server N300. They are the folders that the client PCs refer to for that data.

While the duplicate search server N200 is running, the data creation unit SW40 creates a new data reference folder on the data reference server N300 whenever the target backup data is non-duplicate data. The backup data F10 to F1X is data originally stored on the client PCs. It is the data backed up by the backup control unit SW20, the data deleted by the data creation unit SW40, and the data stored in the data reference destination folders.

The shortcut S10 is a shortcut file placed in the backup source folder of a client PC. It refers to the data reference destination folder in which the target backup data is stored. The data creation unit SW40 creates it as a shortcut to that data reference destination folder and provides it to the client PC.

The counter C10 counts the number of backup DB IDs created in response to backup commands from the backup control unit SW20. The backup control unit SW20 copies the counter value directly into the BUID total register R20.

<< Hardware Configuration of Duplicate Search Server According to this Embodiment >>
FIG. 4 is a block diagram showing the hardware configuration of the duplicate search server N200 according to the present embodiment.

In FIG. 4, the CPU 410 is a processor for arithmetic control. It implements each functional component of FIG. 3 by executing programs. The ROM 420 stores fixed data and programs, such as initial data. The communication control unit 430 communicates over the network with the client PCs N100 to N10X, the data reference server N300, and the management terminal PC N400. Communication may be wireless or wired.

The RAM 440 is random access memory that the CPU 410 uses as a work area for temporary storage. The RAM 440 reserves areas for the data needed to implement the present embodiment. Reference numeral 441 denotes the BUID index register, 442 the BUID total register, 443 the FDID index register, and 444 the FDID total register. Reference numeral 445 denotes a counter that functions as the counter C10 of the backup DB D10. Reference numeral 446 denotes the shortcut data that the duplicate search server N200 creates for each piece of backup data (see FIG. 5A).

The storage 450 stores databases, various parameters, and the following data and programs needed to implement the present embodiment:

- Reference numeral 451 denotes the storage destination folder that temporarily holds backup data.
- Reference numeral 452 denotes the backup DB (see FIG. 5B).
- Reference numeral 453 denotes the data reference DB (see FIG. 5B).

The storage 450 also stores the following programs:

- Reference numeral 454 denotes the duplicate file search program that executes the overall process. It corresponds to the processing of the central processing unit SW10 in FIG. 3 (see FIG. 6).
- Reference numeral 455 denotes the backup control module that controls the backup data of the client PCs. It corresponds to the processing of the backup control unit SW20 in FIG. 3.
- Reference numeral 456 denotes the comparison formation module that compares register contents and counter values in the branch steps of the duplicate file search program 454. It corresponds to the processing of the comparison calculation unit SW30 in FIG. 3.
- Reference numeral 457 denotes the data creation module that stores backup data on the data reference server and creates shortcuts. It corresponds to the processing of the data creation unit SW40 in FIG. 3.

Note that FIG. 4 shows only the data and programs essential to the present embodiment. General-purpose data and programs such as the OS are not shown.

(Configuration of shortcut data)
FIG. 5A is a diagram showing the configuration of the shortcut data 446 created by the data creation unit SW40 of the duplicate search server N200.

The shortcut data 446 is keyed by the backup source client folder path 511, which points to the backup source client folder of the client PC. For each such path it stores a file name 512, a data reference destination folder path 513 on the data reference server N300, and a hash value 514 of the backup data in the file.
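One entry of the shortcut data can be modelled as an immutable record keyed by the client folder path. The following is a minimal sketch only. The field and function names are hypothetical labels for items 511 to 514 and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortcutEntry:
    """One entry of shortcut data 446 (FIG. 5A); field names are hypothetical."""
    source_client_folder: str  # 511: backup source client folder path (the key)
    file_name: str             # 512
    reference_folder: str      # 513: data reference destination folder path on N300
    hash_value: str            # 514: hash of the backup data in the file


def group_by_client_folder(entries):
    """Group entries under their backup source client folder path, the key of FIG. 5A."""
    table = {}
    for entry in entries:
        table.setdefault(entry.source_client_folder, []).append(entry)
    return table
```

Two client folders that held the same file end up with entries pointing at the same reference folder. This is the sharing that the deduplication aims for.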

(Configuration of backup DB and data reference DB)
FIG. 5B is a diagram showing the configuration of the backup DB 452 and the data reference DB 453 of the duplicate search server N200.

The backup DB 452 stores the following data in association with a backup ID 521: a file name 522, a backup storage destination folder path 523 pointing to the storage destination folder of the duplicate search server N200, a hash value 524 of the backup data in the file, and a backup source client folder path 525.

The data reference DB 453 stores the following data in association with the folder ID 531 of the data reference destination folder: a file name 532, a data storage destination folder path 533 pointing to the data reference destination folder of the data reference server N300, and a hash value 524 of the backup data in the file.
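As an illustration only, the two tables of FIG. 5B could be expressed as SQLite tables. The table and column names below are assumptions, since the source does not specify a database engine or schema syntax. The file name and hash value of the data reference DB are nullable because, in the flow described later, they are written only after the row has been created.

```python
import sqlite3


def create_databases(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables mirroring backup DB 452 and data reference DB 453."""
    conn.execute(
        """CREATE TABLE backup_db (
               backup_id INTEGER PRIMARY KEY,           -- 521
               file_name TEXT NOT NULL,                 -- 522
               storage_folder_path TEXT NOT NULL,       -- 523, folder on N200
               hash_value TEXT NOT NULL,                -- 524
               source_client_folder_path TEXT NOT NULL  -- 525
           )"""
    )
    conn.execute(
        """CREATE TABLE data_reference_db (
               folder_id INTEGER PRIMARY KEY,           -- 531
               file_name TEXT,                          -- 532, written later (S1105)
               reference_folder_path TEXT NOT NULL,     -- 533, folder on N300
               hash_value TEXT                          -- written later (S1105)
           )"""
    )


def hash_of_backup(conn: sqlite3.Connection, backup_id: int) -> str:
    """Return the hash value stored for one backup ID."""
    row = conn.execute(
        "SELECT hash_value FROM backup_db WHERE backup_id = ?", (backup_id,)
    ).fetchone()
    return row[0]
```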

<< Processing Procedure of Duplicate Search Server According to this Embodiment >>
FIG. 6 is a flowchart showing the processing procedure of the duplicate search server N200 according to the present embodiment. The CPU 410 executes this flowchart using the RAM 440, thereby realizing the functions of the processing units of the duplicate search server N200 in FIG. 3. When each processing unit has its own CPU, each step includes the processing of the corresponding unit, and information is passed between units by inter-computer communication. The flowchart of FIG. 6 therefore corresponds to the processing procedure of the central processing unit SW10. The circled numbers in FIG. 6 correspond to the circled numbers in FIGS. 7 to 13.

The flowchart of FIG. 6 starts on an activation instruction from the management terminal PC. First, in step S610, the duplicate search server N200 is initialized (see FIG. 7). Next, in step S620, it is determined whether new backup data has been copied from the client PCs to the storage destination folder of the duplicate search server N200 (see FIG. 8). If there is no backup data, the management terminal PC N400 is notified in step S690 and processing ends.

If there is backup data, the process proceeds to step S630, where it is determined whether a duplicate of the backup data exists on the data reference server N300 (see FIG. 9).

If a duplicate exists, the process advances to step S640. A shortcut is created to the backup data in the data reference destination folder of the data reference server N300. The shortcut is placed in the backup data source folder of the client PC, and the original backup data is deleted (see FIG. 9).

If no duplicate exists, the process advances to step S650. The backup data temporarily held in the storage destination folder is stored in a data reference destination folder of the data reference server N300 (see FIG. 10). Then, in step S660, a shortcut is created to the backup data stored in step S650. The shortcut is placed in the backup data source folder of the client PC, and the original backup data is deleted (see FIG. 11).

After the client PC's backup data has been replaced with a shortcut in step S640 or S660, step S670 determines whether all backup data has been processed (see FIG. 12). If unprocessed backup data remains, the process returns to step S630 and repeats for the next item. If all backup data has been processed, post-processing is performed in step S680. The management terminal PC is then notified of the end of processing in step S690, and the processing of the duplicate search server N200 ends.
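The control flow of FIG. 6 reduces to one loop over backup IDs with two branches. The following minimal sketch keeps only the hashes. It assumes that list positions stand in for backup IDs and folder IDs, and it leaves out the file copying and shortcut creation. The function name is hypothetical.

```python
def deduplicate(backup_hashes, reference_hashes):
    """Sketch of FIG. 6. backup_hashes[i] is the hash of backup ID i + 1;
    reference_hashes plays the data reference DB and is extended in place.
    Returns (backup_id, folder_id) pairs, i.e. where each shortcut points."""
    if not backup_hashes:  # S620: no new backup data, report and stop (S690)
        return []
    shortcuts = []
    for buid, digest in enumerate(backup_hashes, start=1):
        # S630: look for an existing copy on the data reference server
        fdid = next(
            (i for i, ref in enumerate(reference_hashes, start=1) if ref == digest),
            None,
        )
        if fdid is None:  # S650: non-duplicate, store it in a new reference folder
            reference_hashes.append(digest)
            fdid = len(reference_hashes)
        shortcuts.append((buid, fdid))  # S640 / S660: client file becomes a shortcut
    return shortcuts  # S670 exhausted; S680 post-processing would follow
```

With `deduplicate(["x", "y", "x"], [])`, the third backup is a duplicate of the first and points at folder 1. Newly stored data is visible to later comparisons in the same run.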

FIGS. 7 to 13 are flowcharts showing the steps of FIG. 6 in more detail. In these flowcharts, each processing unit is described as being activated by the central processing unit SW10. As noted above, however, in the configuration of FIG. 4 each module is invoked with arguments. In a configuration with multiple CPUs, each processing unit is activated by inter-computer communication.

(Determining whether backup data exists)
FIG. 7 is a flowchart showing a processing procedure that includes determining whether backup data exists. It covers the span from activation by the management terminal PC N400 to the end of the backup process by the backup control unit SW20. The process of FIG. 7 starts when the management terminal PC N400 activates the central processing unit SW10.

First, in step S701, the central processing unit SW10 initializes the BUID index register R10 and the FDID index register R30, setting R10 = 1 and R30 = 1. Next, in step S703, the central processing unit SW10 activates the backup control unit SW20.

In step S705, the backup control unit SW20 starts an integrated backup of the data of all client PCs, using the storage destination folders F210 to F21X as the destination. Next, in step S707, the backup control unit SW20 records the data backed up in step S705 in the backup DB D10, which is the backup database. The value of the counter C10 is also reflected there. This counter is incremented each time data is backed up in step S705. Next, in step S709, the backup control unit SW20 reads the counter C10 and determines whether C10 = 0, that is, whether there is no backup data.

If there is no backup data, the process proceeds to FIG. 13. If there is backup data, the process proceeds to step S711. Once steps S705 and S707 have finished, the backup control unit SW20 reads the BUID total register R20 and the counter C10, then writes the value of C10 into R20 (R20 = C10). Next, in step S713, the backup control unit SW20 ends its processing and notifies the central processing unit SW10 that the backup is complete.
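The following is a minimal sketch of steps S705 to S711 for local folders. The source does not name a hash algorithm or a folder layout, so several choices here are assumptions:

- SHA-256 is used as the hash.
- Each backup gets its own numbered subfolder, so files from different PCs cannot collide.
- Plain dicts stand in for backup DB rows.

The function name is hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def integrated_backup(client_folders, storage_folder):
    """Sketch of S705-S711: copy files from the client backup source folders into
    the storage folder and build backup DB rows. SHA-256 and the one-subfolder-per-
    backup-ID layout are assumptions, not taken from the source."""
    storage = Path(storage_folder)
    storage.mkdir(parents=True, exist_ok=True)
    backup_db = []
    counter_c10 = 0
    for client_folder in client_folders:
        for src in sorted(Path(client_folder).iterdir()):
            if not src.is_file():
                continue
            counter_c10 += 1  # C10: one count per backed-up file
            dest_dir = storage / str(counter_c10)  # avoids name clashes between PCs
            dest_dir.mkdir()
            shutil.copy2(src, dest_dir / src.name)
            backup_db.append({
                "backup_id": counter_c10,  # ascending ID
                "file_name": src.name,
                "storage_folder_path": str(dest_dir),
                "hash_value": hashlib.sha256(src.read_bytes()).hexdigest(),
                "source_client_folder_path": str(client_folder),
            })
    # S709: a zero count means there is nothing to deduplicate (go to FIG. 13);
    # otherwise the caller copies the count into R20 (S711).
    return backup_db, counter_c10
```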

(Determining duplicate data)
FIG. 8 is a flowchart showing a processing procedure that includes determining duplicate data. It covers the duplicate data check performed by the comparison calculation unit SW30, through to its completion.

First, in step S801, the central processing unit SW10 automatically activates the comparison calculation unit SW30 when it receives the completion notification from the backup control unit SW20 in step S713. It also reactivates SW30 automatically when, in step S1209, SW30 reports that backup data still requiring a deduplication search remains.

Next, in step S803, the comparison calculation unit SW30 reads the FDID index register R30 and the FDID total register R40 and checks whether R30 is greater than R40 (R30 > R40). This determines whether the target backup data has already been compared with all data in the data reference DB. If R30 > R40, the process proceeds to step S811. SW30 ends its processing and notifies the central processing unit SW10 that the comparison result is "non-duplicate."

If R30 ≤ R40, the process advances to step S805, where the BUID index register R10 and the FDID index register R30 are read. Next, in step S807, the values read in step S805 are used as IDs into the two databases, and the corresponding hash values are compared. The first hash value comes from the row of the backup DB D10 whose backup ID equals the BUID index register value. The second comes from the row of the data reference DB D20 whose folder ID equals the FDID index register value. The comparison determines whether the target data already exists on the data reference server. If the hash values match, the process proceeds to step S813. SW30 ends its processing and notifies the central processing unit SW10 that the comparison result is "duplicate."

If the hash values do not match, the process advances to step S809, where SW30 increments the FDID index register R30 by 1 (R30 = R30 + 1). The process then returns to step S803. SW30 again reads R30 and R40 and checks whether R30 > R40.
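The register-driven loop of steps S803 to S813 is a linear search over the data reference DB. The following minimal sketch assumes that dicts mapping 1-based IDs to hash values stand in for the two databases. The function name is hypothetical.

```python
def find_duplicate(backup_db, reference_db, buid_index, fdid_index=1):
    """Sketch of FIG. 8. backup_db and reference_db map 1-based IDs to hash values.
    Returns the matching folder ID ("duplicate") or None ("non-duplicate")."""
    fdid_total = len(reference_db)       # R40
    target = backup_db[buid_index]       # hash of the row whose backup ID == R10
    while fdid_index <= fdid_total:      # S803: leave the loop once R30 > R40
        if reference_db[fdid_index] == target:  # S807: compare hash values
            return fdid_index            # S813: report "duplicate"
        fdid_index += 1                  # S809: R30 = R30 + 1
    return None                          # S811: report "non-duplicate"
```

Each lookup is linear in the size of the data reference DB. An index keyed by hash value would make lookups constant-time on average, but that is a design alternative and not something the source describes.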

(Creating a shortcut for the client PC when duplicate data exists)
FIG. 9 is a flowchart showing a processing procedure that includes creating a shortcut for the client PC when duplicate data exists. It shows the processing performed when the duplicate data check of FIG. 8 finds that the target data is duplicate data.

In step S901, the central processing unit SW10 automatically activates the data creation unit SW40. This happens when SW10 receives the notification from the comparison calculation unit SW30 in step S813 that the target file is duplicate data. The data creation unit SW40 then reads the FDID index register R30 (step S903), the data reference DB D20 (step S905), the BUID index register R10 (step S907), and the backup DB D10 (step S909). Next, in step S911, SW40 looks up the backup source client folder path in the backup DB D10 read in step S909, using the row whose backup ID equals the value of R10 read in step S907. On that backup source client path, it then creates a new shortcut S10. The shortcut points to the data reference destination folder path in the data reference DB D20 read in step S905, taken from the row whose folder ID equals the value of R30 read in step S903.

Next, in step S913, SW40 looks up the file name and backup source client folder path in the backup DB D10 read in step S909, using the row whose backup ID equals the value of R10 read in step S907. It then deletes the file with that name on the backup source client path. Next, in step S916, the data creation unit SW40 terminates and notifies the central processing unit SW10 that data creation is complete.
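Steps S911 and S913 amount to creating a link and then deleting the original file. The following minimal sketch rests on two assumptions. First, a symbolic link stands in for the Windows shortcut file S10, because real `.lnk` files need Windows-specific APIs. Second, the `<name>.shortcut` naming rule is hypothetical, as is the function name.

```python
from pathlib import Path


def replace_with_shortcut(source_client_folder, file_name, reference_folder):
    """Sketch of S911-S913. A symbolic link stands in for the shortcut file S10
    (assumption: real Windows .lnk files need Windows-specific APIs), and the
    "<name>.shortcut" naming rule is hypothetical."""
    client_dir = Path(source_client_folder)
    link = client_dir / (file_name + ".shortcut")
    # S911: create the shortcut to the data reference destination folder first...
    link.symlink_to(Path(reference_folder), target_is_directory=True)
    # S913: ...then delete the client's copy that has the same file name
    (client_dir / file_name).unlink()
    return link
```

Creating the link before deleting the file mirrors the step order in the text. If link creation fails, the original file is left in place.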

(Storing backup data on the data reference server when there is no duplicate data)
FIG. 10 is a flowchart showing a processing procedure that includes storing backup data on the data reference server when there is no duplicate data. It covers the first part of the processing performed when the duplicate data check of FIG. 8 finds that the target data is non-duplicate data: storing the backup data on the data reference server.

First, in step S1001, the central processing unit SW10 automatically activates the data creation unit SW40. This happens when SW10 receives the notification from the comparison calculation unit SW30 in step S811 that the target file is non-duplicate data. Next, in step S1003, the data creation unit SW40 reads the FDID index register R30. Next, in step S1005, SW40 creates a new backup reference destination folder F31X on the D drive of the data reference server N300. The folder is named with the value of R30 read in step S1003.

Next, in step S1007, SW40 increments the FDID total register R40 by 1 (R40 = R40 + 1). Next, in step S1009, SW40 reads the data reference DB D20 and appends a new last row whose folder ID is the value of R30 read in step S1003. At this point, only the folder ID and data reference destination folder path columns of that last row are filled in.

Next, in step S1011, SW40 reads the BUID index register R10, and in step S1013 it reads the backup DB D10. Next, in step S1015, SW40 reads the target backup data F10. To do so, it looks up the backup storage destination folder path in the backup DB D10 read in step S1013, using the row whose backup ID equals the value of R10 read in step S1011. It then reads the target backup data from that path.
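When FIG. 8 reports "non-duplicate," its loop has exited with R30 = R40 + 1. The new folder's ID therefore equals the incremented total. The following minimal sketch covers steps S1003 to S1009 under these assumptions: a local directory stands in for the D drive of N300, the data reference DB is a list of dicts, and the function and field names are hypothetical.

```python
from pathlib import Path


def allocate_reference_folder(reference_root, reference_db, fdid_index, fdid_total):
    """Sketch of S1003-S1009. reference_root stands in for the D drive of N300;
    reference_db is a list of row dicts with hypothetical field names."""
    folder = Path(reference_root) / str(fdid_index)  # S1005: folder named after R30
    folder.mkdir(parents=True)
    fdid_total += 1                                  # S1007: R40 = R40 + 1
    reference_db.append({                            # S1009: append a new last row
        "folder_id": fdid_index,
        "reference_folder_path": str(folder),
        "file_name": None,   # left blank until S1105
        "hash_value": None,  # left blank until S1105
    })
    return folder, fdid_total
```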

(Creating a shortcut for the client PC when there is no duplicate data)
FIG. 11 is a flowchart showing a processing procedure that includes creating a shortcut for the client PC when there is no duplicate data. It covers the part of the non-duplicate processing that follows FIG. 10: creating the shortcut for the client PC.

First, in step S1101, the data creation unit SW40 reads the data reference DB D20. Next, in step S1103, SW40 copies the backup data F10 read in step S1015 into the new folder on the data reference server created in step S1005. To do so, it looks up the data reference destination folder path in the data reference DB D20 read in step S1101, using the row whose folder ID equals the value of R30 read in step S1003. It then copies the backup data F10 to that path.

Next, in step S1105, SW40 fills in the row written in step S1009, which is the row of the data reference DB D20 whose folder ID equals the value of R30. Its blank file name and hash value columns receive the file name and hash value from the backup DB D10 read in step S1013, taken from the row whose backup ID equals the value of R10 read in step S1011. At this point, the file name and hash value of the last row of the data reference DB, left blank in step S1009, have been written.

Next, in step S1107, SW40 looks up the backup source client folder path in the backup DB D10 read in step S1013, using the row whose backup ID equals the value of R10 read in step S1011. It also looks up the data reference destination folder path in the data reference DB D20 read in step S1101, using the row whose folder ID equals the value of R30 read in step S1003. On the target backup source client path, it then creates a new shortcut S10 to that data reference destination folder path. In other words, at the client path of the target backup data, it creates a shortcut file pointing to the data reference destination to which the data was copied.

Next, in step S1109, SW40 looks up the backup source client folder path in the backup DB D10 read in step S1013, using the row whose backup ID equals the value of R10 read in step S1011. It then deletes the target backup data F10 from that backup source client path. At this point, the target backup data has been removed from the client PC. It has been replaced by a shortcut to the data reference server, where the non-duplicate data is stored. Finally, in step S1111, SW40 ends its processing and notifies the central processing unit SW10 that the backup is complete.
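Steps S1103 to S1109 can be combined into one routine operating on one backup DB row and one data reference DB row. The following minimal sketch rests on three assumptions:

- Rows are dicts with hypothetical field names.
- A symbolic link stands in for the Windows shortcut file S10.
- The `<name>.shortcut` naming rule is invented for illustration.

```python
import shutil
from pathlib import Path


def store_and_replace(backup_row, reference_row):
    """Sketch of S1103-S1109 for one non-duplicate file. Row dicts use hypothetical
    field names; a symbolic link stands in for the Windows shortcut file S10."""
    name = backup_row["file_name"]
    ref_dir = Path(reference_row["reference_folder_path"])
    # S1103: copy the temporarily stored backup into the new reference folder
    shutil.copy2(Path(backup_row["storage_folder_path"]) / name, ref_dir / name)
    # S1105: fill in the columns left blank in S1009
    reference_row["file_name"] = name
    reference_row["hash_value"] = backup_row["hash_value"]
    client_dir = Path(backup_row["source_client_folder_path"])
    # S1107: shortcut on the client pointing at the reference folder
    link = client_dir / (name + ".shortcut")
    link.symlink_to(ref_dir, target_is_directory=True)
    # S1109: delete the original file from the client
    (client_dir / name).unlink()
    return link
```

The copy comes from the temporary storage folder on N200, not from the client, so the client file can be deleted safely once the shortcut exists.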

(Determining whether unprocessed backup data remains)
FIG. 12 is a flowchart showing a processing procedure that includes determining whether unprocessed backup data remains. It shows how the process decides whether all target data has been compared, so that processing ends, or whether processing continues.

First, in step S1201, the central processing unit SW10 reads the BUID index register R10 and increments it by 1 (R10 = R10 + 1). Next, in step S1203, SW10 reads the FDID index register R30 and reinitializes it (R30 = 1). Next, in step S1205, SW10 activates the comparison calculation unit SW30.

Next, in step S1207, the comparison calculation unit SW30 reads the BUID index register R10 and the BUID total register R20 and checks whether R10 > R20, that is, whether all backup data has been processed.

If R10 ≤ R20, the process proceeds to step S1209. SW30 ends its processing and notifies the central processing unit SW10 that backup data still requiring a deduplication search remains. The process then returns to step S801 to handle the next backup data.

If R10 > R20, the process proceeds to step S1211. SW30 ends its processing and notifies SW10 that no backup data requiring a deduplication search remains. The process then proceeds to step S1301.
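The loop control of steps S1201 to S1211 is a pair of register updates followed by one comparison. The following minimal sketch uses a hypothetical function name.

```python
def next_backup(buid_index, buid_total):
    """Sketch of S1201-S1211: advance to the next backup ID and restart the
    reference scan. Returns (R10, R30, more_work)."""
    buid_index += 1                       # S1201: R10 = R10 + 1
    fdid_index = 1                        # S1203: R30 = 1
    more_work = buid_index <= buid_total  # S1207: True -> back to S801 (S1209)
    return buid_index, fdid_index, more_work  # False -> post-processing (S1211)
```

Resetting R30 to 1 means every backup is compared against the whole data reference DB. That includes folders created earlier in the same run, which is what lets identical files from different client PCs converge on one stored copy.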

(Post-processing)
FIG. 13 is a flowchart showing the processing procedure that includes post-processing, namely the final initialization performed at the end of the run.

First, in step S1301, the central processing unit SW10 receives the notification from the comparison calculation unit SW30 in step S1211 that no backup data requiring a search remains. On receiving it, SW10 automatically activates the backup control unit SW20.

Next, in step S1303, the backup control unit SW20 reads the counter C10 of the backup DB D10. In step S1305, the backup control unit SW20 uses the backup DB D10 and counter C10 information read in step S1303 to delete all backup data in the backup storage destination folder. In step S1307, the backup control unit SW20 deletes the data in the second and subsequent rows of the backup DB D10, keeping only the item header row. It then initializes the counter C10 (counter C10 = 0). In step S1309, once steps S1305 and S1307 have completed, the backup control unit SW20 initializes the BUID total register R20 (BUID total register R20 = 1). Finally, in step S1311, the backup control unit SW20 notifies the central processing unit SW10 that initialization has finished.

Next, in step S1313, the central processing unit SW10 receives the "initialization finished" notification sent by the backup control unit SW20 in step S1311. It then automatically ends the processing of the duplicate search server N200 and notifies the management terminal PC N400, which displays the completion.
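The post-processing in steps S1303 to S1311 is sketched below under several assumptions. The backup storage destination folder is modeled as an in-memory dictionary. Each backup DB D10 row records the path of its backup file under a hypothetical `backup_path` key. The function name `post_process` is also hypothetical, and a real implementation would delete files on disk instead.

```python
def post_process(backup_store: dict, backup_db: list, regs: dict) -> str:
    """Hypothetical sketch of steps S1303-S1311 for backup control unit SW20.

    backup_store: backup storage destination folder, modeled as {path: bytes}
    backup_db:    backup DB D10 as a list of rows; row 0 is the item header row
    regs:         {"C10": counter C10, "R20": BUID total register}
    """
    count = regs["C10"]  # S1303: read counter C10
    # S1305: delete every backup data file listed in DB D10
    for row in backup_db[1:1 + count]:
        backup_store.pop(row["backup_path"], None)
    del backup_db[1:]    # S1307: erase rows 2 onward, keep the item header row
    regs["C10"] = 0      # S1307: counter C10 = 0
    regs["R20"] = 1      # S1309: BUID total register R20 = 1
    return "initialization finished"  # S1311: reported to central processing unit SW10


store = {"bk/a": b"A", "bk/c": b"C"}
db = [{"header": True}, {"backup_path": "bk/a"}, {"backup_path": "bk/c"}]
regs = {"C10": 2, "R20": 2}
post_process(store, db, regs)
```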

<< Hardware Configuration of Client PC According to this Embodiment >>
FIG. 14 is a block diagram showing the hardware configuration of the client PC according to the present embodiment.

In FIG. 14, a CPU 1410 is a processor for arithmetic control, and implements each functional component of FIG. 2B by executing programs. The ROM 1420 stores fixed data, such as initial data, and programs. The communication control unit 1430 communicates with the duplicate search server N200 and the data reference server N300 via the network, and the communication may be wireless or wired.

The RAM 1440 is a random access memory that the CPU 1410 uses as a work area for temporary storage. The RAM 1440 has areas reserved for the data needed to realize the present embodiment. Reference numeral 1441 denotes the processing data handled by the client PC. Reference numeral 1442 denotes the shortcut data set by the duplicate search server N200 (see FIG. 4).

The storage 1450 stores databases, various parameters, and the following data and programs needed to realize the present embodiment. Reference numeral 1451 denotes a shortcut table (see FIG. 15). It holds the shortcut data, set by the duplicate search server N200, that is used to refer to data on the data reference server N300. Reference numeral 1452 denotes the original backup data, which is deleted once the shortcut data has been set. The storage 1450 also stores the following programs. Reference numeral 1453 denotes a data processing program that executes the overall processing using the processing data 1441 (see FIG. 16). Reference numeral 1454 denotes a file access module included in the data processing program 1453. It refers to data files on the data reference server N300 through the shortcut data created according to the present embodiment.

The input interface 1460 is an interface for inputting user instructions or data from devices. For example, a keyboard 1461 and a pointing device 1462 for entering user instructions are connected to it. The output interface 1470 is an interface for outputting data externally, and, for example, a display unit 1471 is connected to it.

Note that FIG. 14 shows only the data and programs essential to the present embodiment. General-purpose data and programs such as the OS are not shown.

(Shortcut table configuration)
FIG. 15 is a diagram showing the configuration of the shortcut table 1451 according to the present embodiment.

For each backup source client folder 1501 of the client PC, the shortcut table 1451 stores the file names 1502 contained in that folder. For each file name 1502, it also stores two further items: the data reference destination folder path 1503 on the data reference server N300, and the hash value 1504 of the backup data in the file.
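The table layout of FIG. 15 could be represented as shown below. The class and field names are hypothetical. Keying rows by (client folder, file name) follows the association described above, but the embodiment does not specify it. The path string in the usage lines is likewise invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ShortcutEntry:
    client_folder: str   # backup source client folder 1501
    file_name: str       # file name 1502
    reference_path: str  # data reference destination folder path 1503 (on N300)
    hash_value: str      # hash value 1504 of the backup data


class ShortcutTable:
    """Hypothetical in-memory form of shortcut table 1451, keyed by (folder, file)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], ShortcutEntry] = {}

    def put(self, entry: ShortcutEntry) -> None:
        # A later shortcut for the same folder and file replaces the earlier one.
        self._rows[(entry.client_folder, entry.file_name)] = entry

    def lookup(self, client_folder: str, file_name: str) -> Optional[ShortcutEntry]:
        return self._rows.get((client_folder, file_name))


table = ShortcutTable()
table.put(ShortcutEntry("Proposal", "a", "//N300/ref/5", "AAAAAAAAAAAAAAAA"))
```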

<< Processing Procedure of Client PC According to this Embodiment >>
FIG. 16 is a flowchart showing the processing procedure of the client PCs N100 to N10X according to the present embodiment. The CPU 1410 executes this flowchart using the RAM 1440, which realizes the functions of the processing units of the client PCs N100 to N10X shown in FIG. 14.

First, in step S1610, it is determined whether the duplicate search server N200 has requested a duplicate data search. If so, the process advances to step S1612, and the data files in the folder are transmitted to the storage destination folder of the duplicate search server N200.

In step S1620, it is determined whether the duplicate search server N200 is setting a shortcut path. If so, the process advances to step S1622, and the shortcut table 1451 is updated. Next, in step S1624, each data file whose shortcut path has been recorded in the shortcut table 1451 is deleted from the folder.

In step S1630, it is determined whether the request is an access to a data file in the folder. If so, the process proceeds to step S1632. The client follows the shortcut path recorded in the shortcut table 1451, accesses the data reference folder of the data reference server N300, and acquires the data.
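The three branches of FIG. 16 can be sketched as event handlers on a client object. This is an illustrative sketch only. `ClientPC`, its method names, and the callable that stands in for access to the data reference server N300 are all hypothetical. Folders are modeled as in-memory dictionaries. Reading a file that has no shortcut from the local folder is an added assumption.

```python
from typing import Callable, Dict, Tuple


class ClientPC:
    """Hypothetical sketch of the client-side procedure in FIG. 16."""

    def __init__(self, folders: Dict[str, Dict[str, bytes]],
                 fetch_from_reference_server: Callable[[str], bytes]) -> None:
        self.folders = folders  # folder name -> {file name: contents}
        self.shortcuts: Dict[Tuple[str, str], str] = {}  # simplified shortcut table 1451
        self.fetch = fetch_from_reference_server          # stands in for access to N300

    def on_duplicate_search(self, folder: str) -> Dict[str, bytes]:
        # S1610/S1612: send the folder's data files to the server's storage folder
        return dict(self.folders[folder])

    def on_set_shortcut(self, folder: str, name: str, reference_path: str) -> None:
        self.shortcuts[(folder, name)] = reference_path  # S1620/S1622: record the path
        self.folders[folder].pop(name, None)             # S1624: delete the local file

    def read_file(self, folder: str, name: str) -> bytes:
        # S1630/S1632: follow the shortcut path to the data reference server
        path = self.shortcuts.get((folder, name))
        if path is not None:
            return self.fetch(path)
        return self.folders[folder][name]  # assumption: non-shortcut files are local


server = {"//N300/ref/5/a": b"A"}
pc = ClientPC({"Proposal": {"a": b"A"}}, server.__getitem__)
pc.on_set_shortcut("Proposal", "a", "//N300/ref/5/a")
```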

<< Specific Example Processing According to the Present Embodiment >>
The processing of FIGS. 7 to 13 in the present embodiment is described below using a simple specific example.

(Data of each unit before startup)
FIG. 17 is a diagram showing the data 1700 of each unit at the starting point before startup, in the specific example according to the present embodiment.

The total data list of FIG. 17 shows two new files on the client PCs: “a” (hash value “AAAAAAAAAAAAAAAA”) in the “Proposal folder”, and “c” (hash value “CCCCCCCCCCCCCCCC”) in the “Construction folder”. The register values are the data before initialization. Among them, only the value “4” in the FDID total register R40 carries information: it indicates that four files are registered in the data reference server DB D20. Nothing is registered in the backup DB D10. The data reference server DB D20 holds the four files stored on the data reference server N300 so far (“b”, “c”, “d”, “e”). Each is registered together with its data reference destination folder path on the data reference server N300. The counter C10 of the backup DB D10 is “0”.

(Data of each unit at initialization)
FIG. 18 is a diagram showing the data 1800 of each unit at the time of initialization (S701).

These are the register values at the point where the initialization in step S701 of FIG. 7 has finished: BUID index register R10 = 1 and FDID index register R30 = 1.

(Data of each unit when the number of backup data items is detected)
FIG. 19 is a diagram showing the data 1900 of each unit at the time the number of backup data items is detected (S707).

This is the data after the backup DB D10 has been set up in step S707 of FIG. 7. The files “a” and “c” from the total data list of FIG. 17 have been copied into the backup DB D10, and the counter C10 of the backup DB D10 is set to “2”. This state is evaluated at the branch of step S708 in FIG. 7. In this specific example, the counter C10 is “2”, so the determination in step S709 is “NO” and the process proceeds to step S711.

(Data of each unit while checking the first backup data for duplicates)
FIG. 20 is a diagram showing the data 2000 of each unit while determining whether duplicate data exists for the first backup data (the determination in S809).

The upper left of FIG. 20 shows the register value after step S711 of FIG. 7 has set the counter C10 value “2” in the BUID total register R20. The remaining five register values in FIG. 20 are taken while the first file “a” is checked, in file-ID order, against the data reference server DB D20 for an identical file. In this specific example, the data reference server DB D20 contains no file identical to “a”. The FDID index register R30 is therefore incremented in turn in step S809 of FIG. 8.

As shown in the lower left of FIG. 20, once the comparison with the fourth file “e” has finished, the FDID index register R30 becomes “5”. The determination in step S803 of FIG. 8 is then “YES”, and the process proceeds to step S811. This establishes that file “a” is not in the data reference server DB D20, and is therefore not stored in the data reference destination folder of the data reference server N300.
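The comparison loop traced above (steps S803, S807, and S809 of FIG. 8) amounts to a linear scan over the rows of the data reference server DB D20. The sketch below assumes each row is a dictionary with a hypothetical `hash` key. The function name `find_duplicate` is also hypothetical.

```python
from typing import List, Optional


def find_duplicate(backup_hash: str, reference_db: List[dict],
                   fdid_total: int) -> Optional[int]:
    """Scan DB D20 rows 1..R40; return the matching FDID, or None if absent."""
    fdid_index = 1                          # R30 starts at 1
    while fdid_index <= fdid_total:         # S803: stop once R30 > R40
        row = reference_db[fdid_index - 1]  # FDIDs are 1-based, the list is 0-based
        if row["hash"] == backup_hash:      # S807: hash values match -> S813
            return fdid_index
        fdid_index += 1                     # S809: R30 = R30 + 1
    return None                             # S811: no identical file on N300


# Rows "b", "c", "d", "e" with the placeholder hashes used in FIG. 17.
db = [{"name": n, "hash": n.upper() * 16} for n in "bcde"]
find_duplicate("A" * 16, db, 4)  # file "a": no match, R30 reaches 5
find_duplicate("C" * 16, db, 4)  # file "c": matches FDID 2
```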

(Data of each unit when preparing to write to the data reference server, no duplicate data)
FIG. 21 is a diagram showing the data 2100 of each unit when preparing to write to the data reference server when there is no duplicate data (S1007, S1009).

Among the register values in FIG. 21, the FDID total register R40 is incremented in step S1007 of FIG. 10. This allows file “a”, which is not stored on the data reference server N300, to be added to the data reference server DB D20. Then, in step S1009, a fifth row for the new file is prepared in the data reference server DB D20, together with its data reference destination folder path.

(Data of each unit when writing to the data reference server and setting the shortcut path, no duplicate data)
FIG. 22 is a diagram showing the data 2200 of each unit when there is no duplicate data. It covers writing to the data reference server (S1103, S1105) and setting the shortcut path (S1109).

In step S1103 of FIG. 11, file “a” is stored in the data reference destination folder indicated by the data reference destination folder path. Then, in step S1105 of FIG. 11, file “a” and its hash value are inserted into the fifth row, as shown in the data reference server DB D20 of FIG. 22. In step S1107 of FIG. 11, a shortcut is created from the information on file “a” in the backup DB D10 and the data reference server DB D20. This shortcut path points to file “a” stored in the data reference destination folder. In step S1109 of FIG. 11, it replaces file “a” in the “Proposal folder” of the corresponding client PC.
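Steps S1007 to S1109 for a file with no duplicate can be sketched as a single function, under several assumptions. The data reference server N300 is modeled as a dictionary from path to contents. The folder-naming scheme `<base>/<FDID>` is invented for illustration, since the source does not say how the data reference destination folder path is formed. A shortcut is represented as a `("shortcut", path)` tuple placed in the client folder. The function name is hypothetical.

```python
from typing import Dict, List


def register_new_file(name: str, data: bytes, hash_value: str,
                      regs: Dict[str, int], reference_db: List[dict],
                      reference_store: Dict[str, bytes],
                      client_folder: Dict[str, object],
                      base_path: str = "//N300/ref") -> str:
    """Hypothetical sketch of S1007-S1109 for a backup file with no duplicate."""
    regs["R40"] += 1                            # S1007: FDID total register R40 + 1
    folder_path = f"{base_path}/{regs['R40']}"  # S1009: new row's folder path (scheme assumed)
    target = f"{folder_path}/{name}"
    reference_store[target] = data              # S1103: store the file on N300
    reference_db.append({"name": name, "hash": hash_value,
                         "path": folder_path})  # S1105: fill in the new row
    client_folder[name] = ("shortcut", target)  # S1107/S1109: file replaced by a shortcut
    return target
```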

(Data of each unit at the first processing-completion check)
FIG. 23 is a diagram showing the data of each unit at the first processing-completion check (S1201/S1203).

When processing of the first file “a” has finished, step S1201 of FIG. 12 sets the BUID index register R10 to “2”, as shown in FIG. 23. This prepares to determine whether the second file “c” is duplicate data. Next, in step S1203, the FDID index register R30 is initialized to “1”. File “c” can then be compared from the beginning of the data reference server DB D20, in which five files are now registered. In step S1207 of FIG. 12, BUID index register R10 = BUID total register R20, so the determination is NO. The process returns to step S801 and begins determining whether file “c” is already registered in the data reference server DB D20.

(Data of each unit when duplicate data is found for the second backup data and the shortcut path is set)
FIG. 24 is a diagram showing the data of each unit at two points: when duplicate data is determined to exist for the second backup data (S809), and when the shortcut path is set (S911).

While determining whether file “c” is already registered in the data reference server DB D20, the file in the first row is “b”, so step S809 of FIG. 8 sets the FDID index register R30 to “2”. The second file registered in the data reference server DB D20 is “c” and its hash value matches. Step S807 of FIG. 8 therefore determines “YES”, and the process proceeds to step S813. Then, in step S911 of FIG. 9, a shortcut for the second file “c” is created.

(Data of each unit at the second processing-completion check)
FIG. 25 is a diagram showing the data of each unit at the second processing-completion check (S1201/S1203).

When processing of the second file “c” has finished, step S1201 of FIG. 12 sets the BUID index register R10 to “3”, as shown in FIG. 25. Next, in step S1203, the FDID index register R30 is initialized to “1”. A third file, if there were one, would then be compared from the beginning of the data reference server DB D20, in which five files are registered. In step S1207 of FIG. 12, BUID index register R10 > BUID total register R20 (YES), so no unprocessed backup data remains. The process proceeds to step S1211 and then to the termination processing.

(Data of each unit at the end of duplicate data processing)
FIG. 26 is a diagram showing the data of each unit at the end of the duplicate data processing (S1307/S1309).

The initialization in step S1307 of FIG. 13 empties the backup DB D10 and sets the counter C10 to “0”. In step S1309, the BUID total register R20 is also initialized so that it points to the first row of the backup DB D10. The processing of the duplicate search server N200 then ends.

In this specific example, two files, “a” and “c”, are processed. File “a” did not yet exist on the data reference server N300. It was therefore stored in the data reference destination folder of the data reference server N300, and a shortcut path to it was set in the client PC's Proposal folder. File “c” already existed on the data reference server N300, so it was not stored again. Instead, a shortcut path to the already stored file “c” was set in the client PC's Construction folder.
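The whole specific example can be replayed with a short Python sketch that reproduces the outcome described above. File “a” becomes the fifth row, and file “c” is linked to the existing second row. The register names follow the text. The function `deduplicate`, the dictionary row format, and the `ref/<FDID>` folder paths are assumptions, since FIG. 17 does not give the actual paths.

```python
def deduplicate(backup_db, reference_db):
    """Hypothetical register-level replay of FIGS. 7-12 for one run."""
    regs = {"R10": 1, "R20": len(backup_db), "R30": 1, "R40": len(reference_db)}
    shortcuts = {}
    while regs["R10"] <= regs["R20"]:            # S1207: entries left to process
        entry = backup_db[regs["R10"] - 1]
        match = None
        while regs["R30"] <= regs["R40"]:        # S803
            row = reference_db[regs["R30"] - 1]
            if row["hash"] == entry["hash"]:     # S807 -> S813: duplicate found
                match = row
                break
            regs["R30"] += 1                     # S809
        if match is None:                        # S811: not yet on N300
            regs["R40"] += 1                     # S1007
            match = {"name": entry["name"], "hash": entry["hash"],
                     "path": f"ref/{regs['R40']}"}  # S1009 (path scheme assumed)
            reference_db.append(match)           # S1103/S1105
        # S911 or S1107-S1109: client file is replaced by a shortcut path
        shortcuts[(entry["folder"], entry["name"])] = f"{match['path']}/{match['name']}"
        regs["R10"] += 1                         # S1201
        regs["R30"] = 1                          # S1203
    return regs, shortcuts


# State before startup (FIG. 17); folder paths are placeholders.
reference_db = [{"name": n, "hash": n.upper() * 16, "path": f"ref/{i}"}
                for i, n in enumerate("bcde", 1)]
backup_db = [{"name": "a", "hash": "A" * 16, "folder": "Proposal"},
             {"name": "c", "hash": "C" * 16, "folder": "Construction"}]
regs, shortcuts = deduplicate(backup_db, reference_db)
```

The final register values match FIG. 25 (R10 = 3, R30 = 1), and R40 has grown from 4 to 5.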

[Other Embodiments]
Although embodiments of the present invention have been described in detail above, the scope of the present invention also includes systems or apparatuses that combine, in any way, the separate features of the respective embodiments.

Further, the present invention may be applied to a system composed of a plurality of devices or to a single device. The present invention is also applicable when a control program that realizes the functions of the embodiments is supplied to a system or apparatus directly or remotely. Accordingly, the scope of the present invention also includes the following, used to realize its functions on a computer: a control program installed on the computer, a medium storing the control program, and a WWW (World Wide Web) server from which the control program is downloaded.

[Other expressions of embodiment]
Part or all of the above embodiment can also be described as in the following supplementary notes, but is not limited to them.

(Appendix 1)
An information processing apparatus comprising:
search means for searching whether a folder containing the same data file as the data file to be processed exists in data holding means that holds data;
storage means for storing the folder containing the data file in the data holding means when no folder containing the same data file exists in the data holding means;
path setting means for setting, for the folders of all information processing apparatuses that hold the data file, a path from a shortcut file in each folder to the data file in the data holding means; and
deletion means for deleting all of the data files held in the folders of the information processing apparatuses.
(Appendix 2)
The information processing apparatus according to appendix 1, wherein
the search means treats the data files that all information processing apparatuses connected to the information processing apparatus hold in their folders as the data files to be processed, and searches whether each of them exists in the data holding means, and
the path setting means sets, for the different folders of different information processing apparatuses that contain the same data file as the data file to be processed, a path from the shortcut file in each folder to the data file in the data holding means.
(Appendix 3)
The information processing apparatus according to appendix 1 or 2, further comprising
storing means for reading the data files newly held in folders by all information processing apparatuses connected to the information processing apparatus, and storing each data file in association with the path to the folder of the connected information processing apparatus, wherein
the storage means reads a data file stored in the storing means and stores it in a folder newly created in the data holding means, in association with the path to that newly created folder, and
the path setting means uses the path to the folder of the connected information processing apparatus and the path to the folder newly created in the data holding means to set a path from the shortcut file in the folder of the information processing apparatus to the data file in the data holding means. It then deletes all data files stored in the storing means.
(Appendix 4)
The information processing apparatus according to appendix 3, wherein the search means may find that a folder containing the same data file as the data file to be processed exists in the data holding means. In that case, the path setting means uses the path to the folder of the connected information processing apparatus and the path to the already created folder of the data holding means to set a path from the shortcut file in the folder of the information processing apparatus to the data file in the data holding means. It then deletes all data files stored in the storing means.
(Appendix 5)
The information processing apparatus according to appendix 3 or 4, wherein the path setting means includes:
path holding means for holding the path to the folder of the connected information processing apparatus until a path from the shortcut file in that folder to the data file in the data holding means has been set; and
path accumulation means for accumulating paths to folders of the data holding means.
(Appendix 6)
The information processing apparatus according to any one of appendices 1 to 5, wherein whether a folder contains the same data file as the data file to be processed is determined by comparing the hash values of the data files.
(Appendix 7)
The information processing apparatus according to any one of appendices 1 to 6, wherein the data file is a backup data file, and the folder is a folder that holds the backup data file.
(Appendix 8)
A method for controlling an information processing apparatus, comprising:
a search step of searching whether a folder containing the same data file as the data file to be processed exists in data holding means that holds data;
a storing step of storing the folder containing the data file in the data holding means when no folder containing the same data file exists in the data holding means;
a path setting step of setting, for the folders of all information processing apparatuses that hold the data file, a path from a shortcut file in each folder to the data file in the data holding means; and
a deletion step of deleting all of the data files held in the folders of the information processing apparatuses.
(Appendix 9)
A control program causing a computer to execute:
a search step of searching whether a folder containing the same data file as the data file to be processed exists in data holding means that holds data;
a storing step of storing the folder containing the data file in the data holding means when no folder containing the same data file exists in the data holding means;
a path setting step of setting, for the folders of all information processing apparatuses that hold the data file, a path from a shortcut file in each folder to the data file in the data holding means; and
a deletion step of deleting all of the data files held in the folders of the information processing apparatuses.
(Appendix 10)
An information processing system capable of holding the same data file in different folders created by a plurality of clients, the system comprising:
holding means for holding the same data file in one folder;
path setting means for setting a path to the one folder held by the holding means. The path is set from the shortcut files in all the different folders, created by the plurality of clients, that hold the same data file; and
deletion means for deleting all copies of the same data file held in all the different folders.
(Appendix 11)
A duplicate file elimination method in an information processing system capable of holding the same data file in different folders created by a plurality of clients, the method comprising:
a holding step of holding the same data file in one folder;
a path setting step of setting a path to the one folder held in the holding step. The path is set from the shortcut files in all the different folders, created by the plurality of clients, that hold the same data file; and
a deletion step of deleting all copies of the same data file held in all the different folders.
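The method of appendices 10 and 11 can be illustrated independently of the register-level procedure. This sketch assumes SHA-256 as the hash function, because appendix 6 only says that hash values are compared and names no algorithm. Folders are modeled as in-memory dictionaries keyed by (client PC, folder name). The single retained copy uses a hypothetical `held/<digest>` path, and the function name is also hypothetical.

```python
import hashlib
from typing import Dict, Tuple

FolderKey = Tuple[str, str]  # (client PC, folder name)


def eliminate_duplicates(
    client_folders: Dict[FolderKey, Dict[str, bytes]],
) -> Tuple[Dict[str, bytes], Dict[Tuple[str, str, str], str]]:
    """Keep one copy per identical file, point every folder at it, delete the copies."""
    held: Dict[str, bytes] = {}                      # holding means: path -> contents
    shortcuts: Dict[Tuple[str, str, str], str] = {}  # (client, folder, file) -> path
    for (client, folder), files in client_folders.items():
        for name in list(files):  # copy the keys because files are deleted in the loop
            data = files[name]
            # Identity is decided by hash value; SHA-256 is an assumption.
            path = "held/" + hashlib.sha256(data).hexdigest()
            held.setdefault(path, data)               # holding step: first copy only
            shortcuts[(client, folder, name)] = path  # path setting step
            del files[name]                           # deletion step
    return held, shortcuts
```

Files with equal hash values are treated as identical here. A production system might also compare contents on a hash match to guard against collisions, but that check is not part of the described method.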

Claims

An information processing apparatus comprising:
search means for searching whether a folder containing the same data file as the data file to be processed exists in data holding means that holds data;
storage means for storing the folder containing the data file in the data holding means when no folder containing the same data file exists in the data holding means;
path setting means for setting, for the folders of all information processing apparatuses that hold the data file, a path from a shortcut file in each folder to the data file in the data holding means; and
deletion means for deleting all of the data files held in the folders of the information processing apparatuses.

The information processing apparatus according to claim 1, wherein
the search means treats the data files that all information processing apparatuses connected to the information processing apparatus hold in their folders as the data files to be processed, and searches whether each of them exists in the data holding means, and
the path setting means sets, for the different folders of different information processing apparatuses that contain the same data file as the data file to be processed, a path from the shortcut file in each folder to the data file in the data holding means.

The information processing apparatus according to claim 1 or 2, further comprising
storing means for reading the data files newly held in folders by all information processing apparatuses connected to the information processing apparatus, and storing each data file in association with the path to the folder of the connected information processing apparatus, wherein
the storage means reads a data file stored in the storing means and stores it in a folder newly created in the data holding means, in association with the path to that newly created folder, and
the path setting means uses the path to the folder of the connected information processing apparatus and the path to the folder newly created in the data holding means to set a path from the shortcut file in the folder of the information processing apparatus to the data file in the data holding means. It then deletes all data files stored in the storing means.

The path setting means, when the search means has a folder containing the same data file as the data file to be processed in the data holding means, the path to the folder of the connected information processing apparatus And a path from the shortcut file in the folder of the information processing device to the data file in the data holding unit from the already created path to the folder of the data holding unit, and stored in the storage unit 4. The information processing apparatus according to claim 3, wherein all the data files that have been deleted are deleted.
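These two dependent claims add a staging step. Newly held files are first copied into a separate store, together with the path to their client folder. Each staged file is then placed in the data holding means:

- in a newly created folder, if no identical content is held yet;
- in the folder that already holds identical content, if one exists.

Finally the staged copies are deleted.

The Python sketch below models that flow on a local filesystem. It assumes that:

- SHA-256 digests decide sameness;
- there is one folder per digest in the data holding means;
- symbolic links named `<file>.lnk` stand in for Windows shortcuts.

The names `Staged`, `stage` and `commit` are hypothetical.

```python
import hashlib
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Staged:
    """A newly held data file copied into the staging area (the memory means)."""
    staged_copy: Path  # copy kept in the staging area
    origin: Path       # the file's path in the connected client's folder


def stage(new_files: list[Path], staging: Path) -> list[Staged]:
    staging.mkdir(parents=True, exist_ok=True)
    entries = []
    for i, origin in enumerate(new_files):
        staged = staging / f"{i}_{origin.name}"  # hypothetical naming to avoid clashes
        shutil.copy2(origin, staged)
        entries.append(Staged(staged, origin))
    return entries


def commit(entries: list[Staged], store: Path) -> None:
    for e in entries:
        digest = hashlib.sha256(e.staged_copy.read_bytes()).hexdigest()
        folder = store / digest
        if folder.is_dir():
            # A folder with identical content already exists: reuse it.
            held = next(folder.iterdir())
        else:
            # No match: create a new folder and store the staged file in it.
            folder.mkdir(parents=True)
            held = folder / e.origin.name
            shutil.copy2(e.staged_copy, held)
        # Shortcut at the client folder pointing at the held copy,
        # then remove the client's local copy (the deleting means of claim 1).
        e.origin.with_name(e.origin.name + ".lnk").symlink_to(held.resolve())
        e.origin.unlink()
    # Finally delete every data file kept in the staging area.
    for e in entries:
        e.staged_copy.unlink()
```

Within a single batch, the first file with given contents creates the folder. Later files with the same contents take the reuse branch.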

The information processing apparatus according to claim 3, wherein the path setting means includes:
Path holding means for holding the path to the folder of the connected information processing apparatus until a path from the shortcut file in the folder of the information processing apparatus to the data file in the data holding means has been set; and
Path storage means for storing the path to the folder of the data holding means.
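The two sub-units in the claim above amount to simple bookkeeping:

- client folder paths are held until their shortcuts can be written;
- the folder in the data holding means is remembered once it has been created.

One possible model in Python is sketched below. It uses hypothetical names (`PathSetting`, `hold`, `record_store_folder`, `resolve`) and plain POSIX-style path strings.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class PathSetting:
    """Hypothetical bookkeeping for the path setting means.

    pending: path holding means; content key -> client folders awaiting a shortcut.
    stored:  path storage means; content key -> folder in the data holding means.
    """
    pending: dict[str, list[str]] = field(default_factory=dict)
    stored: dict[str, str] = field(default_factory=dict)

    def hold(self, key: str, client_folder: str) -> None:
        self.pending.setdefault(key, []).append(client_folder)

    def record_store_folder(self, key: str, store_folder: str) -> None:
        self.stored[key] = store_folder

    def resolve(self, key: str, file_name: str) -> list[tuple[str, str]]:
        """Return (shortcut location, target) pairs and release the held folder paths.

        Raises KeyError if no store folder has been recorded for `key`; the held
        paths are left untouched in that case.
        """
        target = str(PurePosixPath(self.stored[key]) / file_name)
        return [
            (str(PurePosixPath(folder) / f"{file_name}.lnk"), target)
            for folder in self.pending.pop(key, [])
        ]
```

`resolve` looks up the store folder before it pops the held paths. A call made too early therefore fails without losing the pending entries.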

The information processing apparatus according to claim 1, wherein the data file is a backup data file, and the folder is a folder that holds the backup data file.

A search step for searching whether a folder containing the same data file as the data file to be processed exists in the data holding means for holding the data;
A storing step of storing, when no folder containing the same data file exists in the data holding means, the folder containing the data file in the data holding means;
A path setting step for setting, for every folder of an information processing apparatus that holds the data file, a path from a shortcut file in that folder to the data file in the data holding means;
A deletion step of deleting all the data files held in the folders of the information processing apparatuses;
A method for controlling an information processing apparatus, comprising:

A search step for searching whether a folder containing the same data file as the data file to be processed exists in the data holding means for holding the data;
A storing step of storing, when no folder containing the same data file exists in the data holding means, the folder containing the data file in the data holding means;
A path setting step for setting, for every folder of an information processing apparatus that holds the data file, a path from a shortcut file in that folder to the data file in the data holding means;
A deletion step of deleting all the data files held in the folders of the information processing apparatuses;
A control program for causing a computer to execute the above steps.

An information processing system capable of holding the same data file in different folders generated by a plurality of clients,
Holding means for holding the same data file in one folder;
Path setting means for setting a path from the shortcut files in all the different folders holding the same data file generated by the plurality of clients to the one folder held in the holding means;
Deleting means for deleting all the same data files held by all the different folders;
An information processing system comprising:
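The system above relies on recognising "the same data file" across clients, but it does not say how. The sketch below shows one common approach, as an assumption rather than the patent's method:

1. Narrow the candidates by file size.
2. Narrow them further by SHA-256 digest.
3. Confirm byte-for-byte equality with `filecmp.cmp(..., shallow=False)`, so that a hash collision cannot merge different files.

`same_file_groups` is a hypothetical name.

```python
import filecmp
import hashlib
from collections import defaultdict
from pathlib import Path


def same_file_groups(paths: list[Path]) -> list[list[Path]]:
    """Partition files into groups whose contents are byte-for-byte identical."""
    # Cheapest filter first: files of different sizes can never be the same.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in paths:
        by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for candidates in by_size.values():
        # Second filter: SHA-256 digest of the contents.
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in candidates:
            by_hash[hashlib.sha256(p.read_bytes()).hexdigest()].append(p)
        for bucket in by_hash.values():
            # Final check: full comparison against the first remaining member.
            remaining = list(bucket)
            while remaining:
                head, rest = remaining[0], remaining[1:]
                group = [head] + [p for p in rest if filecmp.cmp(head, p, shallow=False)]
                groups.append(group)
                remaining = [p for p in rest if p not in group]
    return groups
```

Each group with more than one member is a set of copies that the holding means could reduce to one folder. The copies in every other folder would then be replaced by shortcuts.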

A duplicate file elimination method in an information processing system capable of holding the same data file in different folders generated by a plurality of clients,
A holding step of holding the same data file in one folder;
A path setting step for setting a path from the shortcut files in all the different folders holding the same data file generated by the plurality of clients to the one folder held in the holding step;
A deletion step of deleting all the same data files held by all the different folders;
A duplicate file elimination method comprising: