JP5667702B2 - Data distribution management system

Publication number: JP5667702B2
Application number: JP2013541726A
Authority: JP
Inventors: 敦佐藤; 壮一最首
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2011-11-01
Filing date: 2012-10-24
Publication date: 2015-02-12
Anticipated expiration: 2032-10-24
Also published as: JPWO2013065544A1

Description

The present invention relates to data storage technology, and more particularly to a technique that is effective when applied to a data distribution management system in which one or more pieces of data are distributed and stored across different servers or the like.

In recent years, from the viewpoint of information security, importance has been placed on the handling of data such as files that are held and processed in information processing apparatuses such as PCs (Personal Computers) used by users. In particular, for portable terminals such as so-called smartphones and tablet PCs, whose business use is expanding alongside notebook PCs, it is necessary to consider the risk of information leakage resulting from theft or loss of the terminals themselves.

To address this, the risk of information leakage due to loss of a terminal can be reduced by a so-called thin-client approach, in which data including important data on the terminal is stored in an external data center, server, or the like where security measures have been taken. In this case, instead of storing the important data as it is in an external server or the like, it has also been proposed to use, for example, the so-called secret sharing technique described in Non-Patent Document 1 and elsewhere to divide the important data into non-important data that are meaningless on their own (from which the important data cannot be restored or inferred) and to store these non-important data in a distributed manner on a plurality of external servers or the like. This makes it possible to reduce the risk of information leakage even when, for example, the data is stored in a virtual data center or virtual server in a cloud computing environment.

Furthermore, when important data is divided into a plurality of pieces by secret sharing, the original important data can be restored as long as a predetermined number or more of the divided pieces can be collected, even if some of them are lost, so data availability can also be improved. For example, when n pieces of divided data are generated from important data by so-called (k, n) threshold secret sharing, the important data can be restored if k or more pieces can be collected. In other words, the loss of up to (n − k) pieces of divided data can be tolerated. Taking advantage of this high availability, using the divided data as a backup of the original important data by storing it in a distributed manner at a plurality of remote sites is also being considered.
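The (k, n) threshold property described above can be illustrated with a minimal sketch of Shamir's scheme (Non-Patent Document 1). This is an illustration only, not the implementation used by the invention: the field size `PRIME`, the function names `split_secret` and `recover_secret`, and the restriction to an integer secret are all assumptions made for the example.

```python
import secrets

# Assumption of this sketch: arithmetic over the prime field GF(2**127 - 1).
PRIME = 2**127 - 1

def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Return n shares (x, y); any k of them reconstruct `secret`."""
    if not (1 <= k <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With k = 3 and n = 5, any three of the five shares returned by `split_secret` are enough for `recover_secret`, so the loss of up to n − k = 2 shares is tolerated.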

When a plurality of pieces of data that are handled together, such as divided data generated by secret sharing, are stored in a distributed manner on a plurality of other servers or the like for security or backup purposes, management information (hereinafter sometimes referred to as "distributed management information"), which includes location information indicating which data is stored on which server, is normally held by the information processing apparatus of each user that is the distribution source of the data, or by a specific management server such as a file server. When the distributed data stored on the servers is collected, this distributed management information is referenced to identify which server holds the required distributed data, and the required distributed data is collected by accessing the target server directly.

For example, Japanese Unexamined Patent Application Publication No. 2007-213405 (Patent Document 1) describes a distributed information file management means in which an information management computer is provided with tally folders A, B, ... that store tally files, a restoration destination folder that stores restored files, a tally object folder that stores tally object files, and a tally engine folder that stores a restoration engine program and a division engine program. Tally parameters, including information on the decode boundary (the range the tally application can read), are stored in the tally application, and the tally file names and storage locations together with the object information of the restoration destination folder are stored in tally object files A, B, .... The tally files are collected directly based on their storage locations and the decode boundary, and a restored file is generated, stored in the restoration destination folder, and opened, so that files distributed by the secret sharing method are located efficiently and the original data is restored.

Patent Document 1: JP 2007-213405 A

Non-Patent Document 1: A. Shamir, "How to Share a Secret", Communications of the ACM, vol. 22, no. 11, pp. 612-613, 1979.

However, in conventional methods of distributed data storage such as that described in Patent Document 1, the information processing apparatus that is the distribution source of the data, or a specific management server such as a file server, holds the distributed management information for the important data (specifically, for the one or more pieces of distributed data related to the important data), which poses a security problem. That is, if, for example, a portable terminal that is the distribution source is stolen or lost while it holds distributed management information for important data, a third party may view that information and thereby obtain information on the location of the distributed data related to the important data (information for accessing the distributed data, such as the host name, network address, or URL (Uniform Resource Locator) of each server that stores it).

In addition, when the distributed management information is held in the information processing apparatuses that are the distribution sources, if a distribution destination server must be changed, for example because a storage destination server has become unusable due to a failure, the contents of the distributed management information must be rewritten individually on each distribution source apparatus with the information on the new storage destination server. For example, when the distribution destination servers are virtual servers provided by a cloud computing service, the system has to be operated without knowing when a virtual server will be stopped, and rewriting the distributed management information on every distribution source terminal each time a destination virtual server changes imposes a heavy operational load.

Furthermore, when a user tries to access distributed data (the original important data corresponding to the distributed data) from another information processing apparatus because, for example, the portable terminal that was the distribution source has been lost, or when a user tries to access distributed data from an unfamiliar information processing apparatus at another office or on a business trip, the apparatus the user is using holds no distributed management information for the target important data (the distributed data corresponding to it). The user therefore cannot determine which server stores each piece of distributed data and cannot access the distributed data, so the system lacks flexibility.

Accordingly, an object of the present invention is to provide a data distribution management system that enables distributed storage of data without holding distributed management information on the information processing apparatus that is the distribution source of the data, and without being affected by which server or the like stores the distributed data. The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

A brief outline of a representative invention among those disclosed in the present application is as follows.

A data distribution management system according to a representative embodiment of the present invention includes a plurality of information processing apparatuses each having a storage device, and a data distribution apparatus that is connected to the information processing apparatuses via a network and that stores one or more pieces of distributed data, handled collectively in correspondence with original data, in a distributed manner in the storage devices of the information processing apparatuses. The system has the following features.

That is, the data distribution apparatus includes: a distributed data processing unit that performs processing for associating the original data with the one or more pieces of distributed data; a pointer file processing unit that generates identification information by which the original data can be identified and specified, and generates a pointer file that corresponds to the original data and contains the identification information; and a distributed processing unit that transmits the pieces of distributed data corresponding to the original data, each with the identification information corresponding to the original data added, to mutually different information processing apparatuses. Each of the information processing apparatuses includes a distributed storage unit that stores the distributed data transmitted from the data distribution apparatus in its storage device.

The effects obtained by a representative invention among those disclosed in the present application are briefly described as follows.

According to a representative embodiment of the present invention, data can be stored in a distributed manner without holding distributed management information on the information processing apparatus that is the distribution source of the data, and without being affected by which server or the like stores the distributed data.

A diagram showing an outline of a configuration example of a data distribution management system according to Embodiment 1 of the present invention.
A diagram showing an example of the contents of the identification information added to the pointer file and the distributed data in Embodiment 1 of the present invention.
A diagram showing an outline of an example of processing in Embodiment 1 of the present invention for associating original data with a plurality of pieces of distributed data and storing them in a distributed manner.
A diagram showing an outline of an example of processing in Embodiment 1 of the present invention for collecting a plurality of pieces of distributed data and obtaining the original data from them.
A diagram showing an outline of an example of processing in Embodiment 1 of the present invention for locking the use of distributed data to restrict acquisition of the original data and the corresponding distributed data.
A diagram showing an outline of an example of processing in Embodiment 1 of the present invention for restoring a pointer file when it is not present on the data distribution apparatus.
A diagram showing an outline of a configuration example of a data distribution management system according to Embodiment 2 of the present invention.
A diagram showing an outline of a configuration example of a data distribution management system according to Embodiment 3 of the present invention.
A diagram showing an example of the contents of the identification information added to the pointer file and the distributed data in Embodiment 3 of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. In all the drawings for describing the embodiments, the same parts are in principle denoted by the same reference numerals, and repeated description thereof is omitted.

<Embodiment 1>
The data distribution management system according to Embodiment 1 of the present invention stores a plurality of pieces of distributed data, handled collectively in correspondence with original data such as important data, in a distributed manner in the storage devices of other data centers, servers, or the like, and does not hold distributed management information containing location information that indicates which data center or server stores each piece of distributed data. In this embodiment, instead of such distributed management information, the data distribution apparatus that performs the distributed storage generates and holds identification information identifying each piece of original data and adds that identification information to the header information of each piece of distributed data. This makes it possible to collect the required distributed data without needing information on the location of the data centers or servers where the distributed data is stored.

Here, the one or more pieces of distributed data handled collectively in correspondence with original data are the one or more pieces of data that are acquired, saved, displayed, or otherwise processed together in response to a single processing request from a user, such as saving, viewing, or referencing the target original data. This embodiment shows an example in which a plurality of pieces of divided data generated by secret sharing from important data, which is the original data, are each treated as distributed data, but the invention is not limited to this.

For example, in a business application, for management data such as a project or case created by a user, a series of related files generated by the business application or a series of work files designated by the user may each be stored as distributed data on servers or the like. The original data may also have only one piece of distributed data (for example, the target original data itself), which corresponds to use as remote copy or backup.

When the data distribution apparatus collects the required distributed data from the data centers, servers, or the like, it broadcasts (or multicasts) to them a message that specifies all or part of the identification information for the original data and inquires whether they hold distributed data corresponding to that original data. The data centers or servers that hold the target distributed data return it to the data distribution apparatus in response, so the data distribution apparatus can collect the required distributed data without needing distributed management information on where each piece of distributed data is stored.

As a result, particularly when the data distribution apparatus is a portable terminal, it is possible to avoid the risk that, if the apparatus is stolen or lost, a third party obtains distributed management information, learns where each piece of distributed data is stored, and gains access to the distributed data. In addition, because the system does not depend on which data center or server stores each piece of distributed data, the storage location of each piece can be changed easily, which improves the availability and flexibility of the system.

Also, in this embodiment, the data distribution apparatus can restore the identification information of each piece of data even if it does not hold that identification information. For example, when the user supplies information that identifies the user, such as a user ID, the data distribution apparatus broadcasts (or multicasts) a message inquiring about identification information related to that user. The data centers or servers that hold distributed data carrying the target identification information return that identification information to the data distribution apparatus, so the apparatus can acquire and restore the identification information corresponding to each piece of data available to the user and, based on it, collect the corresponding distributed data.
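The restoration of identification information from a user ID can be sketched as follows. This is a simplified in-memory model, not the embodiment's protocol: the fields `user_id` and `data_id`, the `Server` type, and the function name `restore_identification_info` are hypothetical, and a plain loop over servers stands in for the broadcast (or multicast) message.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IdentificationInfo:
    user_id: str  # hypothetical field: the user who may use the original data
    data_id: str  # hypothetical field: unique ID of the original data

# Each server is modelled as a list of (header, payload) pairs (an assumption).
Server = list[tuple[IdentificationInfo, bytes]]

def restore_identification_info(
    user_id: str, servers: list[Server]
) -> set[IdentificationInfo]:
    """Ask every server for headers belonging to user_id and merge the replies."""
    restored: set[IdentificationInfo] = set()
    for server in servers:  # stands in for a broadcast / multicast inquiry
        for header, _payload in server:
            if header.user_id == user_id:
                # Servers reply with the identification information only,
                # not with the distributed data itself.
                restored.add(header)
    return restored
```

Because several servers hold shares of the same original data, the replies contain duplicates; collecting them in a set yields one entry per piece of original data, from which a pointer file could then be regenerated.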

As a result, even when an information processing apparatus different from the original data distribution apparatus is newly used as the data distribution apparatus, for example because the original apparatus was stolen or lost or because another terminal is used on a business trip, the identification information can easily be restored, the distributed data accessed, and work continued.

[System configuration]
FIG. 1 is a diagram showing an outline of a configuration example of the data distribution management system according to Embodiment 1 of the present invention. The data distribution management system 1 has a configuration in which a data distribution apparatus 100 and one or more servers 200 are connected to each other via a network 300 such as the Internet and can communicate with each other. The system may include a plurality of data distribution apparatuses 100.

The data distribution apparatus 100 is an information processing apparatus such as a PC or a portable terminal and includes, for example, a distributed data processing unit 110, a pointer file processing unit 120, a distributed processing unit 130, and an interface unit 140, each implemented as a software program running on an OS (Operating System), not shown. It also has user information 160, which is data such as a database, file, or registry that holds information (for example, account information) on users who can use the data distribution management service provided by the data distribution apparatus 100 or the data distribution management system 1. It further has pointer files 150, each corresponding to one of a plurality of pieces of original data 400 and functioning as a pointer to the distributed data 410 stored on the servers 200.

The distributed data processing unit 110 performs processing for associating original data 400 with the one or more pieces of distributed data 410 handled collectively in correspondence with it. In this embodiment, it is, for example, a known secret sharing library that generates, from designated original data 400, n pieces of divided data that become the distributed data 410 by the (k, n) threshold secret sharing method and, conversely, restores the original data 400 from k or more designated pieces of distributed data 410 by the same method.

As described above, the distributed data 410 is not limited to data generated from, or on the basis of, the original data 400 as in this embodiment; it may be a plurality of existing pieces of data associated with the original data 400. There may also be only one piece of distributed data 410 (for example, the original data 400 itself).

The pointer file processing unit 120 generates, for each of a plurality of pieces of original data 400, a pointer file 150 that functions as a pointer to the corresponding distributed data 410. It also performs processing on the original data 400 (or the corresponding distributed data 410) based on user instructions directed at the pointer file 150 via the interface unit 140 described later.

The pointer file 150 points to the original data 400 (and thus to the corresponding distributed data 410) but does not contain the original data 400 itself. Its contents are identification information, described later, by which the original data 400 (and thus the corresponding distributed data 410) can be identified and specified. In other words, the pointer file 150 resembles a so-called shortcut, symbolic link, or alias for the original data 400 (and the corresponding distributed data 410). This identification information is also added, as header information or the like, to each piece of distributed data 410 generated by the distributed data processing unit 110.

To generate this identification information, the pointer file processing unit 120 further includes an identification information generation unit 121. It also includes an ID generation unit 122 for generating the values of the various IDs contained in the identification information. The ID generation unit 122 is, for example, a library with a known function for generating unique IDs (universal IDs) that do not collide even across a plurality of different data distribution apparatuses 100.
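As one possible reading of the ID generation unit 122 and the pointer file 150, the sketch below uses Python's standard `uuid` module to produce a universally unique ID and serializes the identification information as JSON. The text does not name a specific ID scheme or file format, so the use of UUID version 4, the JSON layout, the field names `user_id` and `data_id`, and the one-line header format are all assumptions of this example.

```python
import json
import uuid

def new_identification_info(user_id: str) -> dict[str, str]:
    """Generate identification information for one piece of original data."""
    # uuid4 is random, so collisions between independent devices are
    # extremely unlikely without any coordination (an assumed ID scheme).
    return {"user_id": user_id, "data_id": str(uuid.uuid4())}

def make_pointer_file(info: dict[str, str]) -> bytes:
    """Pointer file body: identification information only, no original data."""
    return json.dumps(info, sort_keys=True).encode("utf-8")

def add_header(info: dict[str, str], share: bytes) -> bytes:
    """Prefix a share with the identification information as a one-line header."""
    return make_pointer_file(info) + b"\n" + share

def split_header(blob: bytes) -> tuple[dict[str, str], bytes]:
    """Separate the header line from the share payload."""
    # json.dumps never emits a raw newline, so the first b"\n" ends the header.
    header, _, share = blob.partition(b"\n")
    return json.loads(header), share
```

The same identification information appears both in the pointer file kept on the data distribution apparatus and in the header of every share, which is what allows shares to be found later without any record of where they were sent.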

The distributed processing unit 130 includes a distribution unit 131, which adds the identification information to the distributed data 410 associated with the original data 400 by the distributed data processing unit 110 and stores the distributed data 410 on the servers 200 in a distributed manner according to a predetermined rule, and a collection unit 132, which collects the distributed data 410 associated with the original data 400 from the servers 200. It may also have a server list 133 listing the servers 200 that can serve as storage destinations for the distributed data 410.

In this embodiment, the distribution unit 131 stores, for example, the n pieces of distributed data 410, generated by the distributed data processing unit 110 using the (k, n) threshold secret sharing method and given identification information by the pointer file processing unit 120, on n different servers 200 selected from the server list 133. When there are more than n servers 200, the n servers 200 that will store the distributed data 410 are selected from among them by, for example, rotation or random selection.
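The two selection rules mentioned above, rotation and random selection, can be sketched as follows. The names `select_random` and `RotatingSelector` are hypothetical, and the server list is modelled simply as a list of host names.

```python
import random
from typing import Optional

def select_random(
    server_list: list[str], n: int, rng: Optional[random.Random] = None
) -> list[str]:
    """Pick n distinct servers uniformly at random."""
    if n > len(server_list):
        raise ValueError("not enough servers for n shares")
    # random.sample draws without replacement, so the servers are distinct.
    return (rng or random).sample(server_list, n)

class RotatingSelector:
    """Pick n consecutive servers, continuing where the previous call stopped."""

    def __init__(self, server_list: list[str]) -> None:
        self._servers = list(server_list)
        self._next = 0  # index of the first server for the next call

    def select(self, n: int) -> list[str]:
        size = len(self._servers)
        if n > size:
            raise ValueError("not enough servers for n shares")
        chosen = [self._servers[(self._next + i) % size] for i in range(n)]
        self._next = (self._next + n) % size
        return chosen
```

Both rules return n different servers, so each share of one piece of original data lands on a different server, as the text requires. Rotation spreads load evenly across the list, while random selection makes the placement harder to predict.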

The collection unit 132, in turn, inquires of each server 200 whether it holds distributed data 410 associated with the original data 400 and collects the distributed data 410 transmitted from the servers 200 that do. In this embodiment, it collects, for example, the k or more pieces of distributed data 410 that the distributed data processing unit 110 needs in order to restore the original data 400 by the (k, n) threshold secret sharing method.

For the inquiry, a message containing all or part of the identification information in the pointer file 150 corresponding to the target original data 400 is broadcast to all servers 200 (or multicast to the servers 200 listed in the server list 133). Any known technique can be used as appropriate for the broadcast (multicast) protocol.
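The collection step can be sketched without any network code by modelling each server as a function that answers an inquiry. This is a simplification under stated assumptions: the name `collect_shares`, the `QueryHandler` type, and the use of a single `data_id` string as the identification information are hypothetical, and the loop stands in for a real broadcast or multicast, whose protocol the text leaves open.

```python
from typing import Callable, Optional

# A server is modelled as a function: data_id -> matching share, or None
# when the server holds nothing for that ID (an assumption of this sketch).
QueryHandler = Callable[[str], Optional[bytes]]

def collect_shares(data_id: str, servers: list[QueryHandler], k: int) -> list[bytes]:
    """Send the same inquiry to the servers and gather at least k shares."""
    collected: list[bytes] = []
    for ask in servers:  # the caller does not know which servers hold shares
        share = ask(data_id)
        if share is not None:
            collected.append(share)
        if len(collected) >= k:
            break  # k replies suffice for (k, n) threshold reconstruction
    if len(collected) < k:
        raise LookupError(f"only {len(collected)} of {k} required shares found")
    return collected
```

Note that the collector never consults a mapping from data to servers; it only needs the identification information from the pointer file and some way of reaching the servers.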

The interface unit 140 provides input/output functions for the data distribution apparatus 100, such as a user interface including screen display. The user can use the functions of the data distribution management system 1 through, for example, the file management screens and applications of a general OS.

For example, the user moves original data 400 to a specific folder or the like in a file management application by a simple operation such as drag and drop. With this as a trigger, the distributed data processing unit 110 generates the distributed data 410, and the distributed processing unit 130 stores it on the servers 200 in a distributed manner. The pointer file processing unit 120 then generates a pointer file 150 corresponding to the original data 400 and replaces the original data 400 in the specific folder with it. Thereafter, user access to the original data 400, such as referencing it, is directed at the pointer file 150 placed in the specific folder.

ユーザが、特定のフォルダ等においてポインタファイル１５０に対して参照等の指示を行うと、ポインタファイル処理部１２０により、当該ポインタファイル１５０によって特定される元データ４００に対応付けられた分散データ４１０を、分散処理部１３０によって各サーバ２００から収集する。さらに、本実施の形態のように必要な場合には、収集した分散データ４１０から分散データ処理部１１０により元データ４００を復元する。その後、元データ４００もしくは分散データ４１０を関連するアプリケーションプログラム等により表示等する。これにより、ユーザに対して、あたかも元データ４００に対して保存・参照等の処理を行っているのと同等のインタフェースを提供し、分散データ４１０に係る処理を隠蔽することができる。 When the user issues an instruction such as a reference to the pointer file 150 in the specific folder or the like, the pointer file processing unit 120 has the distributed processing unit 130 collect, from the servers 200, the distributed data 410 associated with the original data 400 identified by that pointer file 150. Further, where necessary, as in the present embodiment, the distributed data processing unit 110 restores the original data 400 from the collected distributed data 410. The original data 400 or the distributed data 410 is then displayed or otherwise handled by the associated application program or the like. This gives the user an interface equivalent to saving and referencing the original data 400 directly, and hides the processing related to the distributed data 410.

サーバ２００は、データ分散装置１００から送信された分散データ４１０を格納することができる図示しないＨＤＤ（Hard Disk Drive）等の記憶装置を有する情報処理装置であり、例えば、ファイルサーバや、ストレージサーバなどにより構成される。また、これらの情報処理装置を有するデータセンターであってもよい。また、クラウドコンピューティングサービスによる仮想サーバや仮想データセンター等であってもよい。 The server 200 is an information processing apparatus having a storage device (not shown), such as an HDD (Hard Disk Drive), that can store the distributed data 410 transmitted from the data distribution apparatus 100; it is configured as, for example, a file server or a storage server. It may also be a data center having such information processing apparatuses, or a virtual server, virtual data center, or the like provided by a cloud computing service.

サーバ２００は、例えば、図示しないＯＳ上で動作するソフトウェアプログラムによって実装される分散保管部２１０を有する。分散保管部２１０は、データ分散装置１００から送信された分散データ４１０を記憶装置に格納する。また、データ分散装置１００からのブロードキャスト（もしくはマルチキャスト）メッセージに対して、メッセージに含まれる識別情報に合致する識別情報をヘッダ等に含む分散データ４１０を検索し、該当する分散データ４１０を有する場合は、当該分散データ４１０もしくはそのヘッダ等に含まれる識別情報をデータ分散装置１００に応答する。 The server 200 has, for example, a distributed storage unit 210 implemented by a software program running on an OS (not shown). The distributed storage unit 210 stores the distributed data 410 transmitted from the data distribution apparatus 100 in the storage device. In response to a broadcast (or multicast) message from the data distribution apparatus 100, it also searches for distributed data 410 whose header or the like contains identification information matching the identification information in the message and, if it holds such distributed data 410, returns to the data distribution apparatus 100 either that distributed data 410 or the identification information contained in its header or the like.

図２は、ポインタファイル処理部１２０の識別情報生成部１２１により生成され、ポインタファイル１５０および分散データ４１０に付加される識別情報の内容について例を示した図である。識別情報１７０は、例えば、オリジナルファイルＩＤ（ＦＩＤ）１７１、カレントファイルＩＤ（ＦＩＤ）１７２、およびユーザＩＤ１７３などの情報を有する。オリジナルＦＩＤ１７１は、各バージョン（世代）を含む元データ４００（元データ４００からなるファイル）全体を一意に識別するＩＤである。このオリジナルＦＩＤ１７１は、元データ４００を最初に分散保管する際、すなわち、元データ４００から最初に分散データ４１０を生成等して、各分散データ４１０を各サーバ２００に分散保管する際に、当該元データ４００およびこれに対応する分散データ４１０を識別するために割り当てられる。 FIG. 2 is a diagram illustrating an example of the contents of identification information generated by the identification information generation unit 121 of the pointer file processing unit 120 and added to the pointer file 150 and the distributed data 410. The identification information 170 includes information such as an original file ID (FID) 171, a current file ID (FID) 172, and a user ID 173, for example. The original FID 171 is an ID for uniquely identifying the entire original data 400 (a file made up of the original data 400) including each version (generation). The original FID 171 is used when the original data 400 is first distributed and stored, that is, when the distributed data 410 is first generated from the original data 400 and distributed and stored in each server 200. Assigned to identify data 400 and corresponding distributed data 410.

カレントＦＩＤ１７２は、各バージョン（世代）の元データ４００（元データ４００からなるファイル）をそれぞれ一意に識別するＩＤである。このカレントＦＩＤ１７２は、元データ４００を最初に分散保管して以降、当該元データに対して編集や更新を行った際に、最新のバージョン（世代）の元データ４００に対して割り当てられるＩＤである。すなわち、当初はカレントＦＩＤ１７２の値はオリジナルＦＩＤ１７１の値と同じであり、その後、元データ４００を編集等するために必要な分散データ４１０を収集し、編集後の最新の元データ４００に対して再度分散データ４１０を対応付けして、各分散データ４１０を各サーバ２００に分散保管する毎に割り当てられるＩＤである。なお、オリジナルＦＩＤ１７１の値は、最初に割り当てられた値のまま更新されないものとする。 The current FID 172 is an ID that uniquely identifies each version (generation) of the original data 400 (the file consisting of the original data 400). It is the ID assigned to the latest version (generation) of the original data 400 whenever the original data is edited or updated after it was first stored in a distributed manner. That is, the value of the current FID 172 is initially the same as the value of the original FID 171; thereafter, a new value is assigned each time the distributed data 410 needed to edit the original data 400 is collected, distributed data 410 is associated anew with the latest, edited original data 400, and each piece of distributed data 410 is stored on the servers 200. The value of the original FID 171 is never updated and keeps the value first assigned.

従って、カレントＦＩＤ１７２は、最新の元データ４００および対応する分散データ４１０を特定するためのＩＤであるだけでなく、当該元データ４００のバージョン情報としての役割を有する。すなわち、各サーバ２００の分散保管部２１０において、編集後の最新の元データ４００に対する分散データ４１０を格納する際に、当該元データ４００の以前のバージョンに対する分散データ４１０（最新のものとはヘッダ等に含まれる識別情報１７０のカレントＦＩＤ１７２が異なる）を履歴として残しておく。これにより、各サーバ２００において複数バージョンの元データ４００に対する分散データ４１０をそれぞれ保管することになるため、ユーザに指定されたバージョンの元データ４００および対応する分散データ４１０を得ることが可能となる。 Therefore, the current FID 172 is not only an ID for identifying the latest original data 400 and the corresponding distributed data 410, but also serves as version information for the original data 400. That is, when the distributed storage unit 210 of each server 200 stores the distributed data 410 for the latest, edited original data 400, it keeps as history the distributed data 410 for previous versions of the original data 400 (whose current FID 172, in the identification information 170 contained in the header or the like, differs from that of the latest version). Each server 200 thus holds distributed data 410 for multiple versions of the original data 400, which makes it possible to obtain the original data 400 of the version specified by the user and the corresponding distributed data 410.

なお、カレントＦＩＤ１７２は異なるがオリジナルＦＩＤ１７１が同じである複数の分散データ４１０は、それぞれ、同一の元データ４００の異なるバージョンのものであると判断することができる。 A plurality of distributed data 410 having different current FIDs 172 but the same original FIDs 171 can be determined to be of different versions of the same original data 400.

ユーザＩＤ１７３は、当該識別情報１７０に対応するユーザ、すなわち、当該識別情報１７０に対応する元データ４００を作成・編集等したユーザを特定するＩＤである。このＩＤの情報は、例えば、ユーザ情報１６０に登録されている各ユーザのＩＤの情報と対応させることができる。 The user ID 173 is an ID that identifies a user corresponding to the identification information 170, that is, a user who created / edited the original data 400 corresponding to the identification information 170. This ID information can be associated with the ID information of each user registered in the user information 160, for example.

なお、識別情報１７０の各ＩＤは、それぞれ、データ分散管理システム１内で重複しないユニークなＩＤである必要がある。従って、これらのＩＤは、例えば、ポインタファイル処理部１２０のＩＤ生成部１２２によって生成されたＩＤ（ユニバーサルＩＤ）とすることができる。なお、ユーザＩＤ１７３については、例えば、ユーザ情報１６０に格納された各ユーザのアカウント情報におけるユーザのＩＤを利用してもよいし、これに、部署や企業等のユーザが属する組織やグループ、データ分散管理システム１によって提供されるデータ管理サービスの契約単位などを識別する情報を付加することで、データ分散管理システム１内でユニークなＩＤとなるようにしてもよい。 Each ID of the identification information 170 needs to be a unique ID that does not overlap in the data distribution management system 1. Accordingly, these IDs can be IDs (universal IDs) generated by the ID generation unit 122 of the pointer file processing unit 120, for example. As the user ID 173, for example, the user ID in the account information of each user stored in the user information 160 may be used, and to this, an organization or group to which a user such as a department or a company belongs, and data distribution By adding information for identifying a contract unit of the data management service provided by the management system 1, the ID may be unique within the data distribution management system 1.
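The relationship between the three IDs in the identification information 170 can be illustrated with a minimal Python sketch. The class and function names are hypothetical, and the use of random UUIDs as the "universal ID" produced by the ID generation unit 122 is an assumption made only for illustration; the description does not specify an ID format.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IdentificationInfo:
    """Hypothetical model of identification information 170."""
    original_fid: str  # original FID 171: fixed across all versions of the file
    current_fid: str   # current FID 172: reassigned on every re-distribution
    user_id: str       # user ID 173


def new_identification(user_id: str) -> IdentificationInfo:
    # On the first distributed storage the current FID equals the original FID.
    fid = uuid.uuid4().hex  # assumption: a random UUID stands in for the universal ID
    return IdentificationInfo(original_fid=fid, current_fid=fid, user_id=user_id)


def next_version(info: IdentificationInfo) -> IdentificationInfo:
    # After an edit only the current FID changes; the original FID is kept.
    return replace(info, current_fid=uuid.uuid4().hex)


def same_file(a: IdentificationInfo, b: IdentificationInfo) -> bool:
    # Same original FID means two versions of the same original data.
    return a.original_fid == b.original_fid
```

Under these assumptions, two records with equal `original_fid` but different `current_fid` represent different generations of one file, which matches the rule stated for the original FID 171 and current FID 172.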

［処理フロー（分散保管）］ [Processing flow (distributed storage)]
図３は、元データ４００と複数の分散データ４１０を対応付けしてこれらを分散保管する際の処理の例について概要を示した図である。データ分散装置１００において、インタフェース部１４０を介してユーザから元データ４００の保存の指示を受けると、まず、分散データ処理部１１０によって、元データ４００から１つ以上の分散データ４１０を生成する（Ｓ０１）。本実施の形態では、上述したように例えば、元データ４００から（ｋ，ｎ）閾値秘密分散法により、ｋ個以上集めなければ元データ４００を復元することができないｎ個の分散データ４１０を生成する。これにより、元データ４００とｎ個の分散データ４１０が対応付けられることになる。 FIG. 3 is a diagram outlining an example of the processing for associating the original data 400 with a plurality of pieces of distributed data 410 and storing them in a distributed manner. When the data distribution apparatus 100 receives an instruction from the user via the interface unit 140 to save the original data 400, the distributed data processing unit 110 first generates one or more pieces of distributed data 410 from the original data 400 (S01). In the present embodiment, as described above, the (k, n) threshold secret sharing method is used, for example, to generate from the original data 400 n pieces of distributed data 410 from which the original data 400 cannot be restored unless at least k of them are gathered. The original data 400 is thereby associated with the n pieces of distributed data 410.
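As an illustration of step S01, the following is a minimal sketch of a (k, n) threshold scheme in the style of Shamir's polynomial secret sharing. The description only names the "(k, n) threshold secret sharing method", so the choice of Shamir's construction, the prime modulus, and the function names are assumptions; the secret is treated as a single integer rather than a whole file to keep the sketch short.

```python
import secrets

# Assumption: a fixed Mersenne prime as the field modulus; the secret must be smaller.
PRIME = 2**127 - 1


def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Generate n shares such that any k of them recover the secret."""
    if not (1 <= k <= n < PRIME) or not (0 <= secret < PRIME):
        raise ValueError("invalid parameters")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

In this sketch each `(x, y)` pair plays the role of one piece of distributed data 410. A practical system would apply the scheme to the file contents block by block, which is outside what this example attempts.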

次に、ポインタファイル処理部１２０によって、当該元データ４００に対する識別情報１７０を生成し（Ｓ０２）、さらに当該識別情報１７０を含むポインタファイル１５０を生成する（Ｓ０３）。ここでは、上述したように例えばＩＤ生成部１２２等によって、識別情報１７０における各ＩＤの情報を生成し、識別情報生成部１２１によって、これら各ＩＤからなる識別情報１７０を生成する。さらに、ポインタファイル処理部１２０が、当該識別情報１７０の内容を含むポインタファイル１５０を生成する。このとき例えば、ポインタファイル１５０のファイル名（拡張子除く）を元データ４００と同じファイル名とする等により、ユーザが元データ４００に対応するポインタファイル１５０を容易に識別できるようにする。 Next, the pointer file processing unit 120 generates identification information 170 for the original data 400 (S02), and further generates a pointer file 150 including the identification information 170 (S03). Here, as described above, for example, the ID generation unit 122 or the like generates information of each ID in the identification information 170, and the identification information generation unit 121 generates the identification information 170 including these IDs. Further, the pointer file processing unit 120 generates a pointer file 150 including the contents of the identification information 170. At this time, for example, by making the file name (excluding the extension) of the pointer file 150 the same as the original data 400, the user can easily identify the pointer file 150 corresponding to the original data 400.

なお、当該元データ４００が過去に既に分散保管されており、対応するポインタファイル１５０および識別情報１７０を既に有しているものである場合（すなわち、当該元データ４００に対して編集を行った後に再度分散保管を行う場合）には、ステップＳ０２において既存の識別情報１７０内のカレントＦＩＤ１７２のみを新たに生成して更新する（オリジナルＦＩＤ１７１は更新せずにそのままとする）ようにしてもよい。このとき、更新した最新のカレントＦＩＤ１７２の内容と合わせて、既存のカレントＦＩＤ１７２の内容を、過去のバージョン履歴として残すようにしてもよい。 When the original data 400 is already distributed and stored in the past and already has the corresponding pointer file 150 and identification information 170 (that is, after editing the original data 400) In the case where distributed storage is performed again), only the current FID 172 in the existing identification information 170 may be newly generated and updated in step S02 (the original FID 171 is not updated and is left as it is). At this time, the contents of the existing current FID 172 may be left as past version history together with the updated latest FID 172 contents.

次に、ステップＳ０１で生成した各分散データ４１０のヘッダ等に、ステップＳ０２で生成もしくは更新した識別情報１７０の内容を付加もしくは更新した上で、分散処理部１３０の分散部１３１により、各分散データ４１０をそれぞれ異なる複数のサーバ２００（図３の例ではサーバＡ（２００ａ）とサーバＢ（２００ｂ））に分散保管のため送信する（Ｓ０４）。複数のサーバ２００の選択は、上述したように、例えば、サーバリスト１３３に登録されたサーバ２００からローテーションやランダム抽出などにより選択する。本実施の形態では、分散データ処理部１１０によって生成されたｎ個の分散データ４１０を保管するｎ個のサーバ２００を選択する。このとき、各サーバ２００に対して分散データ４１０の保管が可能か否かを問い合わせる処理を行ってもよい。 Next, after adding or updating the contents of the identification information 170 generated or updated in step S02 to the header or the like of each distributed data 410 generated in step S01, each distributed data is processed by the distribution unit 131 of the distributed processing unit 130. 410 is transmitted to a plurality of different servers 200 (server A (200a) and server B (200b) in the example of FIG. 3) for distributed storage (S04). As described above, the plurality of servers 200 are selected from the servers 200 registered in the server list 133 by rotation or random extraction, for example. In the present embodiment, n servers 200 that store the n distributed data 410 generated by the distributed data processing unit 110 are selected. At this time, a process of inquiring each server 200 as to whether or not the distributed data 410 can be stored may be performed.
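The server selection and header attachment in step S04 might look like the following minimal sketch. The function names, the dictionary layout of a share, and the `start` parameter for rotation are hypothetical; the description only says that servers are chosen from the server list 133 by rotation, random extraction, or the like.

```python
import random
from itertools import cycle, islice


def select_servers(server_list, n, mode="random", start=0):
    """Pick n distinct servers from the list by random extraction or rotation."""
    if n > len(server_list):
        raise ValueError("not enough servers for n pieces of distributed data")
    if mode == "random":
        return random.sample(server_list, n)
    # Rotation: n consecutive entries beginning at `start`, wrapping around.
    begin = start % len(server_list)
    return list(islice(cycle(server_list), begin, begin + n))


def assign_shares(shares, identification, server_list, **select_kwargs):
    """Attach the identification information as a header and map one share per server."""
    servers = select_servers(server_list, len(shares), **select_kwargs)
    return {
        server: {"header": dict(identification), "body": share}
        for server, share in zip(servers, shares)
    }
```

The returned mapping is only a transmission plan; the actual sending and the optional capacity inquiry mentioned above are left out.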

分散データ４１０を受信した各サーバ２００では、それぞれ、分散保管部２１０により記憶装置に分散データ４１０を格納する（Ｓ０５）。このとき、過去のバージョンの元データ４００に対応する分散データ４１０が存在する場合は、これを残した上で格納するようにしてもよい。この場合、さらに、過去のバージョンの元データ４００に対応する分散データ４１０を削除して整理し（Ｓ０６）、一連の処理結果をデータ分散装置１００に応答する。 In each server 200 that receives the distributed data 410, the distributed storage unit 210 stores the distributed data 410 in the storage device (S05). At this time, if the distributed data 410 corresponding to the past version of the original data 400 exists, the distributed data 410 may be left and stored. In this case, the distributed data 410 corresponding to the past version of the original data 400 is further deleted and organized (S06), and a series of processing results are returned to the data distribution apparatus 100.

ステップＳ０６では、例えば、分散保管部２１０により、新たに格納する最新の分散データ４１０のヘッダ等に含まれる識別情報１７０のオリジナルＦＩＤ１７１と同じオリジナルＦＩＤ１７１を含む識別情報１７０を有する分散データ４１０（すなわち、同一の元データ４００の異なるバージョンに対応する分散データ４１０）を検索する。検索された分散データ４１０の数が所定の数（保管可能な世代数）よりも多い場合は、最古の分散データ４１０から順に所定の世代数になるまで削除する。なお、分散データ４１０の新旧は、例えば、分散データ４１０からなるファイルに付されたタイムスタンプ等により把握することができる。 In step S06, for example, the distributed storage unit 210 uses the distributed data 410 having the identification information 170 including the original FID 171 identical to the original FID 171 of the identification information 170 included in the header of the latest distributed data 410 to be newly stored (that is, The distributed data 410) corresponding to different versions of the same original data 400 is searched. If the number of retrieved distributed data 410 is greater than a predetermined number (number of storable generations), the oldest distributed data 410 is deleted in order from the oldest distributed data 410 until the predetermined number of generations are reached. In addition, the new and old of the distributed data 410 can be grasped by, for example, a time stamp attached to a file including the distributed data 410.
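The generation pruning of step S06 can be sketched as follows. The record layout (dictionaries with `original_fid`, `current_fid`, and `timestamp` keys) and the function name are assumptions for illustration; the description only states that versions sharing an original FID 171 are found and the oldest are deleted until the permitted number of generations remains.

```python
def prune_old_generations(stored, original_fid, max_generations):
    """Return (kept, deleted) after trimming one file's versions to max_generations.

    `stored` is a list of dicts with keys original_fid, current_fid, timestamp.
    """
    # All versions of the same original data, oldest first.
    same_file = sorted(
        (s for s in stored if s["original_fid"] == original_fid),
        key=lambda s: s["timestamp"],
    )
    excess = max(0, len(same_file) - max_generations)
    deleted = same_file[:excess]
    deleted_ids = {s["current_fid"] for s in deleted}
    kept = [
        s for s in stored
        if not (s["original_fid"] == original_fid and s["current_fid"] in deleted_ids)
    ]
    return kept, deleted
```

Records belonging to other files are never touched, and the order of the remaining records is preserved.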

ステップＳ０６での古い分散データ４１０の削除処理は、上述したように、ステップＳ０５での分散データ４１０の格納の都度行うようにしてもよいし、各サーバ２００において所定の時刻に定期的に起動されるバッチプログラム等により、全ての分散データ４１０に対して一括して行うようにしてもよい。なお、後述するＩＤのロックの手順と同様の手順により、分散データ４１０の特定のバージョン（すなわち、特定のカレントＦＩＤ１７２を含む識別情報１７０を有する分散データ４１０）については、削除されないようロックすることも可能である。 As described above, the deletion processing of the old distributed data 410 in step S06 may be performed each time the distributed data 410 is stored in step S05, or is periodically started at each server 200 at a predetermined time. Alternatively, all distributed data 410 may be collectively processed by a batch program or the like. A specific version of the distributed data 410 (that is, the distributed data 410 having the identification information 170 including the specific current FID 172) may be locked so as not to be deleted by a procedure similar to the ID locking procedure described later. Is possible.

各サーバ２００での分散保管が完了すると、データ分散装置１００は、分散部１３１により、分散保管処理が正常に完了したか否かを判定する（Ｓ０７）。例えば、本実施の形態では、ｎ個の分散データ４１０をｎ個のサーバ２００に正常に保管できたか否かを判定する。正常に保管できなかった分散データ４１０がある場合は、別なサーバ２００を選択して全ての分散データ４１０が保管できるまでステップＳ０４〜Ｓ０６の処理を再試行するようにしてもよい。また、保管が可能なサーバ２００がもはや存在しなくなった場合は、分散保管処理をエラーとして終了させるようにしてもよい。なお、このとき、既に行った処理をロールバックするようにしてもよい。 When the distributed storage in each server 200 is completed, the data distribution apparatus 100 determines whether the distributed storage processing has been normally completed by the distribution unit 131 (S07). For example, in the present embodiment, it is determined whether n pieces of distributed data 410 have been normally stored in n servers 200. If there is distributed data 410 that could not be stored normally, another server 200 may be selected and the processes in steps S04 to S06 may be retried until all the distributed data 410 can be stored. Further, when there is no longer a server 200 that can be stored, the distributed storage process may be terminated as an error. At this time, the processing already performed may be rolled back.

分散保管処理が正常に完了すると、データ分散装置１００は、データ分散装置１００上に保持する元データ４００および生成された分散データ４１０を削除し（Ｓ０８）、処理を終了する。データ分散装置１００上のこれらのデータを削除することで、データ分散装置１００自体の盗難や紛失等に対して、元データ４００（および対応する分散データ４１０）が漏洩することを回避することが可能となる。 When the distributed storage process completes normally, the data distribution apparatus 100 deletes the original data 400 and the generated distributed data 410 that it holds (S08), and ends the process. Deleting this data from the data distribution apparatus 100 makes it possible to avoid leakage of the original data 400 (and the corresponding distributed data 410) if the data distribution apparatus 100 itself is stolen or lost.

また、データ分散装置１００上に保持するポインタファイル１５０には、元データ４００（および対応する分散データ４１０）を識別するファイルＩＤの情報しか有さず、データの内容自体に係る情報や、データが実際に保管されているサーバ２００に係る情報を有していない。従って、ポインタファイル１５０の内容を第三者が知った場合でも、分散データ４１０を収集することはできず、元データ４００を復元する（元データ４００に係る情報を得る）ことはできない。 In addition, the pointer file 150 held on the data distribution apparatus 100 contains only the file ID information that identifies the original data 400 (and the corresponding distributed data 410); it contains neither information about the data content itself nor information about the servers 200 on which the data is actually stored. Therefore, even if a third party learns the contents of the pointer file 150, that party cannot collect the distributed data 410 and cannot restore the original data 400 (or obtain information about it).

なお、本実施の形態では、上記のようなセキュリティの観点を考慮して元データ４００および分散データ４１０をデータ分散装置１００から削除するものとしているが、データ分散装置１００上の元データ４００に対するバックアップとして当該分散保管サービスを利用する場合は、元データ４００を削除せずに残しておいてもよい。 In the present embodiment, the original data 400 and the distributed data 410 are deleted from the data distribution apparatus 100 in consideration of the security viewpoint as described above. However, the backup of the original data 400 on the data distribution apparatus 100 is used. When the distributed storage service is used, the original data 400 may be left without being deleted.

［処理フロー（元データ取得）］ [Processing flow (original data acquisition)]
図４は、複数の分散データ４１０を収集して、これらから元データ４００を得る際の処理の例について概要を示した図である。データ分散装置１００において、インタフェース部１４０を介したユーザによるポインタファイル１５０への操作によって、元データ４００の参照（編集のための参照含む）の指示を受けると、まず、ポインタファイル処理部１２０により、当該ポインタファイル１５０に含まれる識別情報１７０の内容を取得する（Ｓ１１）。次に、識別情報１７０内のカレントＦＩＤ１７２の情報に基づいて、分散処理部１３０の収集部１３２により、各サーバ２００に対して対応する分散データ４１０を保持しているかを問い合わせる（Ｓ１２）。 FIG. 4 is a diagram showing an outline of an example of processing when collecting a plurality of distributed data 410 and obtaining original data 400 from these. In the data distribution apparatus 100, when an instruction to refer to the original data 400 (including reference for editing) is received by an operation on the pointer file 150 by the user via the interface unit 140, first, the pointer file processing unit 120 The contents of the identification information 170 included in the pointer file 150 are acquired (S11). Next, based on the information of the current FID 172 in the identification information 170, the collection unit 132 of the distributed processing unit 130 inquires each server 200 whether the corresponding distributed data 410 is held (S12).

具体的には、上述したように、例えば、カレントＦＩＤ１７２の情報を含む分散データ４１０の問い合わせメッセージを各サーバ２００に対してブロードキャストする。サーバ２００の数が多い場合は、サーバリスト１３３にリストされているサーバ２００に対してマルチキャストするようにして、ネットワーク３００に対する負荷を低減するようにしてもよい。 Specifically, as described above, for example, an inquiry message of the distributed data 410 including information on the current FID 172 is broadcast to each server 200. When the number of servers 200 is large, the load on the network 300 may be reduced by multicasting the servers 200 listed in the server list 133.

問い合わせのブロードキャストメッセージを受信した各サーバ２００では、分散保管部２１０により、メッセージに含まれるカレントＦＩＤ１７２の情報を取得し、当該カレントＦＩＤ１７２に対応する分散データ４１０を検索する（Ｓ１３）。具体的には、メッセージに含まれるカレントＦＩＤ１７２と合致するカレントＦＩＤ１７２を含む識別情報１７０をヘッダ等に有する分散データ４１０を検索する。該当する分散データ４１０を保管していない場合（例えば、図４のサーバＢ（２００ｂ））は、その旨をデータ分散装置１００に応答する。 In each server 200 that has received the inquiry broadcast message, the distributed storage unit 210 acquires information on the current FID 172 included in the message, and searches the distributed data 410 corresponding to the current FID 172 (S13). Specifically, the distributed data 410 having the identification information 170 including the current FID 172 that matches the current FID 172 included in the message in the header or the like is searched. When the corresponding distributed data 410 is not stored (for example, the server B (200b) in FIG. 4), a response to that effect is sent to the data distribution apparatus 100.

一方、該当する分散データ４１０を保管している場合（例えば、図４のサーバＡ（２００ａ））は、当該分散データ４１０のヘッダ等に含まれる識別情報１７０がロックされているか否かを確認する（Ｓ１４）。具体的には、対象の識別情報１７０内の各ＩＤ（オリジナルＦＩＤ１７１、カレントＦＩＤ１７２もしくはユーザＩＤ１７３）の値が、サーバ２００に保持する図示しないロックリストに登録されているか否かを確認する。登録されている場合には、対象の分散データ４１０については使用がロックされていることから、その旨をデータ分散装置１００に応答する。登録されていない場合は、対象の分散データ４１０をデータ分散装置１００に対して送信する（Ｓ１５）。なお、ロックリストへのＩＤの登録については後述する。 On the other hand, when the corresponding distributed data 410 is stored (for example, server A (200a) in FIG. 4), it is confirmed whether or not the identification information 170 included in the header of the distributed data 410 is locked. (S14). Specifically, it is confirmed whether or not the value of each ID (original FID 171, current FID 172 or user ID 173) in the target identification information 170 is registered in a lock list (not shown) held in the server 200. If registered, since the use of the target distributed data 410 is locked, a response to that effect is sent to the data distribution apparatus 100. If not registered, the target distributed data 410 is transmitted to the data distribution apparatus 100 (S15). Registration of IDs in the lock list will be described later.
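The server-side handling of an inquiry (steps S13 to S15) can be sketched as below. The function name, the response dictionaries, and the representation of the lock list as a Python set are assumptions for illustration, not the actual message format of the system.

```python
def handle_query(store, lock_list, current_fid):
    """Look up a share by current FID and refuse to return it if any of its IDs is locked.

    `store` is a list of shares shaped like
    {"header": {"original_fid": ..., "current_fid": ..., "user_id": ...}, "body": ...}.
    """
    for share in store:
        header = share["header"]
        if header["current_fid"] != current_fid:
            continue  # S13: not the requested version
        ids = (header["original_fid"], header["current_fid"], header["user_id"])
        if any(i in lock_list for i in ids):
            return {"status": "locked"}  # S14: use of this share is locked
        return {"status": "ok", "share": share}  # S15: send the share back
    return {"status": "not_found"}
```

Because the check covers all three IDs, registering a single user ID in the lock list blocks every share created by that user on this server.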

各サーバ２００でのブロードキャストメッセージに対する処理が完了すると、データ分散装置１００は、収集部１３２により、収集した分散データ４１０（各サーバ２００から送信された分散データ４１０）により元データ４００の取得が可能であるか否かを判定する（Ｓ１６）。例えば、本実施の形態では、元データ４００を復元可能なｋ個以上の分散データ４１０を収集することができたか否かを判定する。元データ４００を取得（復元）できない場合、すなわち、本実施の形態では収集できた分散データ４１０がｋ個未満であった場合は、元データ４００の取得処理をエラーとして終了させるようにしてもよい。 When the processing for the broadcast message in each server 200 is completed, the data distribution apparatus 100 can acquire the original data 400 from the collected distributed data 410 (distributed data 410 transmitted from each server 200) by the collection unit 132. It is determined whether or not there is (S16). For example, in this embodiment, it is determined whether or not k or more pieces of distributed data 410 that can restore the original data 400 have been collected. When the original data 400 cannot be acquired (restored), that is, when there are less than k pieces of distributed data 410 that can be collected in the present embodiment, the acquisition process of the original data 400 may be terminated as an error. .
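The determination in step S16 reduces to counting the shares that actually arrived. A minimal sketch, assuming the hypothetical response dictionaries with a `status` key and a `share` payload:

```python
def check_collected(responses, k):
    """Return the shares from successful responses, or raise if fewer than k arrived."""
    shares = [r["share"] for r in responses if r.get("status") == "ok"]
    if len(shares) < k:
        # Corresponds to ending the acquisition process as an error.
        raise LookupError(f"only {len(shares)} of the required {k} shares were collected")
    return shares
```

Responses reporting a lock or a missing share are simply ignored, so restoration proceeds as long as k servers answered with data.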

ステップＳ１６で元データ４００の取得が可能であると判定した場合は、収集した分散データ４１０から分散データ処理部１１０によって元データ４００を取得（復元）し（Ｓ１７）、処理を終了する。本実施の形態では、収集したｋ個以上の分散データ４１０から（ｋ，ｎ）閾値秘密分散法により元データ４００を復元する。なおこのとき、復元した元データ４００の種別に応じて、これに関連付けられたアプリケーションプログラムを起動し、復元した元データ４００を表示させるようにしてもよい。 If it is determined in step S16 that the original data 400 can be acquired, the original data 400 is acquired (restored) by the distributed data processing unit 110 from the collected distributed data 410 (S17), and the process ends. In the present embodiment, the original data 400 is restored from the collected k or more pieces of distributed data 410 by the (k, n) threshold secret sharing method. At this time, according to the type of the restored original data 400, an application program associated therewith may be activated to display the restored original data 400.

このように、ユーザは、インタフェース部１４０を介してポインタファイル１５０に対して元データ４００に対するものと同様の処理を行うことで、データ分散装置１００が必要な分散データ４１０を収集して元データ４００を取得（復元）した上で表示等することができるため、分散データ４１０が複数のサーバ２００に分散保管されていることを意識することなく、シームレスに元データ４００（もしくは対応する分散データ４１０）に対するアクセスを行うことが可能である。また、データ分散装置１００にとっても、各分散データ４１０がどのサーバ２００に保管されているかという情報を保持することなく、必要な分散データ４１０を収集することができる。 In this way, when the user performs on the pointer file 150, via the interface unit 140, the same operation as would be performed on the original data 400, the data distribution apparatus 100 collects the necessary distributed data 410, obtains (restores) the original data 400, and displays it or otherwise handles it. The user can therefore access the original data 400 (or the corresponding distributed data 410) seamlessly, without being aware that the distributed data 410 is stored across a plurality of servers 200. The data distribution apparatus 100, for its part, can collect the necessary distributed data 410 without holding any information about which server 200 stores each piece of distributed data 410.

なお、上記の図４の例では、識別情報１７０内のカレントＦＩＤ１７２の情報に基づいて、各サーバ２００に対して分散データ４１０を保持しているかを問い合わせているが、識別情報１７０内の他のＩＤ情報を用いて問い合わせを行うようにしてもよい。例えば、ユーザの指示に基づいて、オリジナルＦＩＤ１７１を指定して問い合わせることで、異なるバージョン（カレントＦＩＤ１７２）の複数の元データ４００に対応する分散データ４１０を収集することができる。また、ユーザＩＤ１７３を指定して問い合わせることで、対応するユーザが作成・編集した元データ４００に対応する分散データ４１０を全て収集することができる。 In the example of FIG. 4 described above, each server 200 is inquired as to whether the distributed data 410 is held based on the information of the current FID 172 in the identification information 170. The inquiry may be made using the ID information. For example, it is possible to collect the distributed data 410 corresponding to a plurality of original data 400 of different versions (current FID 172) by inquiring by specifying the original FID 171 based on a user instruction. In addition, by designating and inquiring the user ID 173, all the distributed data 410 corresponding to the original data 400 created and edited by the corresponding user can be collected.

［処理フロー（ＩＤロック）］ [Processing flow (ID lock)]
本実施の形態では、例えば、データ分散装置１００である携帯型端末の盗難や紛失等などに際して、上述したように、データ分散装置１００に元データ４００を保持せず、また、各分散データ４１０の保管場所（サーバ２００）に係る情報を含む分散管理情報も有さないことから、元データ４００の漏洩のリスクを低減することができる。 In the present embodiment, if, for example, the portable terminal serving as the data distribution apparatus 100 is stolen or lost, the risk of leakage of the original data 400 is reduced because, as described above, the data distribution apparatus 100 neither holds the original data 400 nor has distributed management information that includes the storage location (server 200) of each piece of distributed data 410.

しかしながら、元データ４００（および対応する分散データ４１０）についてのファイルＩＤや、ユーザＩＤの情報を含む識別情報１７０を有するポインタファイル１５０はデータ分散装置１００上に存在するため、第三者に参照され得る。そこで、本実施の形態では、データ分散装置１００の盗難や紛失等の際に、第三者によって識別情報１７０に含まれる各ＩＤの情報に基づいて各サーバ２００から分散データ４１０が取得されるリスクを極力低減させるため、識別情報１７０内の各ＩＤをロックすることで対応する分散データ４１０の使用を制限することを可能とする。 However, since the pointer file 150 having the identification information 170 including the file ID of the original data 400 (and the corresponding distributed data 410) and the user ID information exists on the data distribution apparatus 100, it is referred to by a third party. obtain. Therefore, in the present embodiment, when the data distribution apparatus 100 is stolen or lost, the risk that the distributed data 410 is acquired from each server 200 based on the information of each ID included in the identification information 170 by a third party. Therefore, it is possible to restrict the use of the corresponding distributed data 410 by locking each ID in the identification information 170.

図５は、分散データ４１０の使用をロックして元データ４００および対応する分散データ４１０の取得を制限する際の処理の例について概要を示した図である。まず、データ分散装置１００において、ユーザは、インタフェース部１４０を介してロックする対象となるＩＤの値を指定する（Ｓ２１）。具体的には、識別情報１７０内のオリジナルＦＩＤ１７１、カレントＦＩＤ１７２もしくはユーザＩＤ１７３のうち少なくとも１つ以上について値を指定する。次に、指定されたＩＤの情報に基づいて、分散処理部１３０の分散部１３１により、各サーバ２００に対してＩＤのロックの指示を行う（Ｓ２２）。具体的には、ロック対象のＩＤの値を含むロック指示のメッセージを各サーバ２００に対してブロードキャスト（もしくはマルチキャスト）する。 FIG. 5 is a diagram showing an outline of an example of processing when the use of the distributed data 410 is locked and the acquisition of the original data 400 and the corresponding distributed data 410 is restricted. First, in the data distribution apparatus 100, the user specifies an ID value to be locked via the interface unit 140 (S21). Specifically, a value is specified for at least one of the original FID 171, the current FID 172, and the user ID 173 in the identification information 170. Next, based on the specified ID information, the distribution unit 131 of the distribution processing unit 130 instructs each server 200 to lock the ID (S22). Specifically, a lock instruction message including a lock target ID value is broadcast (or multicast) to each server 200.

ロック指示のブロードキャストメッセージを受信した各サーバ２００では、メッセージに含まれるＩＤの情報を、ロックリスト（図示しない）等に登録する（Ｓ２３）。その後、登録の成否をデータ分散装置１００に応答する。各サーバ２００でのロックリストへのＩＤの登録が完了すると、データ分散装置１００は、対象の全てのサーバ２００において正常にロックリストへのＩＤの登録が完了したか否かを判定する（Ｓ２４）。登録が失敗したサーバ２００や、タイムアウトで応答を受信できなかったサーバ２００がある場合は、ＩＤのロック処理をエラーとして終了させるようにしてもよい。なお、このとき、既に行った処理をロールバックするようにしてもよい。 Each server 200 that has received the lock instruction broadcast message registers the ID information included in the message in a lock list (not shown) or the like (S23). After that, the success or failure of registration is returned to the data distribution apparatus 100. When the registration of the ID to the lock list in each server 200 is completed, the data distribution apparatus 100 determines whether or not the registration of the ID to the lock list has been normally completed in all the target servers 200 (S24). . If there is a server 200 that has failed to register or a server 200 that has failed to receive a response due to timeout, the ID lock processing may be terminated as an error. At this time, the processing already performed may be rolled back.
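Steps S22 to S24 can be sketched as a loop over the target servers that records which ones failed or timed out. `FakeServer` is a hypothetical in-memory stand-in for a server 200 and its lock list; real servers would be reached over the network by broadcast or multicast, which this sketch does not model.

```python
class FakeServer:
    """Hypothetical stand-in for a server 200 holding a lock list."""

    def __init__(self, reachable=True):
        self.lock_list = set()
        self.reachable = reachable

    def register_lock(self, ids):
        if not self.reachable:
            raise TimeoutError("no response from server")
        self.lock_list.update(ids)  # S23: register the IDs in the lock list
        return True


def broadcast_lock(servers, ids_to_lock):
    """Send the lock instruction to every server and return the names that failed.

    An empty result corresponds to the success case of step S24.
    """
    failed = []
    for name, server in servers.items():
        try:
            ok = server.register_lock(ids_to_lock)
        except TimeoutError:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

Returning the list of failed servers rather than a single flag leaves the caller free to end with an error or to roll back the servers that did succeed, both of which the description allows.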

対象の全てのサーバ２００において正常にロックリストへのＩＤの登録が完了した場合、ＩＤのロック処理を終了する。なお、ＩＤのロックの解除についても上記と同様の処理により、各サーバ２００においてロックリストから対象のＩＤの登録を削除することで実現することができる。 When the registration of the ID to the lock list is normally completed in all the target servers 200, the ID lock process is terminated. The unlocking of the ID can also be realized by deleting the registration of the target ID from the lock list in each server 200 by the same process as described above.

［処理フロー（ポインタファイル復元）］ [Processing flow (pointer file restoration)]
図６は、データ分散装置１００上にポインタファイル１５０を有さない場合にこれを復元する際の処理の例について概要を示した図である。本実施の形態では、データ分散装置１００が盗難や紛失等にあった場合や、出張等で他の端末を利用する場合など、当初のデータ分散装置１００（元データ４００に対応するポインタファイル１５０を有するデータ分散装置１００）とは異なる情報処理装置を新たにデータ分散装置１００として利用する場合に、ポインタファイル１５０（およびこれに含まれる識別情報１７０）を復元して元データ４００もしくは対応する分散データ４１０へのアクセスを可能とする。 FIG. 6 is a diagram outlining an example of the processing for restoring the pointer file 150 when it is not present on the data distribution apparatus 100. In the present embodiment, when an information processing apparatus different from the original data distribution apparatus 100 (the one holding the pointer file 150 corresponding to the original data 400) is newly used as the data distribution apparatus 100, for example because the data distribution apparatus 100 has been stolen or lost or because another terminal is used on a business trip, the pointer file 150 (and the identification information 170 it contains) is restored so that the original data 400 or the corresponding distributed data 410 can be accessed.

まず、データ分散装置１００において、ユーザは、インタフェース部１４０を介して、ポインタファイル１５０を復元するためのキー情報となる、識別情報１７０におけるユーザＩＤ１７３の情報を指定する（Ｓ３１）。次に、指定されたユーザＩＤ１７３の情報に基づいて、分散処理部１３０の分散部１３１により、各サーバ２００に対して識別情報１７０の問い合わせを行う（Ｓ３２）。具体的には、指定された値のユーザＩＤ１７３を含む識別情報１７０の問い合わせのメッセージを各サーバ２００に対してブロードキャスト（もしくはマルチキャスト）する。 First, in the data distribution apparatus 100, the user designates the information of the user ID 173 in the identification information 170, which is key information for restoring the pointer file 150, via the interface unit 140 (S31). Next, based on the information of the specified user ID 173, the distribution unit 131 of the distribution processing unit 130 inquires each server 200 about the identification information 170 (S32). Specifically, the inquiry message of the identification information 170 including the user ID 173 having a designated value is broadcast (or multicast) to each server 200.

識別情報１７０の問い合わせのブロードキャストメッセージを受信した各サーバ２００では、メッセージに含まれるユーザＩＤ１７３の情報を取得し、当該ユーザＩＤ１７３に合致する識別情報１７０を検索する（Ｓ３３）。具体的には、メッセージに含まれるユーザＩＤ１７３の値と合致するユーザＩＤ１７３を含む識別情報１７０を、保管している各分散データ４１０のヘッダ等から検索する。該当する識別情報１７０（これをヘッダ等に有する分散データ４１０）がない場合（例えば、図６のサーバＡ（２００ａ））は、その旨をデータ分散装置１００に応答する。 Each server 200 that has received the broadcast message for inquiry about the identification information 170 acquires the information of the user ID 173 included in the message, and searches for the identification information 170 that matches the user ID 173 (S33). Specifically, the identification information 170 including the user ID 173 that matches the value of the user ID 173 included in the message is searched from the header of each distributed data 410 stored. If there is no corresponding identification information 170 (distributed data 410 having this in the header or the like) (for example, server A (200a) in FIG. 6), a response to that effect is sent to the data distribution apparatus 100.

On the other hand, if the server holds matching identification information 170 (distributed data 410 carrying it in its header or the like), as with server B (200b) in FIG. 6, it checks whether each piece of that identification information 170 is locked (S34). Specifically, it checks whether the value of each ID in each matching piece of identification information 170 (original FID 171, current FID 172, and user ID 173) is registered in the lock list of the server 200. If at least one matching piece of identification information 170 is not locked, the server transmits the unlocked identification information 170 to the data distribution apparatus 100; if all of it is locked, the server responds to the data distribution apparatus 100 that there is no matching identification information 170 (S35).
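The server-side handling in steps S33 to S35 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the names `IdentificationInfo`, `is_locked`, and `answer_inquiry` are hypothetical, the lock list is assumed to be a plain set of ID values, and an empty result stands in for the "nothing applicable" response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentificationInfo:
    """Identification information 170 carried in a distributed-data header."""
    original_fid: str   # original FID 171
    current_fid: str    # current FID 172
    user_id: str        # user ID 173


def is_locked(info: IdentificationInfo, lock_list: set[str]) -> bool:
    """Treat an entry as locked if any of its three ID values is on the lock list."""
    return any(value in lock_list
               for value in (info.original_fid, info.current_fid, info.user_id))


def answer_inquiry(stored: list[IdentificationInfo], user_id: str,
                   lock_list: set[str]) -> list[IdentificationInfo]:
    """Steps S33 to S35 on one server; an empty list means 'nothing applicable'."""
    matches = [info for info in stored if info.user_id == user_id]        # S33
    return [info for info in matches if not is_locked(info, lock_list)]   # S34, S35
```

Note that a server whose matching entries are all locked returns the same empty answer as a server with no matches, so the caller cannot tell the two cases apart, which mirrors the response described for S35.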

When every server 200 has finished processing the broadcast message, the collection unit 132 of the data distribution apparatus 100 restores, from the collected identification information 170, the pointer files 150 that contain it (S36), and the processing ends. Multiple servers 200 may return identical identification information 170 corresponding to the same original data 400; in that case, the duplicates are eliminated and merged into a single piece of identification information 170.

Also, because the identification information 170 shown in FIG. 2 provides only ID information, the pointer file 150 cannot be given the same file name as the original data 400 when it is restored. Therefore, either a dummy file name is set automatically, or the identification information 170 may additionally hold, for each current FID 172, the file name of the original data 400 alongside the ID information shown in FIG. 2, and the file name of the pointer file 150 may be set from that information.
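The merging in step S36 and the file-name fallback can be sketched as below. This is a rough sketch under stated assumptions: identification information is modelled as a tuple of three IDs, the function name `restore_pointer_files`, the optional `file_names` mapping (current FID to original file name), and the dummy name pattern `restored_NNNN.ptr` are all hypothetical and do not appear in the embodiment.

```python
from typing import Iterable, Optional

IdInfo = tuple[str, str, str]  # (original FID 171, current FID 172, user ID 173)


def restore_pointer_files(responses: Iterable[list[IdInfo]],
                          file_names: Optional[dict[str, str]] = None) -> dict[str, IdInfo]:
    """Step S36: merge the servers' replies into one pointer file per entry."""
    unique: list[IdInfo] = []
    seen: set[IdInfo] = set()
    for reply in responses:          # one reply per server; a reply may be empty
        for info in reply:
            if info not in seen:     # identical entries from several servers collapse to one
                seen.add(info)
                unique.append(info)

    pointer_files: dict[str, IdInfo] = {}
    for n, info in enumerate(unique, start=1):
        current_fid = info[1]
        # Use a recorded file name when one is available, otherwise a dummy name.
        name = (file_names or {}).get(current_fid, f"restored_{n:04d}.ptr")
        pointer_files[name] = info
    return pointer_files
```

The result maps a pointer file name to the identification information it would contain; writing the actual pointer files is left out of the sketch.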

As described above, with the data distribution management system 1 according to Embodiment 1 of the present invention, the original data 400 can be stored in distributed form without the data distribution apparatus 100 holding distribution management information that includes the storage destinations of the distributed data 410, and regardless of which server 200 stores each piece of distributed data 410.

That is, when the data distribution apparatus 100 collects the necessary distributed data 410 from the servers 200, it specifies all or part of the identification information 170 of the original data 400 and broadcasts (or otherwise sends) to each server 200 a message asking whether the server holds distributed data 410 for that original data 400. A server 200 that holds the target distributed data 410 responds to the message by returning that distributed data 410 to the data distribution apparatus 100, so the data distribution apparatus 100 can collect the necessary distributed data 410 without any distribution management information about where each piece of distributed data 410 is stored.
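A location-free collection of this kind might look like the following in-memory sketch. The `Server` class, its `handle_query` method, and the `collect` function are hypothetical stand-ins; the broadcast is simulated by a loop over all servers, the query key is assumed to be the current FID alone, and `k` is assumed to be the minimum number of pieces needed for restoration.

```python
class Server:
    """Stand-in for a server 200 that stores distributed data keyed by current FID."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # current FID -> distributed data (bytes)

    def handle_query(self, current_fid):
        """Return the held distributed data, or None when nothing matches."""
        return self.store.get(current_fid)


def collect(servers, current_fid, k):
    """Ask every server for the given FID and gather whatever pieces come back."""
    pieces = []
    for server in servers:   # simulated broadcast: every server receives the query
        piece = server.handle_query(current_fid)
        if piece is not None:
            pieces.append(piece)
    if len(pieces) < k:
        raise LookupError(f"only {len(pieces)} of the required {k} pieces were returned")
    return pieces
```

The caller never records which server answered, which is the point of the design: moving a piece from one server to another changes nothing on the collecting side.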

As a result, particularly when the data distribution apparatus 100 is a portable terminal, it is possible to avoid the risk that a third party obtains the distribution management information, learns where each piece of distributed data 410 is stored, and thereby gains access to the distributed data 410. In addition, because nothing depends on which server 200 stores each piece of distributed data 410, the servers 200 that store the distributed data 410 can be changed easily.

Furthermore, even if the data distribution apparatus 100 does not hold the identification information 170 or the pointer file 150 containing it, the data distribution apparatus 100 can restore the identification information 170 corresponding to each piece of original data 400 and the pointer file 150 containing it. For example, when the user supplies the user ID 173, the data distribution apparatus 100 broadcasts (or otherwise sends) a message asking whether each server holds identification information 170 containing that user ID 173. A server 200 that holds distributed data 410 whose header or the like contains the target identification information 170 returns that identification information 170 to the data distribution apparatus 100, so the data distribution apparatus 100 can obtain and restore the identification information 170 corresponding to the original data 400 available to that user, together with the pointer files 150 containing it.

As a result, even when an information processing apparatus different from the original data distribution apparatus 100 is newly used as the data distribution apparatus 100, for example because the data distribution apparatus 100 has been stolen or lost or because another terminal is used on a business trip, the pointer file 150 can be restored easily, the original data 400 or the corresponding distributed data 410 can be accessed, and work can continue.

<Embodiment 2>
In the configuration of Embodiment 1 described above and shown in FIG. 1, the original data 400 can be restored as long as k or more of the n servers 200 that store the distributed data 410 corresponding to the original data 400 are operating normally and k or more pieces of distributed data 410 can be collected from them. In other words, the configuration has high availability in that the original data 400 can be restored normally as long as no more than (n - k) servers 200 are unable to supply distributed data 410 because of a failure or the like.
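The k-of-n property can be illustrated with a small threshold scheme. The sketch below uses Shamir's secret sharing over a prime field purely as an example; the description does not state which secret sharing method the embodiment uses, so the choice of scheme, the prime, and the function names `split` and `restore` are all assumptions. The secret is modelled as a single integer smaller than the prime.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    """Return n shares (x, y) on a random degree-(k-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, n + 1)]


def restore(shares: list[tuple[int, int]]) -> int:
    """Recover f(0) by Lagrange interpolation over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8 or later)
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With n = 5 and k = 3, any three of the five shares passed to `restore` yield the original value, so up to n - k = 2 servers can be unavailable without preventing restoration.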

However, even if the server 200 side has such high availability, in a configuration like the example of FIG. 1 the data distribution apparatus 100 is a single point of failure, so the original data 400 can no longer be restored if the data distribution apparatus 100 fails.

Therefore, in the present embodiment, for example, a data distribution apparatus 100 with the same configuration as that shown in FIG. 1 is configured as a file server. FIG. 7 is a diagram outlining a configuration example of the data distribution management system 1 according to Embodiment 2 of the present invention. In the example of FIG. 7, a plurality of client terminals 500 connect to the data distribution apparatus 100 configured as a file server. Furthermore, the data distribution apparatus 100 serving as the file server is made redundant across a plurality of servers.

This keeps the data distribution apparatus 100 from being a single point of failure: even if one of the servers constituting the data distribution apparatus 100 stops because of a failure or for some other reason, another server takes over and processing continues, which improves availability. In this case, for example, the servers constituting the data distribution apparatus 100 can be implemented as multiple virtual servers configured on one or more physical servers 101, as shown in FIG. 7, which allows a simpler and lower-cost implementation.
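One simple way to picture the takeover is a caller that tries each redundant server in turn. The sketch below is only an illustration: the embodiment does not say how takeover is triggered, so the client-side retry loop, the function name `with_takeover`, and the use of `ConnectionError` to signal a stopped server are assumptions.

```python
def with_takeover(replicas, request):
    """Send the request to each redundant server in turn until one answers."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:   # this server is treated as stopped
            last_error = exc
    raise RuntimeError("no server of the data distribution apparatus is available") from last_error
```

Here each element of `replicas` is a callable standing in for one of the (possibly virtual) servers that make up the data distribution apparatus 100.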

In this case, for example, as in the configuration of the data distribution apparatus 100 shown in FIG. 1, the data distribution apparatus 100 serving as the file server holds the pointer file 150 and the interface unit 140. A client terminal 500 accesses the pointer file 150 on the data distribution apparatus 100 via the network 300, whereupon the corresponding original data 400 is restored on the data distribution apparatus 100 and sent to the local storage of the client terminal 500.

In this case, access by multiple users to the same original data 400 (the corresponding pointer file 150) on the data distribution apparatus 100 (file server) is controlled to prevent the inconsistencies that would arise from overlapping updates to the same original data 400 by multiple users. For example, based on the user ID 173 in the identification information 170, access may be limited to the corresponding user (or other users may be allowed to read while only the corresponding user may update), or the pointer file 150 may be controlled so that each user accesses it exclusively. It is also possible to place some of the functions of the data distribution apparatus 100, such as the parts relating to the pointer file 150 and the interface unit 140, on each client terminal 500.
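The two control options can be combined in a small guard object, sketched below. The class `PointerFileGuard` and its methods are hypothetical; the sketch assumes a single-process file server in which a non-blocking lock is enough to model exclusive access, and it combines owner-only updates with exclusivity even though the description presents them as alternatives.

```python
import threading


class PointerFileGuard:
    """Access control for one pointer file 150 on the file server."""

    def __init__(self, owner_user_id: str, others_may_read: bool = True):
        self.owner_user_id = owner_user_id     # user ID 173 in the identification information
        self.others_may_read = others_may_read
        self._lock = threading.Lock()          # held while an update is in progress

    def may_read(self, user_id: str) -> bool:
        return user_id == self.owner_user_id or self.others_may_read

    def try_begin_update(self, user_id: str) -> bool:
        """Succeed only for the owner, and only if no other update is in progress."""
        if user_id != self.owner_user_id:
            return False
        return self._lock.acquire(blocking=False)

    def end_update(self) -> None:
        self._lock.release()
```

A second update attempt made while the first is still open is refused rather than queued, so two sessions cannot write the same original data 400 at the same time.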

<Embodiment 3>
In Embodiment 1 described above, the distributed processing unit 130 of the data distribution apparatus 100 selects n servers 200 to store, respectively, the n pieces of distributed data 410 generated from the original data 400 by the distributed data processing unit 110. As described above, the servers 200 are selected from those registered in the server list 133 by, for example, rotation or random selection. At this time, each server 200 may be queried as to whether it can store the distributed data 410 (that is, as to the operating status of the server 200).

Here, the servers 200 registered in the server list 133 (the servers 200 that can serve as storage destinations for the distributed data 410) may differ from one another in attributes such as specifications, operational framework (including security), and installation location (country or region, topographical characteristics, and so on). That is, the servers 200 may differ in their capability to store the distributed data 410. Consequently, uniform rotation or any other selection method that ignores these differences may be unable to select servers 200 that are appropriate for the content, attributes, and so on of the distributed data 410.

Therefore, in the present embodiment, an access right is set for each server 200 that can store the distributed data 410, according to its capability to store the distributed data 410 and the like, so that the servers 200 that store the distributed data 410 can be determined based on that access right and the attributes of the distributed data 410.

FIG. 8 is a diagram outlining a configuration example of the data distribution management system 1 according to Embodiment 3 of the present invention. The example of FIG. 8 includes an access right management server 220 for setting an access right for each server 200 on which the data distribution apparatus 100 may store the distributed data 410. The access right management server 220 sets the access right for each server 200, manually or automatically, according to predetermined criteria based on attribute information such as the specifications of each server 200, its operational framework (including security), and its installation location. Although the access right management server 220 is configured as an independent server in the example of FIG. 8, it may be configured in the same housing as the data distribution apparatus 100.

FIG. 9 is a diagram showing an example of the contents of the identification information 170 added to the pointer file 150 and the distributed data 410 in the present embodiment. In the example of FIG. 9, attribute information 174 is added to the identification information 170 of Embodiment 1 shown in FIG. 2. The format and the like of the attribute information 174 are not particularly limited, but it includes information that identifies the importance, file type, and so on of the corresponding original data 400 (the file consisting of the original data 400).

When selecting the servers 200 that are to store the distributed data 410 generated from the original data 400, the distributed processing unit 130 of the data distribution apparatus 100, for example, queries each server 200 for its access right information and, based on that information and the attribute information 174 in the identification information 170 added to the distributed data 410, determines whether storing the distributed data 410 on that server is permitted. If it is permitted, the distributed data 410 is stored on that server 200. This makes it possible to select the servers 200 that should store the target distributed data 410 according to the storage capability (access right) of each server 200, rather than by simple rotation or the like.
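The permission check can be pictured with numeric levels, as in the sketch below. Everything concrete here is an assumption made for illustration: the embodiment does not fix the format of the access right or of the attribute information 174, so the integer levels, the rule that a server's level must be at least the importance of the data, and the function name `select_servers` are hypothetical. Permitted servers are taken in name order only to keep the example deterministic.

```python
def select_servers(server_levels: dict[str, int], importance: int, n: int) -> list[str]:
    """Pick n servers whose access-right level permits data of the given importance."""
    allowed = [name for name, level in server_levels.items() if level >= importance]
    if len(allowed) < n:
        raise ValueError(f"need {n} servers but only {len(allowed)} are permitted")
    return sorted(allowed)[:n]
```

In practice the rotation or random selection of Embodiment 1 could still be applied, restricted to the permitted subset.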

The invention made by the present inventors has been described above specifically on the basis of the embodiments. However, the present invention is not limited to those embodiments, and it goes without saying that various modifications can be made without departing from the gist of the invention.

The present invention can be used in a data distribution management system in which one or more pieces of data are distributed and stored across different servers or the like.

DESCRIPTION OF SYMBOLS
1 ... data distribution management system,
100 ... data distribution apparatus, 110 ... distributed data processing unit, 120 ... pointer file processing unit, 121 ... identification information generation unit, 122 ... ID generation unit, 130 ... distributed processing unit, 131 ... distribution unit, 132 ... collection unit, 133 ... server list, 140 ... interface unit, 150 ... pointer file, 160 ... user information, 170 ... identification information, 171 ... original file ID (FID), 172 ... current file ID (FID), 173 ... user ID,
200, 200a, 200b ... server, 210 ... distributed storage unit,
300 ... network,
400 ... original data, 410 ... distributed data.

Claims

A data distribution management system comprising: a plurality of information processing apparatuses each having a storage device; and a data distribution apparatus that is connected to each of the information processing apparatuses via a network and that distributes and stores, in the respective information processing apparatuses, one or more pieces of distributed data handled collectively in correspondence with original data, wherein
the data distribution apparatus includes:
a distributed data processing unit that performs processing relating to the correspondence between the original data and the one or more pieces of distributed data;
a pointer file processing unit that generates identification information for identifying the original data, and generates a pointer file that includes the identification information and corresponds to the original data; and
a distributed processing unit that transmits each piece of the distributed data corresponding to the original data, with the identification information corresponding to the original data added to it, to a different one of the information processing apparatuses,
each of the information processing apparatuses includes
a distributed storage unit that stores, in the storage device, the distributed data transmitted from the data distribution apparatus,
the distributed processing unit of the data distribution apparatus
specifies all or part of the identification information included in the pointer file designated by a user, and broadcasts, to the information processing apparatuses, a first message inquiring whether each holds the distributed data corresponding to the specified portion of the identification information,
the distributed storage unit of each information processing apparatus
searches whether the distributed data including identification information that matches the specified portion of the identification information specified in the first message is stored in its own storage device and, if it is stored, transmits that distributed data to the data distribution apparatus,
the distributed data processing unit of the data distribution apparatus
obtains the corresponding original data based on the distributed data transmitted from the information processing apparatuses,
the distributed processing unit of the data distribution apparatus further
specifies all or part of the identification information designated by the user, and broadcasts, to the information processing apparatuses, a second message for restricting use of the corresponding distributed data, and
the distributed storage unit of each information processing apparatus
registers, in a lock list, the specified portion of the identification information specified in the second message and, when searching for the distributed data including identification information that matches the specified portion of the identification information specified in the first message, restricts use of the distributed data if the identification information included in that distributed data contains content registered in the lock list.

The data distribution management system according to claim 1, wherein
the distributed processing unit of the data distribution apparatus
specifies, from the identification information designated by the user, the value that identifies the user, and broadcasts, to the information processing apparatuses, a third message inquiring whether each holds the corresponding identification information,
the distributed storage unit of each information processing apparatus
searches whether the distributed data including identification information that matches the user-identifying value specified in the third message is stored in its own storage device and, if it is stored, transmits the identification information included in that distributed data to the data distribution apparatus, and
the pointer file processing unit of the data distribution apparatus
restores the corresponding pointer file based on the identification information transmitted from the information processing apparatuses.

The data distribution management system according to claim 1 or 2, wherein
the identification information includes ID information that identifies the original data as a whole, ID information that identifies each version of the original data when the original data is edited, and ID information that identifies the user who created or edited the original data.

The data distribution management system according to any one of claims 1 to 3, wherein
the distributed data processing unit of the data distribution apparatus
generates a plurality of pieces of the distributed data from the original data by a secret sharing method, and restores the original data from a plurality of pieces of the distributed data by the secret sharing method.

The data distribution management system according to any one of claims 1 to 4, wherein
the distributed storage unit of each information processing apparatus,
when storing in the storage device the distributed data corresponding to the original data transmitted from the data distribution apparatus, if past distributed data corresponding to the original data exists, stores the transmitted distributed data after first saving the past distributed data.

The data distribution management system according to claim 5, wherein
the distributed storage unit of each information processing apparatus
deletes, at a predetermined timing, the distributed data corresponding to the original data that is older than a predetermined number of generations.

The data distribution management system according to claim 6, wherein
the distributed processing unit of the data distribution apparatus
specifies information identifying the version of the original data that the user has designated for preservation, and broadcasts, to the information processing apparatuses, a fourth message for restricting deletion of the distributed data corresponding to that original data, and
the distributed storage unit of each information processing apparatus
registers, in a list, the information identifying the version of the original data specified in the fourth message and, when deleting the distributed data corresponding to the original data that is older than the predetermined number of generations, restricts deletion of the distributed data if the identification information included in that distributed data contains information identifying original data registered in the list.

The data distribution management system according to any one of claims 1 to 7, wherein
the data distribution apparatus
is composed of a plurality of file servers or a plurality of virtual file servers.

The data distribution management system according to any one of claims 1 to 8, wherein
the identification information further includes attribute information about the original data, and
the distributed processing unit of the data distribution apparatus
selects the information processing apparatuses on which the distributed data can be stored, based on access right information set for each of the information processing apparatuses and the attribute information in the identification information added to the distributed data.