WO2013065544A1

WO2013065544A1 - Data distribution management system

Info

Publication number: WO2013065544A1
Application number: PCT/JP2012/077460
Authority: WO
Inventors: 佐藤　敦; 壮一最首
Original assignee: 株式会社野村総合研究所
Priority date: 2011-11-01
Filing date: 2012-10-24
Publication date: 2013-05-10
Also published as: WO2013065134A1

Abstract

A data distribution management system in which the information processing device that is the data distribution source holds no distribution management information, and which can store data in a distributed manner without being affected by which server or the like the distributed data is stored on. A typical embodiment of this invention comprises a data distribution device and information processing devices each having a distributed storage unit that stores, in a storage device, distributed data sent from the data distribution device. The data distribution device has: a distributed data processing unit that performs processing related to associating source data with at least one unit of distributed data; a pointer file processing unit that generates identification information capable of identifying and specifying the source data, and generates a pointer file including the identification information corresponding to the source data; and a distribution processing unit that sends each unit of distributed data corresponding to the source data, with the identification information corresponding to the source data added, to a different information processing device.

Description

Data distribution management system

The present invention relates to a data storage technique, and more particularly to a technique effective when applied to a data distribution management system that distributes and stores one or more data in different servers.

In recent years, from the viewpoint of information security, the handling of data such as files held and processed in information processing apparatuses such as PCs (Personal Computers) used by users has come to be regarded as important. In particular, for notebook PCs and for portable terminals such as smartphones and tablet PCs, which are increasingly used in business, the risk of information leakage due to theft or loss of the terminal itself must be considered.

One conceivable countermeasure is a so-called thin-client approach, in which data, including important data, is not kept on the terminal but is stored in an external data center or server where security measures are in place, thereby reducing the risk of information leakage due to loss of the terminal. It has also been proposed that the important data not be stored on an external server as is, but be divided, for example by the so-called secret sharing technique described in Non-Patent Document 1, into non-critical pieces that are meaningless individually (the important data cannot be reconstructed or inferred from a single piece), and that these non-critical pieces be stored on a plurality of external servers. This reduces the risk of information leakage even when the data is stored in a virtual data center or virtual server in a cloud computing environment.

In addition, when important data is divided into a plurality of pieces by secret sharing, the original important data can be restored even if some of the pieces are lost, provided that a predetermined number of pieces can be collected, so data availability is also improved. For example, when n pieces of divided data are generated from important data by so-called (k, n) threshold secret sharing, the important data can be restored if k or more pieces of divided data can be collected. In other words, the loss of up to (n - k) pieces of divided data can be tolerated. Taking advantage of this high availability, it has also been considered to distribute and store the divided data at a plurality of remote locations as a backup of the original important data.
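The (k, n) threshold property described above can be illustrated with a minimal sketch. The text does not say which secret sharing scheme is used; the sketch below assumes Shamir's polynomial scheme over a prime field, and the names `PRIME`, `split_secret`, and `restore_secret` are hypothetical.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an assumption of this sketch


def split_secret(secret, k, n, rng=random.SystemRandom()):
    """Split an integer secret into n shares, any k of which restore it."""
    if not (1 <= k <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def restore_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8 or later).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With k = 3 and n = 5, any three of the five shares passed to `restore_secret` give back the secret, so up to n - k = 2 shares may be lost.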

Thus, when a plurality of data items that are handled as a batch, such as divided data generated by secret sharing, are distributed and stored on other servers for purposes such as security or backup, management information (hereinafter sometimes referred to as "distributed management information"), including information on which server each data item is stored on, is typically held by the information processing apparatus of each user who is the distribution source, by a specific management server such as a file server, or the like. When the distributed data stored on the servers is collected, this distributed management information is consulted to identify which server holds the necessary distributed data, and the target server is accessed directly to collect it.

For example, Japanese Patent Laid-Open No. 2007-213405 (Patent Document 1) describes a distributed information file management means based on a secret sharing method. In it, an information management computer stores tally folders A, B, ... for storing tally files, a restoration destination folder for storing restoration files, a tally object folder for storing tally object files, and a tally engine folder containing a restoration engine program and a division engine program. Tally parameters including information on a decoding boundary (the range that can be read by the tally application), the tally file names and storage locations, and object information of the restoration destination folder are stored in tally object files A, B, .... The tally files are collected directly based on their storage locations and the decoding boundary, and a restoration file is generated, stored, and opened, so that the original data is located and restored efficiently.

JP 2007-213405 A

However, in the conventional method of distributed data storage described in Patent Document 1 and elsewhere, the information processing apparatus that is the data distribution source, a specific management server such as a file server, or the like holds distributed management information for the important data (specifically, for the one or more pieces of distributed data related to the important data), which poses a security problem. For example, if a portable terminal that is a data distribution source holds distributed management information for important data and is stolen or lost, a third party may view the distributed management information and thereby obtain the location of the distributed data related to the important data (the host name and network address of each server storing the distributed data, and information for accessing the distributed data such as a URL (Uniform Resource Locator)).

In addition, when the distributed management information is held in the information processing apparatus that is the data distribution source, and the storage destination server must be changed, for example because the server becomes unusable due to a failure, the contents of the distributed management information must be rewritten individually, in each distribution-source information processing apparatus, with the information of the new storage destination server or the like. For example, when the storage destination is a virtual server on a cloud computing service, operation must proceed without knowing when the virtual server may be stopped, and rewriting the distributed management information in every distribution-source information processing terminal each time the destination virtual server changes increases the operational load.

In addition, when a user who has lost the portable terminal or the like that was the data distribution source tries to access the distributed data (that is, the original important data behind it) using another information processing apparatus, or when a user accesses the distributed data from an information processing apparatus other than the usual one, for example at another business site or on a business trip, that information processing apparatus does not hold the distributed management information for the target important data (the distributed data for the important data). It therefore cannot determine which server each piece of distributed data is stored on, and the distributed data becomes inaccessible.

Therefore, an object of the present invention is to provide a data distribution management system that can perform distributed storage of data without the information processing apparatus that is the data distribution source holding distributed management information, and without being affected by which server or the like each piece of distributed data is stored on. The above and other objects and novel features of the present invention will be apparent from the description of this specification and the accompanying drawings.

Of the inventions disclosed in this application, the outline of typical ones will be briefly described as follows.

A data distribution management system according to a representative embodiment of the present invention has a plurality of information processing apparatuses each having a storage device, and a data distribution apparatus that is connected to each of the information processing apparatuses via a network and that distributes and stores, in the storage devices of the information processing apparatuses, one or more pieces of distributed data handled as a batch in correspondence with original data. The system has the following characteristics.

That is, the data distribution apparatus includes: a distributed data processing unit that performs processing related to the association between the original data and the one or more pieces of distributed data; a pointer file processing unit that generates identification information capable of identifying and specifying the original data, and generates a pointer file including the identification information corresponding to the original data; and a distributed processing unit that transmits each piece of distributed data corresponding to the original data, with the identification information corresponding to the original data added, to a different one of the information processing apparatuses. Each of the information processing apparatuses includes a distributed storage unit that stores the distributed data transmitted from the data distribution apparatus in the storage device.

Among the inventions disclosed in the present application, effects obtained by typical ones will be briefly described as follows.

According to the representative embodiment of the present invention, data can be distributed and stored without the information processing apparatus that is the data distribution source holding distributed management information, and without being affected by which server or the like the distributed data is stored on.

A diagram outlining a configuration example of the data distribution management system according to Embodiment 1 of the present invention.

A diagram showing an example of the contents of the identification information added to the pointer file and the distributed data in Embodiment 1 of the present invention.

A diagram outlining an example of processing in Embodiment 1 of the present invention when original data is associated with a plurality of pieces of distributed data and these are stored in a distributed manner.

A diagram outlining an example of processing in Embodiment 1 of the present invention when a plurality of pieces of distributed data are collected and the original data is obtained from them.

A diagram outlining an example of processing in Embodiment 1 of the present invention when acquisition of original data and the corresponding distributed data is restricted by locking use of the distributed data.

A diagram outlining an example of processing in Embodiment 1 of the present invention for restoring a pointer file when the data distribution apparatus does not have it.

A diagram outlining a configuration example of the data distribution management system according to Embodiment 2 of the present invention.

A diagram outlining a configuration example of the data distribution management system according to Embodiment 3 of the present invention.

A diagram showing an example of the contents of the identification information added to the pointer file and the distributed data in Embodiment 3 of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Note that components having the same function are denoted by the same reference symbols throughout the drawings for describing the embodiment, and the repetitive description thereof will be omitted.

<Embodiment 1>
The data distribution management system according to the first embodiment of the present invention distributes and stores, in storage devices of other data centers, servers, and the like, a plurality of pieces of distributed data handled as a batch in correspondence with original data such as important data, and does not hold distributed management information, that is, information on which data center or server each piece of distributed data is stored in. In the present embodiment, instead of such distributed management information, the data distribution apparatus that performs the distributed storage generates and holds identification information for identifying each piece of original data, and adds this identification information to the header information of each piece of distributed data. This makes it possible to collect the necessary distributed data without requiring information on the location of the data center or server where each piece is stored.

Here, the one or more pieces of distributed data handled as a batch in correspondence with the original data are one or more pieces of data that are acquired together, or stored, displayed, or otherwise processed together, in response to a single processing request such as storing, browsing, or referencing the target original data. In the present embodiment, a plurality of pieces of divided data generated by secret sharing processing from important data, which is the original data, are shown as an example of distributed data, but the present invention is not limited to this.

For example, for management data such as a case or project created by a user in a business application or the like, a series of related files generated by the business application, a series of work files specified by the user, and the like may each be distributed and stored on servers or the like as distributed data. Note that there may be only one piece of distributed data for the original data (for example, the target original data itself), which corresponds to use as a remote copy or backup.

When the data distribution apparatus collects the necessary distributed data from the data centers, servers, and the like, it specifies all or part of the identification information related to the original data and broadcasts (or multicasts) to each data center, server, and the like a message inquiring whether it holds distributed data corresponding to the original data. In response to the message, each data center or server holding the target distributed data returns that distributed data to the data distribution apparatus. The necessary distributed data can thus be collected without requiring distributed management information.
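The inquiry-and-response exchange described above can be modeled with a minimal in-memory sketch. The class `Server`, the function `collect`, and the dictionary keys used for the identification information are hypothetical, and a simple loop stands in for the broadcast (or multicast) message; no network protocol is implied.

```python
class Server:
    """Hypothetical stand-in for one data center or server."""

    def __init__(self, name):
        self.name = name
        self.stored = []  # list of (identification info dict, payload bytes)

    def store(self, ident, payload):
        self.stored.append((dict(ident), payload))

    def handle_inquiry(self, query):
        """Return payloads whose identification info matches every queried field."""
        return [
            payload
            for ident, payload in self.stored
            if all(ident.get(key) == value for key, value in query.items())
        ]


def collect(servers, query, needed):
    """Put the same question to every server and gather the responses.

    `query` may carry all or only part of the identification information.
    The caller never names the server that holds a given piece.
    """
    pieces = []
    for server in servers:  # stands in for a broadcast or multicast message
        pieces.extend(server.handle_inquiry(query))
        if len(pieces) >= needed:
            break
    return pieces[:needed]
```

Because the query names only identification information, moving a piece from one server to another changes nothing on the collecting side.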

As a result, particularly when the data distribution apparatus is a portable terminal, it is possible to avoid the risk that, if the apparatus is stolen or lost, a third party obtains distributed management information, learns where each piece of distributed data is stored, and gains access to the distributed data. In addition, because operation does not depend on which data center or server stores each piece of distributed data, the storage location of each piece can be changed easily, which improves the availability and flexibility of the system.

In this embodiment, even if the data distribution apparatus does not hold the identification information, it can restore the identification information of each piece of data. For example, when the user supplies information identifying the user, such as a user ID, the data distribution apparatus broadcasts (or multicasts) a message inquiring about identification information related to that user. A data center or server holding distributed data with matching identification information returns that identification information to the data distribution apparatus. The data distribution apparatus can thereby acquire and restore the identification information for each piece of data available to the user, and collect the corresponding distributed data based on it.
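Restoration of identification information from a user ID alone can be sketched as follows. The function name `restore_identification` and the dictionary keys are hypothetical, and each inner list models the headers that one server would report in response to the inquiry.

```python
def restore_identification(server_headers, user_id):
    """Gather the distinct identification records that belong to user_id.

    server_headers: one list per server, each holding the identification
    info dicts found in the headers of the distributed data stored there.
    """
    restored = {}
    for headers in server_headers:  # stands in for a broadcast inquiry
        for ident in headers:
            if ident["user_id"] == user_id:
                key = (ident["original_fid"], ident["current_fid"])
                restored[key] = ident  # the same record from several servers collapses
    return [restored[key] for key in sorted(restored)]
```

Several servers report the same record because each holds one piece of the same original data, so the sketch deduplicates before returning.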

As a result, even when an information processing apparatus other than the original data distribution apparatus is newly used as the data distribution apparatus, for example because the original apparatus was stolen or lost or because another terminal is used on a business trip, the identification information can be restored easily, the distributed data can be accessed, and business can continue.

[System configuration]
FIG. 1 is a diagram showing an outline of a configuration example of a data distribution management system according to the first embodiment of the present invention. The data distribution management system 1 has a configuration in which a data distribution apparatus 100 and one or more servers 200 are connected to each other via a network 300 such as the Internet and can communicate with each other. A configuration having a plurality of data distribution devices 100 may also be possible.

The data distribution apparatus 100 is configured by an information processing apparatus such as a PC or a portable terminal. It includes, for example, a distributed data processing unit 110, a pointer file processing unit 120, a distributed processing unit 130, and an interface unit 140, each implemented by a software program that runs on an operating system (OS) (not shown). It also has user information 160, which is data such as a database, file, or registry holding information (for example, account information) about users who can use the data distribution apparatus 100 or the data distribution management service provided by the data distribution management system 1. In addition, for each of a plurality of pieces of original data 400, it has a pointer file 150 that functions as a pointer to the distributed data 410 stored on the servers 200.

The distributed data processing unit 110 performs processing related to the association between the original data 400 and the one or more pieces of distributed data 410 handled as a batch in correspondence with the original data 400. In the present embodiment, it consists of, for example, a known secret sharing library that generates n pieces of divided data, which become the distributed data 410, from the specified original data 400 by the (k, n) threshold secret sharing method and, conversely, restores the original data 400 by the (k, n) threshold secret sharing method using k or more pieces of distributed data 410 as divided data.

As described above, the distributed data 410 is not limited to data generated from, or based on, the original data 400 as in the present embodiment; it may be a plurality of other data items handled together with the original data 400. There may also be only one piece of distributed data 410 (for example, the original data 400 itself).

The pointer file processing unit 120 generates a pointer file 150 having a function as a pointer that points to the distributed data 410 corresponding to each of the plurality of original data 400. Further, processing is performed on the original data 400 (or the corresponding distributed data 410) based on an instruction from the user to the pointer file 150 via the interface unit 140 described later.

The pointer file 150 has the function of pointing to the original data 400 (and the corresponding distributed data 410), but does not contain the original data 400 itself. As described below, its contents are identification information that identifies and specifies the original data 400 (and the corresponding distributed data 410). That is, the pointer file 150 is similar to a so-called shortcut, symbolic link, alias, or the like for the original data 400 (and the corresponding distributed data 410). This identification information is also added, as header information or the like, to each piece of distributed data 410 generated by the distributed data processing unit 110.

The pointer file processing unit 120 further includes an identification information generation unit 121 for generating this identification information, and an ID generation unit 122 for generating the various ID values included in the identification information. The ID generation unit 122 consists of a library with a known function capable of generating a unique ID (universal ID) that does not collide even across a plurality of different data distribution apparatuses 100.

The distributed processing unit 130 includes a distribution unit 131, which adds identification information to the distributed data 410 associated with the original data 400 by the distributed data processing unit 110 and distributes and stores the distributed data 410 on the servers 200 according to a predetermined rule, and a collection unit 132, which collects the distributed data 410 associated with the original data 400 from the servers 200. It may also have a server list 133 listing the servers 200 that can serve as storage destinations for the distributed data 410.

In the present embodiment, the distribution unit 131 distributes and stores, on n different servers 200 selected from the server list 133, the n pieces of distributed data 410 that were generated by the distributed data processing unit 110 using, for example, the (k, n) threshold secret sharing method and to which identification information was added by the pointer file processing unit 120. When there are more than n servers 200, the n servers 200 that store the distributed data 410 are selected from among them by, for example, rotation or random extraction.
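The two selection rules mentioned above, rotation and random extraction, can be sketched as follows. The function names and the `start` parameter are hypothetical; the text does not specify how the rotation position is tracked.

```python
import random


def select_by_rotation(server_list, n, start):
    """Pick n distinct servers, beginning at index `start` and wrapping around."""
    if n > len(server_list):
        raise ValueError("not enough servers for n pieces")
    return [server_list[(start + i) % len(server_list)] for i in range(n)]


def select_at_random(server_list, n, rng=random):
    """Pick n distinct servers uniformly at random (raises ValueError if n is too large)."""
    return rng.sample(server_list, n)
```

Either rule returns n different servers, so each of the n pieces lands on its own server.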

The collection unit 132, in turn, inquires of each server 200 whether it holds distributed data 410 associated with the original data 400, and collects the distributed data 410 transmitted from the servers 200 that do. In this embodiment, for example, it collects the k or more pieces of distributed data 410 that the distributed data processing unit 110 needs in order to restore the original data 400 by the (k, n) threshold secret sharing method.

When inquiring of the servers 200, a message including all or part of the identification information contained in the pointer file 150 corresponding to the target original data 400 is broadcast to all the servers 200 (or multicast to each of the servers 200 listed in the server list 133). A known technique can be used as appropriate for the broadcast (multicast) protocol.

The interface unit 140 has an input / output function such as a user interface such as a screen display in the data distribution apparatus 100. The user can use the functions of the data distribution management system 1 by using, for example, a file management screen or application provided in a general OS.

For example, in a file management application, the user moves the original data 400 to a specific folder or the like by a simple operation such as drag and drop. Triggered by this, the distributed data processing unit 110 generates the distributed data 410, and the distributed processing unit 130 distributes and stores it on the servers 200. The pointer file processing unit 120 also generates a pointer file 150 corresponding to the original data 400 and replaces the original data 400 in the specific folder or the like with it. Thereafter, user access to the original data 400, such as references, is made through the pointer file 150 placed in the specific folder or the like.

When the user gives an instruction to refer to the pointer file 150 in the specific folder or the like, the pointer file processing unit 120 has the distributed processing unit 130 collect, from the servers 200, the distributed data 410 associated with the original data 400 specified by the pointer file 150. Further, where necessary, as in the present embodiment, the distributed data processing unit 110 restores the original data 400 from the collected distributed data 410. The original data 400 or the distributed data 410 is then displayed by a related application program or the like. This provides the user with an interface equivalent to storing and referencing the original data 400 itself, while concealing the processing related to the distributed data 410.

The server 200 is an information processing apparatus, such as a file server or storage server, having a storage device (not shown) such as an HDD (Hard Disk Drive) capable of storing the distributed data 410 transmitted from the data distribution apparatus 100. It may also be a data center having such information processing apparatuses, or a virtual server or virtual data center provided by a cloud computing service.

The server 200 includes, for example, a distributed storage unit 210 implemented by a software program that runs on an OS (not shown). The distributed storage unit 210 stores the distributed data 410 transmitted from the data distribution apparatus 100 in the storage device. In response to a broadcast (or multicast) message from the data distribution apparatus 100, it also searches for distributed data 410 whose identification information matches the identification information included in the message and, if it holds such distributed data 410, returns the distributed data 410 or the identification information contained in its header to the data distribution apparatus 100.

FIG. 2 is a diagram showing an example of the contents of the identification information generated by the identification information generation unit 121 of the pointer file processing unit 120 and added to the pointer file 150 and the distributed data 410. The identification information 170 includes, for example, an original file ID (original FID) 171, a current file ID (current FID) 172, and a user ID 173. The original FID 171 is an ID for uniquely identifying the original data 400 (the file made up of the original data 400) as a whole, across all of its versions (generations). The original FID 171 is assigned when the original data 400 is first distributed and stored, that is, when the distributed data 410 is first generated from the original data 400 and distributed and stored on the servers 200, in order to identify the original data 400 and the corresponding distributed data 410.

The current FID 172 is an ID for uniquely identifying each version (generation) of the original data 400 (the file made up of the original data 400). It is the ID assigned to the latest version (generation) of the original data 400 when the original data 400 is edited or updated after first being distributed and stored. That is, the value of the current FID 172 is initially the same as the value of the original FID 171; thereafter, a new ID is assigned each time the distributed data 410 needed to edit the original data 400 is collected and the latest, edited original data 400 is again associated with distributed data 410 and distributed and stored on the servers 200. The value of the original FID 171 remains as initially assigned and is not updated.

Therefore, the current FID 172 is not only an ID for specifying the latest original data 400 and the corresponding distributed data 410, but also serves as version information for the original data 400. That is, when the distributed storage unit 210 of each server 200 stores the distributed data 410 for the latest, edited original data 400, the distributed data 410 for earlier versions of the original data 400 (which differs from the latest in the current FID 172 of the identification information 170 included in its header or the like) is kept as history. As a result, each server 200 stores distributed data 410 corresponding to a plurality of versions of the original data 400, so that the version of the original data 400 designated by the user, and the corresponding distributed data 410, can be obtained.

It should be noted that a plurality of distributed data 410 having different current FIDs 172 but the same original FIDs 171 can be determined to be of different versions of the same original data 400.

The user ID 173 is an ID that identifies a user corresponding to the identification information 170, that is, a user who created or edited the original data 400 corresponding to the identification information 170. This ID information can be associated with the ID information of each user registered in the user information 160, for example.

Each ID in the identification information 170 must be unique within the data distribution management system 1. These IDs can therefore be, for example, IDs (universal IDs) generated by the ID generation unit 122 of the pointer file processing unit 120. For the user ID 173, the user ID in each user's account information stored in the user information 160 may be used, for example, and made unique within the data distribution management system 1 by adding information identifying the organization or group to which the user belongs, such as a department or company, and the contract unit of the data management service provided by the data distribution management system 1.
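The structure of the identification information 170 and its use for telling versions apart can be sketched as follows. All names are hypothetical. Using a UUID for the "universal ID" and joining contract, group, and account with "/" for the user ID are assumptions of this sketch, not formats given in the text.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentificationInfo:
    original_fid: str  # fixed across all versions of the original data
    current_fid: str   # differs per version, so it doubles as version information
    user_id: str


def new_id():
    """A UUID stands in for the 'universal ID' here (an assumption)."""
    return uuid.uuid4().hex


def qualified_user_id(contract, group, account):
    """Make an account name unique system-wide by prefixing contract and group."""
    return f"{contract}/{group}/{account}"


def versions_by_original(records):
    """Group the current FIDs seen in headers under their original FID."""
    grouped = defaultdict(set)
    for rec in records:
        grouped[rec.original_fid].add(rec.current_fid)
    return {fid: sorted(current) for fid, current in grouped.items()}
```

Records that share an original FID but differ in current FID appear under one key, which is how different versions of the same original data are recognized.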

[Processing flow (distributed storage)]
FIG. 3 is a diagram showing an outline of an example of processing when the original data 400 is associated with a plurality of pieces of distributed data 410 and these are stored in a distributed manner. In the data distribution apparatus 100, when an instruction to save the original data 400 is received from the user via the interface unit 140, the distributed data processing unit 110 first generates one or more pieces of distributed data 410 from the original data 400 (S01). In the present embodiment, as described above, n pieces of distributed data 410 are generated from the original data 400 by, for example, the (k, n) threshold secret sharing method, such that the original data 400 cannot be restored unless k or more pieces are collected. As a result, the original data 400 and the n pieces of distributed data 410 are associated with each other.

Next, the pointer file processing unit 120 generates identification information 170 for the original data 400 (S02), and further generates a pointer file 150 including the identification information 170 (S03). Here, as described above, for example, the ID generation unit 122 or the like generates information of each ID in the identification information 170, and the identification information generation unit 121 generates the identification information 170 including these IDs. Further, the pointer file processing unit 120 generates a pointer file 150 including the contents of the identification information 170. At this time, for example, by making the file name (excluding the extension) of the pointer file 150 the same as the original data 400, the user can easily identify the pointer file 150 corresponding to the original data 400.

When the original data 400 has already been distributed and stored in the past and already has a corresponding pointer file 150 and identification information 170 (that is, when distributed storage is performed again after the original data 400 has been edited), only the current FID 172 in the existing identification information 170 may be newly generated and updated in step S02 (the original FID 171 is not updated and is left as it is). At this time, the contents of the existing current FID 172 may be retained as past version history together with the contents of the updated, latest current FID 172.
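The update rule for a re-save can be sketched as follows. The `past_fids` list is a hypothetical way of keeping the version history mentioned above; the specification leaves the actual layout open.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class IdentificationInfo:
    original_fid: str
    current_fid: str
    user_id: str
    # Hypothetical field: current FIDs of earlier versions, oldest first.
    past_fids: List[str] = field(default_factory=list)


def update_for_resave(info: IdentificationInfo) -> IdentificationInfo:
    # Keep the previous current FID as history, then issue a new one.
    info.past_fids.append(info.current_fid)
    info.current_fid = uuid.uuid4().hex
    # The original FID is deliberately left untouched.
    return info
```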

Next, the contents of the identification information 170 generated or updated in step S02 are added to, or updated in, the header or the like of each distributed data 410 generated in step S01, and the distribution unit 131 of the distributed processing unit 130 then transmits each distributed data 410 to a plurality of different servers 200 (server A (200a) and server B (200b) in the example of FIG. 3) for distributed storage (S04). As described above, the plurality of servers 200 are selected from the servers 200 registered in the server list 133 by rotation or random extraction, for example. In the present embodiment, n servers 200 that store the n distributed data 410 generated by the distributed data processing unit 110 are selected. At this time, a process of inquiring of each server 200 whether or not the distributed data 410 can be stored may be performed.
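The two selection strategies named above, rotation and random extraction, might look like the following sketch. `ServerSelector` and its method names are hypothetical, and server names stand in for the entries of the server list 133.

```python
import random
from typing import List, Optional


class ServerSelector:
    """Picks n distinct servers from a server list (hypothetical helper)."""

    def __init__(self, server_list: List[str]):
        self.server_list = list(server_list)
        self._next = 0  # rotation cursor

    def by_rotation(self, n: int) -> List[str]:
        size = len(self.server_list)
        if n > size:
            raise ValueError("not enough servers for n distinct shares")
        picked = [self.server_list[(self._next + i) % size] for i in range(n)]
        # Advance the cursor so the next call starts where this one ended.
        self._next = (self._next + n) % size
        return picked

    def by_random(self, n: int, rng: Optional[random.Random] = None) -> List[str]:
        # random.sample returns n distinct elements, or raises ValueError
        # when n exceeds the list length.
        return (rng or random).sample(self.server_list, n)
```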

In each server 200 that has received the distributed data 410, the distributed storage unit 210 stores the distributed data 410 in the storage device (S05). At this time, if distributed data 410 corresponding to a past version of the original data 400 exists, it may be retained rather than overwritten. In this case, the distributed data 410 corresponding to past versions of the original data 400 is then deleted and organized (S06), and the results of the series of processes are returned to the data distribution apparatus 100.

In step S06, for example, the distributed storage unit 210 searches for distributed data 410 whose identification information 170 includes the same original FID 171 as that of the identification information 170 in the header of the latest distributed data 410 being newly stored (that is, distributed data 410 corresponding to different versions of the same original data 400). If the number of retrieved distributed data 410 is greater than a predetermined number (the number of storable generations), distributed data 410 is deleted in order from the oldest until the predetermined number of generations is reached. The age of each distributed data 410 can be determined, for example, from the time stamp attached to the file containing the distributed data 410.
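A minimal sketch of the generation pruning in step S06, assuming each stored item carries its original FID and a time stamp. `StoredShare` and `prune_generations` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredShare:
    original_fid: str
    current_fid: str
    timestamp: float  # e.g. the modification time of the stored file


def prune_generations(store: List[StoredShare], original_fid: str,
                      max_generations: int) -> List[StoredShare]:
    """Return the store with the oldest versions beyond the limit removed."""
    # Collect every version of the same original data, oldest first.
    same_file = sorted(
        (s for s in store if s.original_fid == original_fid),
        key=lambda s: s.timestamp,
    )
    excess = len(same_file) - max_generations
    to_delete = same_file[:excess] if excess > 0 else []
    doomed = {id(s) for s in to_delete}
    # Items belonging to other original data are left untouched.
    return [s for s in store if id(s) not in doomed]
```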

The deletion of old distributed data 410 in step S06 may be performed each time distributed data 410 is stored in step S05 as described above, or may be started periodically at a predetermined time on each server 200 so that all distributed data 410 is processed collectively by a batch program or the like. It is also possible to lock a specific version of the distributed data 410 (that is, the distributed data 410 having identification information 170 that includes a specific current FID 172) so that it is not deleted, by a procedure similar to the ID locking procedure described later.

When the distributed storage in each server 200 is completed, the distribution unit 131 of the data distribution apparatus 100 determines whether or not the distributed storage processing has been completed normally (S07). For example, in the present embodiment, it is determined whether the n pieces of distributed data 410 have been stored normally in n servers 200. If there is distributed data 410 that could not be stored normally, another server 200 may be selected and the processes in steps S04 to S06 may be retried until all the distributed data 410 has been stored. Further, when no server 200 capable of storing the data remains, the distributed storage process may be terminated as an error. At this time, the processing already performed may be rolled back.
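The retry behavior around step S07 could be sketched as below. `try_store` is a hypothetical callback standing in for steps S04 to S06 on one server, and the rollback mentioned above is omitted for brevity.

```python
from typing import Callable, Dict, List


def store_with_retry(shares: List[bytes], candidates: List[str],
                     try_store: Callable[[str, bytes], bool]) -> Dict[int, str]:
    """Place each share on a distinct server, moving on when one fails."""
    remaining = list(candidates)
    placement: Dict[int, str] = {}
    for index, share in enumerate(shares):
        while remaining:
            server = remaining.pop(0)  # each server is tried at most once
            if try_store(server, share):
                placement[index] = server
                break
        else:
            # The candidate list ran out before this share was stored.
            raise RuntimeError("no server left that can store the share")
    return placement
```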

When the distributed storage process is completed normally, the data distribution apparatus 100 deletes the original data 400 and the generated distributed data 410 held on the data distribution apparatus 100 (S08), and ends the process. By deleting these data from the data distribution apparatus 100, leakage of the original data 400 (and the corresponding distributed data 410) can be avoided even if the data distribution apparatus 100 itself is stolen or lost.

Further, the pointer file 150 held on the data distribution apparatus 100 contains only ID information, such as the file IDs that identify the original data 400 (and the corresponding distributed data 410); it contains neither information about the data content itself nor information about the servers 200 on which the data is actually stored. Therefore, even if a third party learns the contents of the pointer file 150, the distributed data 410 cannot be collected and the original data 400 cannot be restored (that is, information related to the original data 400 cannot be obtained).

In the present embodiment, the original data 400 and the distributed data 410 are deleted from the data distribution apparatus 100 from the security viewpoint described above. However, when the distributed storage is used as a backup service for the original data 400 on the data distribution apparatus 100, the original data 400 may be left without being deleted.

[Processing flow (original data acquisition)]
FIG. 4 is a diagram showing an outline of an example of processing when a plurality of distributed data 410 are collected and the original data 400 is obtained from them. In the data distribution apparatus 100, when an instruction to refer to the original data 400 (including reference for editing) is received through a user operation on the pointer file 150 via the interface unit 140, the pointer file processing unit 120 first acquires the contents of the identification information 170 included in the pointer file 150 (S11). Next, based on the current FID 172 in the identification information 170, the collection unit 132 of the distributed processing unit 130 inquires of each server 200 whether the corresponding distributed data 410 is held (S12).

Specifically, as described above, for example, an inquiry message for the distributed data 410 that includes the current FID 172 is broadcast to each server 200. When the number of servers 200 is large, the load on the network 300 may be reduced by multicasting to the servers 200 listed in the server list 133.

Each server 200 that has received the inquiry broadcast message acquires the information of the current FID 172 included in the message by the distributed storage unit 210, and searches the distributed data 410 corresponding to the current FID 172 (S13). Specifically, the distributed data 410 having the identification information 170 including the current FID 172 that matches the current FID 172 included in the message in the header or the like is searched. When the corresponding distributed data 410 is not stored (for example, the server B (200b) in FIG. 4), a response to that effect is sent to the data distribution apparatus 100.

On the other hand, when the corresponding distributed data 410 is stored (for example, server A (200a) in FIG. 4), it is confirmed whether or not the identification information 170 included in the header of the distributed data 410 is locked (S14). Specifically, it is confirmed whether or not the value of each ID (original FID 171, current FID 172 or user ID 173) in the target identification information 170 is registered in a lock list (not shown) held in the server 200. If registered, since the use of the target distributed data 410 is locked, a response to that effect is sent to the data distribution apparatus 100. If not registered, the target distributed data 410 is transmitted to the data distribution apparatus 100 (S15). Registration of IDs in the lock list will be described later.
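The server-side handling of steps S13 to S15 can be sketched as follows. The response labels and the `handle_inquiry` function are hypothetical, and the sketch assumes a server holds at most one piece of distributed data per current FID.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class StoredShare:
    original_fid: str
    current_fid: str
    user_id: str
    payload: bytes


def handle_inquiry(store: List[StoredShare], lock_list: Set[str],
                   current_fid: str) -> Tuple[str, Optional[StoredShare]]:
    """Return ("not_found" | "locked" | "ok", share or None)."""
    for share in store:
        if share.current_fid != current_fid:
            continue  # S13: header does not match the requested current FID
        ids = {share.original_fid, share.current_fid, share.user_id}
        if ids & lock_list:
            return "locked", None  # S14: at least one ID is in the lock list
        return "ok", share  # S15: send the share back
    return "not_found", None
```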

When the processing for the broadcast message in each server 200 is completed, the collection unit 132 of the data distribution apparatus 100 determines whether or not the original data 400 can be acquired from the collected distributed data 410 (the distributed data 410 transmitted from each server 200) (S16). For example, in this embodiment, it is determined whether or not k or more pieces of distributed data 410, from which the original data 400 can be restored, have been collected. When the original data 400 cannot be acquired (restored), that is, when fewer than k pieces of distributed data 410 could be collected in the present embodiment, the acquisition process of the original data 400 may be terminated as an error.

If it is determined in step S16 that the original data 400 can be acquired, the distributed data processing unit 110 acquires (restores) the original data 400 from the collected distributed data 410 (S17), and the process ends. In the present embodiment, the original data 400 is restored from the collected k or more pieces of distributed data 410 by the (k, n) threshold secret sharing method. At this time, according to the type of the restored original data 400, an application program associated therewith may be activated to display the restored original data 400.

In this way, when the user performs on the pointer file 150, via the interface unit 140, the same operation as would be performed on the original data 400, the data distribution apparatus 100 collects the necessary distributed data 410 and acquires the original data 400. The user can therefore access the original data 400 (or the corresponding distributed data 410) seamlessly, without being aware that the distributed data 410 is distributed and stored across the plurality of servers 200. The data distribution apparatus 100 can also collect the necessary distributed data 410 without retaining information on which server 200 each distributed data 410 is stored in.

In the example of FIG. 4 described above, each server 200 is asked whether the distributed data 410 is held based on the current FID 172 in the identification information 170; however, the inquiry may instead be made using other ID information. For example, by specifying the original FID 171 in an inquiry based on a user instruction, it is possible to collect the distributed data 410 corresponding to a plurality of versions (current FIDs 172) of the original data 400. In addition, by specifying the user ID 173 in an inquiry, all the distributed data 410 corresponding to the original data 400 created or edited by the corresponding user can be collected.

[Processing flow (ID lock)]
In the present embodiment, for example, even when the portable terminal serving as the data distribution apparatus 100 is stolen or lost, the data distribution apparatus 100 holds neither the original data 400, as described above, nor distribution management information including information about the storage locations (servers 200) of the distributed data 410, so the risk of leakage of the original data 400 can be reduced.

However, the pointer file 150, whose identification information 170 includes the file IDs of the original data 400 (and the corresponding distributed data 410) and the user ID, exists on the data distribution apparatus 100 and can therefore be referred to by a third party. Therefore, in the present embodiment, to address the risk that, when the data distribution apparatus 100 is stolen or lost, a third party acquires the distributed data 410 from each server 200 based on the IDs included in the identification information 170, the use of the corresponding distributed data 410 can be restricted by locking each ID in the identification information 170.

FIG. 5 is a diagram showing an outline of an example of processing when the use of the distributed data 410 is locked and acquisition of the original data 400 and the corresponding distributed data 410 is restricted. First, in the data distribution apparatus 100, the user specifies an ID value to be locked via the interface unit 140 (S21). Specifically, a value is specified for at least one of the original FID 171, the current FID 172, and the user ID 173 in the identification information 170. Next, based on the specified ID information, the distribution unit 131 of the distribution processing unit 130 instructs each server 200 to lock the ID (S22). Specifically, a lock instruction message including a lock target ID value is broadcast (or multicast) to each server 200.

Each server 200 that has received the lock instruction broadcast message registers the ID information included in the message in a lock list (not shown) or the like (S23). After that, the success or failure of registration is returned to the data distribution apparatus 100. When the registration of the ID to the lock list in each server 200 is completed, the data distribution apparatus 100 determines whether or not the registration of the ID to the lock list has been normally completed in all the target servers 200 (S24). If there is a server 200 that has failed to register or a server 200 from which no response was received due to a timeout, the ID lock processing may be terminated as an error. At this time, the processing already performed may be rolled back.

When all the target servers 200 have successfully registered IDs in the lock list, the ID lock process is terminated. The unlocking of the ID can also be realized by deleting the registration of the target ID from the lock list in each server 200 by the same process as described above.
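The lock flow of steps S22 to S24, including the optional rollback, might be sketched as follows. The `Server` class is a hypothetical in-process stand-in for a server 200, and an unreachable server models both a registration failure and a timeout.

```python
from typing import List


class Server:
    """Hypothetical in-process stand-in for a server 200."""

    def __init__(self, name: str, reachable: bool = True):
        self.name = name
        self.reachable = reachable
        self.lock_list = set()

    def register_lock(self, id_value: str) -> bool:
        if not self.reachable:
            return False  # stands in for a failure or a timeout
        self.lock_list.add(id_value)
        return True

    def remove_lock(self, id_value: str) -> bool:
        if not self.reachable:
            return False
        self.lock_list.discard(id_value)
        return True


def lock_id(servers: List[Server], id_value: str) -> bool:
    """Succeed only if every target server registers the ID (S24)."""
    done = []
    for server in servers:
        if server.register_lock(id_value):
            done.append(server)
        else:
            # Optional rollback: undo registrations already made.
            for earlier in done:
                earlier.remove_lock(id_value)
            return False
    return True
```

Unlocking follows the same pattern with `remove_lock` called on each server.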

[Processing flow (pointer file restoration)]
FIG. 6 is a diagram showing an outline of an example of processing for restoring the pointer file 150 when it does not exist on the data distribution apparatus 100. In the present embodiment, when an information processing apparatus different from the original data distribution apparatus 100 (the data distribution apparatus 100 holding the pointer file 150 corresponding to the original data 400) is newly used as the data distribution apparatus 100, for example because the original data distribution apparatus 100 has been stolen or lost or because another terminal is used on a business trip, the pointer file 150 (and the identification information 170 included in it) can be restored so that the original data 400 or the corresponding distributed data 410 can be accessed.

First, in the data distribution apparatus 100, the user designates the information of the user ID 173 in the identification information 170, which is key information for restoring the pointer file 150, via the interface unit 140 (S31). Next, based on the information of the specified user ID 173, the distribution unit 131 of the distribution processing unit 130 inquires each server 200 about the identification information 170 (S32). Specifically, the inquiry message of the identification information 170 including the user ID 173 having a designated value is broadcast (or multicast) to each server 200.

Each server 200 that has received the broadcast message for inquiry about the identification information 170 acquires the information of the user ID 173 included in the message, and searches for the identification information 170 that matches the user ID 173 (S33). Specifically, the identification information 170 including the user ID 173 that matches the value of the user ID 173 included in the message is searched from the header of each distributed data 410 stored. If there is no corresponding identification information 170 (distributed data 410 having this in the header or the like) (for example, server A (200a) in FIG. 6), a response to that effect is sent to the data distribution apparatus 100.

On the other hand, when the corresponding identification information 170 (distributed data 410 having this in the header or the like) is held (for example, server B (200b) in FIG. 6), it is confirmed whether or not each piece of the identification information 170 is locked (S34). Specifically, it is confirmed whether or not the value of each ID (original FID 171, current FID 172, and user ID 173) in each corresponding piece of identification information 170 is registered in the lock list of the server 200. If one or more pieces of the corresponding identification information 170 are not locked, they are transmitted to the data distribution apparatus 100. On the other hand, if all of the identification information 170 is locked, a response indicating that there is no corresponding identification information 170 is returned to the data distribution apparatus 100 (S35).

When the processing for the broadcast message in each server 200 is completed, the collection unit 132 of the data distribution apparatus 100 restores the pointer file 150 including the identification information 170 based on the collected identification information 170 (S36), and ends the processing. A plurality of pieces of identification information 170 having the same contents and corresponding to the same original data 400 may be transmitted from a plurality of servers 200. In this case, the duplicates are removed and combined into one piece of identification information 170.
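The de-duplication in step S36 can be sketched as follows, assuming each server answers with a list of identification records. The tuple layout and the name `merge_responses` are illustrative assumptions.

```python
from typing import Iterable, List, NamedTuple


class IdentificationInfo(NamedTuple):
    original_fid: str
    current_fid: str
    user_id: str


def merge_responses(
    responses: Iterable[List[IdentificationInfo]],
) -> List[IdentificationInfo]:
    """Combine per-server answers, keeping one copy of each record."""
    seen = set()
    merged = []
    for server_response in responses:
        for info in server_response:
            if info not in seen:  # identical contents from another server
                seen.add(info)
                merged.append(info)
    return merged
```

Each record in the merged list would then yield one restored pointer file 150.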

In addition, since only ID information can be obtained from the contents of the identification information 170 as shown in FIG. 2, the restored pointer file 150 cannot be given the same file name as the original data 400. Therefore, a dummy file name may be set automatically, or the identification information 170 may hold, in addition to the ID information shown in FIG. 2, the file name of the original data 400 for each current FID 172, and the file name of the pointer file 150 may be set based on this information.

As described above, according to the data distribution management system 1 according to the first exemplary embodiment of the present invention, the data distribution apparatus 100 does not have distribution management information including information related to the storage destinations of the distributed data 410, and the original data 400 can be distributed and stored without being affected by which server 200 each distributed data 410 is stored in.

That is, when the data distribution apparatus 100 collects the necessary distributed data 410 from the servers 200, it designates all or part of the identification information 170 related to the original data 400 and broadcasts to each server 200 a message inquiring whether the distributed data 410 related to the original data 400 is held. In response to the message, each server 200 holding the target distributed data 410 returns it to the data distribution apparatus 100, so the data distribution apparatus 100 can collect the necessary distributed data 410 without requiring distribution management information such as the storage locations of the distributed data 410.

Thereby, especially when the data distribution apparatus 100 is a portable terminal, it is possible to avoid the risk that a third party obtains distribution management information, learns the storage location of each distributed data 410, and is thereby able to access the distributed data 410. Further, because the system does not depend on which server 200 stores each distributed data 410, the server 200 that stores each distributed data 410 can easily be changed.

Even if the data distribution apparatus 100 does not hold the identification information 170 or the pointer file 150 containing it, the data distribution apparatus 100 can restore the identification information 170 corresponding to each original data 400 and the pointer file 150 containing it. For example, when the user supplies the user ID 173, the data distribution apparatus 100 broadcasts a message inquiring whether each server 200 holds identification information 170 that includes the user ID 173. When a server 200 holding distributed data 410 whose header or the like includes the target identification information 170 returns that identification information 170 to the data distribution apparatus 100, the data distribution apparatus 100 can acquire and restore the identification information 170 corresponding to the original data 400 available to the user and the pointer file 150 containing it.

As a result, when the data distribution apparatus 100 is stolen or lost, or when another terminal is used for a business trip or the like, an information processing apparatus different from the original data distribution apparatus 100 is newly set as the data distribution apparatus 100. Even when used, it is possible to easily restore the pointer file 150, access the original data 400 or the corresponding distributed data 410, and continue the business.

<Embodiment 2>
In the configuration of the first embodiment as shown in FIG. 1, if k or more of the n servers 200 that store the distributed data 410 corresponding to the original data 400 are operating normally and k or more distributed data 410 can be collected from them, the original data 400 can be restored. That is, the configuration has high availability in that the original data 400 can be restored normally as long as the number of servers 200 from which the distributed data 410 cannot be acquired due to a failure or the like is (n−k) or less.

However, even if the server 200 side has such high availability, in the configuration of the example of FIG. 1 the data distribution apparatus 100 is a single point of failure; if a failure occurs in the data distribution apparatus 100, the original data 400 cannot be restored.

Therefore, in this embodiment, for example, the data distribution apparatus 100 having the same configuration as that shown in FIG. 1 is configured as a file server. FIG. 7 is a diagram showing an outline of a configuration example of the data distribution management system 1 according to the second embodiment of the present invention. The example of FIG. 7 has a configuration in which a plurality of client terminals 500 are connected to the data distribution apparatus 100 configured as a file server. Furthermore, the data distribution apparatus 100 as a file server is made redundant using a plurality of servers.

As a result, the data distribution apparatus 100 is no longer a single point of failure: even if one of the servers constituting the data distribution apparatus 100 stops due to a failure or the like, another server can take over and continue processing, which improves availability. At this time, for example, the plurality of servers constituting the data distribution apparatus 100 can be configured as a plurality of virtual servers on one or more physical servers 101, as shown in the figure.

At this time, for example, similarly to the configuration of the data distribution apparatus 100 shown in FIG. 1, the data distribution apparatus 100 as a file server has the pointer file 150 and the interface unit 140. By accessing the pointer file 150 on the data distribution apparatus 100 from the client terminal 500 via the network 300, the corresponding original data 400 can be restored on the data distribution apparatus 100 and transmitted to the client terminal 500.

At this time, access by a plurality of users to the same original data 400 (the corresponding pointer file 150) on the data distribution apparatus 100 (file server) is controlled, for example, so that only the corresponding user can access it based on the user ID 173 of the identification information 170 (or so that other users can refer to it while only the corresponding user can update it), or so that each user accesses the pointer file 150 exclusively. This prevents inconsistencies caused by a plurality of users making overlapping updates to the same original data 400. A part of the functions of the data distribution apparatus 100, such as the pointer file 150 and the interface unit 140, may be provided on each client terminal 500.

<Embodiment 3>
In the first embodiment described above, the distributed processing unit 130 of the data distribution apparatus 100 selects n servers 200 each storing the n distributed data 410 generated from the original data 400 by the distributed data processing unit 110. As described above, the server 200 is selected by, for example, rotation or random extraction from the servers 200 registered in the server list 133. At this time, a process of inquiring whether each server 200 can store the distributed data 410 (that is, the operating status of the server 200) may be performed.

Here, the servers 200 registered in the server list 133 (the servers 200 that can be storage destinations of the distributed data 410) may differ in, for example, their specifications, their operating arrangements including security, and their installation locations (position such as country or region, topographic characteristics, and so on). That is, the servers 200 may differ in their capability to store the distributed data 410. Therefore, with uniform rotation or other selection methods that do not consider such differences, it may not be possible to select an appropriate server 200 according to the contents, attributes, and the like of the distributed data 410.

Therefore, in the present embodiment, for each server 200 that can be a target for storing the distributed data 410, an access right is set according to the storage capability of the distributed data 410, and the access right and the attribute of the distributed data 410 are set. Based on this, the server 200 that stores the distributed data 410 can be determined.

FIG. 8 is a diagram showing an outline of a configuration example of the data distribution management system 1 according to the third embodiment of the present invention. In the example of FIG. 8, the data distribution management system 1 includes, in addition to the data distribution apparatus 100, an access right management server 220 for setting an access right for each server 200 that can be a target for storing the distributed data 410. The access right management server 220 sets the access right for each server 200, manually or automatically according to predetermined criteria, based on attribute information such as the specifications of each server 200, its operating arrangements including security, and its installation location. In the example of FIG. 8, the access right management server 220 is configured as an independent server, but it may be configured in the same housing as the data distribution apparatus 100.

FIG. 9 is a diagram showing an example of the contents of the identification information 170 added to the pointer file 150 and the distributed data 410 in the present embodiment. In the example of FIG. 9, attribute information 174 is further added to the identification information 170 in the first embodiment shown in FIG. The attribute information 174 is not particularly limited in format or the like, but includes information for identifying the importance of the corresponding original data 400 (file consisting of the original data 400), the file type, and the like.

When the distribution processing unit 130 of the data distribution apparatus 100 selects the servers 200 in which to store the distributed data 410 generated from the original data 400, it obtains the access right information, for example, by inquiring of each server 200. Based on this and the attribute information 174 of the identification information 170 added to the distributed data 410, it determines whether storage of the distributed data 410 is permitted, and stores the distributed data 410 in a server 200 for which storage is permitted. Accordingly, the server 200 that should store the target distributed data 410 can be selected according to the storage capability (access right) of the server 200, rather than by simple rotation or the like.
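The specification leaves the format of the access right and of the attribute information 174 open. As one hypothetical encoding, the sketch below treats both as integer levels and permits storage when a server's access-right level is at least the importance of the data.

```python
from typing import Dict, List


def eligible_servers(access_rights: Dict[str, int], importance: int) -> List[str]:
    """Return the servers allowed to store data of the given importance.

    Hypothetical rule: storage is permitted when the server's access-right
    level is greater than or equal to the importance in attribute info 174.
    """
    return [name for name, level in access_rights.items() if level >= importance]
```

The n storage destinations would then be chosen from this filtered list, for instance by rotation or random extraction.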

The invention made by the present inventors has been specifically described above based on the embodiments. However, it goes without saying that the present invention is not limited to the embodiments and that various modifications can be made without departing from the scope of the invention.

The present invention can be used in a data distribution management system in which one or more data is distributed and stored in different servers.

DESCRIPTION OF SYMBOLS
1 ... data distribution management system,
100 ... data distribution apparatus, 110 ... distributed data processing unit, 120 ... pointer file processing unit, 121 ... identification information generation unit, 122 ... ID generation unit, 130 ... distributed processing unit, 131 ... distribution unit, 132 ... collection unit, 133 ... server list, 140 ... interface unit, 150 ... pointer file, 160 ... user information, 170 ... identification information, 171 ... original file ID (FID), 172 ... current file ID (FID), 173 ... user ID,
200, 200a, 200b ... server, 210 ... distributed storage unit,
300 ... network,
400 ... original data, 410 ... distributed data.

Claims

A data distribution management system comprising a plurality of information processing devices each having a storage device, and a data distribution device that is connected to each of the information processing devices via a network and stores one or more distributed data, handled collectively in correspondence with original data, in a distributed manner in the respective information processing devices, wherein
The data distribution device includes:
A distributed data processing unit that performs processing related to the association between the original data and one or more of the distributed data;
A pointer file processing unit that generates identification information capable of identifying and specifying the original data, and generates a pointer file that includes the identification information and corresponds to the original data;
A distributed processing unit for transmitting each of the distributed data corresponding to the original data to each of the different information processing devices, to which the identification information corresponding to the original data is added,
Each of the information processing devices
A data distribution management system comprising: a distributed storage unit that stores the distributed data transmitted from the data distribution apparatus in the storage device.
In the data distribution management system according to claim 1,
The distributed processing unit of the data distribution apparatus includes:
Designates all or part of the identification information included in the pointer file designated by the user, and broadcasts to each information processing apparatus a first message inquiring whether the distributed data corresponding to the designated part of the identification information is held,
The distributed storage unit of each information processing apparatus,
Searches whether the distributed data including the identification information that matches the designated part of the identification information designated in the first message is stored in its own storage device, and if stored, transmits the corresponding distributed data to the data distribution device,
The distributed data processing unit of the data distribution apparatus includes:
A data distribution management system, wherein the corresponding original data is acquired based on the distributed data transmitted from the information processing apparatuses.
In the data distribution management system according to claim 2,
The distributed processing unit of the data distribution apparatus includes:
Specifying all or part of the identification information specified by the user, and broadcasting a second message to the respective information processing devices to limit the use of the corresponding distributed data,
The distributed storage unit of each information processing apparatus,
A data distribution management system, wherein the part of the identification information specified in the second message is registered in a lock list, and, when searching for the distributed data including identification information that matches the part of the identification information specified in the first message, use of the corresponding distributed data is restricted if the identification information included in that distributed data contains content registered in the lock list.
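The lock-list behavior can be sketched in Python as below. The class name `LockingStore`, the representation of identification information as a tuple of ID strings, and the choice to model "restricting use" as withholding the piece from the reply are assumptions for illustration only; the claim leaves the form of the restriction open.

```python
class LockingStore:
    """Hypothetical node-side store with a lock list."""

    def __init__(self):
        self.pieces = []        # (identification fields as a tuple, piece)
        self.lock_list = set()  # values registered by second messages

    def handle_second_message(self, value: str) -> None:
        # Register the specified part of the identification information.
        self.lock_list.add(value)

    def handle_first_message(self, value: str) -> list:
        # Find matching pieces, then withhold any whose identification
        # information contains a value registered in the lock list.
        hits = [(fields, piece) for fields, piece in self.pieces if value in fields]
        return [(fields, piece) for fields, piece in hits
                if not self.lock_list.intersection(fields)]
```

Locking a user ID in this sketch blocks every piece tagged with that user, regardless of which part of the identification information a later inquiry specifies.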
The data distribution management system according to any one of claims 1 to 3,
The distributed processing unit of the data distribution apparatus includes:
Specifies a value, designated by the user, that identifies the user among the identification information, and broadcasts, to each information processing apparatus, a third message inquiring whether the corresponding identification information is held,
The distributed storage unit of each information processing apparatus,
Searches whether distributed data including identification information that matches the user-identifying value specified in the third message is stored in its own storage device and, if so, transmits the identification information included in the corresponding distributed data to the data distribution apparatus,
The pointer file processing unit of the data distribution apparatus includes:
A data distribution management system which restores the corresponding pointer file based on the identification information transmitted from each information processing apparatus.
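Pointer file restoration can be sketched in Python as follows. The function names, the three-field tuple `(data ID, version ID, user ID)`, and the dictionary form of the restored pointer file are assumptions for illustration; the claim only requires that the pointer file be rebuilt from identification information returned by the nodes.

```python
def handle_third_message(store: list, user_id: str) -> set:
    """Node side: return the identification information of pieces tagged with this user."""
    return {ident for ident, _piece in store if ident[2] == user_id}


def restore_pointer_files(stores: list, user_id: str) -> list:
    """Distributor side: broadcast the inquiry and rebuild one pointer file per distinct reply."""
    found = set()
    for store in stores:  # broadcast to every node
        found |= handle_third_message(store, user_id)
    return [{"data_id": d, "version_id": v, "user_id": u}
            for d, v, u in sorted(found)]
```

Several nodes return the same identification information for the same original data, so the replies are merged into a set before the pointer files are rebuilt.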
The data distribution management system according to any one of claims 1 to 4,
A data distribution management system, wherein the identification information includes ID information for identifying the original data as a whole, ID information for identifying each version of the original data when the original data is edited, and ID information for identifying the user who created or edited the original data.
In the data distribution management system according to any one of claims 1 to 5,
The distributed data processing unit of the data distribution apparatus includes:
A data distribution management system, wherein a plurality of the distributed data are generated from the original data by a secret sharing method, and the original data is restored from the plurality of the distributed data by the secret sharing method.
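The claim does not name a particular secret sharing method. As one illustration only, the Python sketch below uses a simple XOR-based scheme in which all n shares are required for restoration; the function names are hypothetical, and a threshold scheme such as Shamir's, which restores from any k of n shares, could be substituted.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(original: bytes, n: int) -> list:
    """Produce n shares: n - 1 random strings plus one that XORs them with the original."""
    randoms = [secrets.token_bytes(len(original)) for _ in range(n - 1)]
    last = reduce(xor_bytes, randoms, original)
    return randoms + [last]


def restore(shares: list) -> bytes:
    """XOR all shares together; the random strings cancel and the original remains."""
    return reduce(xor_bytes, shares)
```

With this scheme any subset of fewer than n shares is statistically independent of the original data, which is why each share can be placed on a different device.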
The data distribution management system according to any one of claims 1 to 6,
The distributed storage unit of each information processing apparatus,
A data distribution management system, wherein, when the distributed data corresponding to the original data transmitted from the data distribution apparatus is stored in the storage device, if past distributed data corresponding to the original data exists, the new distributed data is stored after the past distributed data has been saved.
In the data distribution management system according to claim 7,
The distributed storage unit of each information processing apparatus,
A data distribution management system, wherein the distributed data corresponding to original data older than a predetermined number of generations is deleted at a predetermined timing.
The data distribution management system according to claim 8,
The distributed processing unit of the data distribution apparatus includes:
Specifies information identifying the version of the original data, designated by the user, that is to be retained, and broadcasts, to each information processing apparatus, a fourth message restricting deletion of the distributed data corresponding to that original data,
The distributed storage unit of each information processing apparatus,
A data distribution management system, wherein information specifying the version of the original data specified in the fourth message is registered in a list, and, when deleting the distributed data corresponding to original data older than a predetermined number of generations, deletion of the corresponding distributed data is restricted if the identification information included in the distributed data contains information specifying original data registered in the list.
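Generation-based deletion with a protected list can be sketched in Python as below. The function name `prune`, the parameter names, and the representation of stored generations as an oldest-first list of `(version ID, piece)` pairs are assumptions for illustration; the claim does not specify the data layout or the deletion timing.

```python
def prune(generations: list, keep_last: int, protected: set) -> list:
    """Drop generations older than the newest `keep_last`, except protected versions.

    generations: list of (version_id, piece), oldest first.
    protected:   version IDs registered by fourth messages.
    """
    cutoff = max(len(generations) - keep_last, 0)
    kept = []
    for i, (version, piece) in enumerate(generations):
        # Keep recent generations, and any older one whose version is registered.
        if i >= cutoff or version in protected:
            kept.append((version, piece))
    return kept
```

For four stored generations `v1` to `v4`, `prune(gens, 2, {"v1"})` retains `v1`, `v3`, and `v4`, whereas an empty protected set retains only `v3` and `v4`.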
The data distribution management system according to any one of claims 1 to 9,
The data distribution device includes:
A data distribution management system comprising a plurality of file servers or a plurality of virtual file servers.
The data distribution management system according to any one of claims 1 to 10,
The identification information further includes attribute information about the original data,
The distributed processing unit of the data distribution apparatus includes:
A data distribution management system, wherein the information processing apparatus capable of storing the distributed data is selected based on access right information set for each information processing apparatus and the attribute information in the identification information added to the distributed data.
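One possible reading of this selection step is sketched in Python below. The claim does not define the form of the access right information or of the attribute information, so the model here is purely an assumption: each apparatus is given a set of attribute values it is permitted to store, and an apparatus qualifies when every attribute of the data is in that set. The name `select_nodes` is hypothetical.

```python
def select_nodes(node_rights: dict, attributes: set) -> list:
    """Return the names of nodes whose access rights cover all attributes of the data.

    node_rights: maps a node name to the set of attribute values it may store.
    attributes:  attribute values carried in the identification information.
    """
    return [name for name, allowed in node_rights.items()
            if attributes <= allowed]
```

The distributed processing unit would then send the distributed data only to the returned apparatuses.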