WO2013065134A1 - Data distribution management system - Google Patents


Info

Publication number
WO2013065134A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
distributed
information
identification information
data distribution
Application number
PCT/JP2011/075211
Other languages
French (fr)
Japanese (ja)
Inventor
佐藤 敦
壮一 最首
Original Assignee
株式会社野村総合研究所 (Nomura Research Institute, Ltd.)
Application filed by 株式会社野村総合研究所 (Nomura Research Institute, Ltd.)
Priority to PCT/JP2011/075211 (WO2013065134A1)
Priority to PCT/JP2012/077460 (WO2013065544A1)
Priority to JP2013541726A (JP5667702B2)
Publication of WO2013065134A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6272 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database by registering files or documents with a third party

Definitions

  • The present invention relates to data storage techniques, and in particular to a technique that is effective when applied to a data distribution management system that distributes one or more pieces of data across different servers for storage.
  • One conceivable way to reduce the risk of information leakage through loss of a terminal is the so-called thin-client approach, in which data held on the terminal, including important data, is instead stored in an external data center or server where security measures are in place.
  • It has also been proposed not to store important data on an external server as-is, but to use, for example, the so-called secret sharing technique described in Non-Patent Document 1 to divide the important data into pieces of non-critical data that are meaningless on their own (the important data cannot be reconstructed or inferred from any single piece), and to store these pieces on a plurality of external servers. This reduces the risk of information leakage even when the data is stored in a virtual data center or virtual server in a cloud computing environment.
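The principle of dividing important data into pieces that are individually meaningless can be illustrated with the simplest such scheme, n-of-n XOR splitting, sketched below in Python. This is an assumption for illustration only: it is not the (k, n) threshold method used in the embodiment, nor necessarily the technique of Non-Patent Document 1, and the function names are hypothetical. Because all but one share are uniformly random, any single share (indeed any n-1 of them) reveals nothing about the data, while all n shares together rebuild it.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_n_of_n(data: bytes, n: int) -> list[bytes]:
    """Split data into n shares; all n are needed to rebuild it.

    The first n-1 shares are uniformly random, so any n-1 of them
    (in particular any single share) reveal nothing about the data.
    """
    if n < 2:
        raise ValueError("need at least two shares")
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # Last share = data XOR every random share, so XOR of all n shares is data.
    last = reduce(xor_bytes, shares, data)
    return shares + [last]


def combine_n_of_n(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original data."""
    return reduce(xor_bytes, shares)
```

Unlike a (k, n) threshold scheme, losing any one share here makes the data unrecoverable, which is one reason threshold schemes are preferred when storage servers may fail.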
  • In such systems, distributed management information, which includes information on which server each piece of data is stored on, is held by the information processing apparatus of each user that is the distribution source, by a specific management server such as a file server, or the like.
  • Patent Document 1 describes a distributed-information file management means in which an information management computer sets up tally folders A, B, ... for storing tally files, a restoration destination folder for storing restored files, a tally object folder, and a tally engine folder containing a restoration engine program and a division engine program. A tally parameter, including information on a decoding boundary (the range that can be read by the tally application), is stored in tally object files A, B, ... together with the tally file names and storage locations and the object information of the restoration destination folder. The tally files are then collected directly on the basis of the tally file storage locations and the decoding boundary, a restored file is generated, and the restored file is stored and opened. In this way the original data can be located and restored efficiently.
  • However, in these techniques the information processing apparatus that is the data distribution source, or a specific management server such as a file server, holds the distributed management information for the important data (specifically, the information on the one or more pieces of distributed data derived from the important data), which is a security problem. For example, if a portable terminal that is the data distribution source holds the distributed management information for important data and is stolen or lost, a third party may view that information and thereby learn where the distributed data is stored (the host name and network address of each server storing the distributed data, a URL (Uniform Resource Locator) for accessing the distributed data, and so on).
  • In addition, the distribution destination server may have to be changed, for example because of a failure of the server used as the distributed storage destination.
  • In that case, the distributed management information held by each information processing apparatus that is a data distribution source must be rewritten individually with the information of the new storage destination server.
  • When the distribution destination server is a virtual server provided by a cloud computing service, the system may have to keep operating without knowing that the virtual server has been stopped or that the distribution destination virtual server has been changed.
  • Furthermore, when a user who has lost the portable terminal or other apparatus that was the data distribution source tries to access the distributed data (or the original important data behind it) from another information processing apparatus, or accesses distributed data from an apparatus other than the usual one, for example at another business location or on a business trip, that apparatus does not hold the distributed management information for the target important data (the distributed data for the important data). It is therefore impossible to determine on which servers the pieces of distributed data are stored, and the distributed data cannot be accessed.
  • An object of the present invention is therefore to provide a data distribution management system that can store data in a distributed manner without the information processing apparatus that is the data distribution source holding distributed management information, and regardless of which servers the distributed data is stored on.
  • A data distribution management system according to the present invention has a plurality of information processing apparatuses, each having a storage device, and a data distribution apparatus that is connected to each of the information processing apparatuses via a network and distributes one or more pieces of distributed data, handled collectively in correspondence with original data, for storage in the storage devices of the information processing apparatuses. The system has the following features.
  • The data distribution apparatus has a distributed data processing unit that performs processing related to the association between the original data and the one or more pieces of distributed data; a pointer file processing unit that generates identification information capable of uniquely identifying the original data and generates a pointer file containing the identification information corresponding to the original data; and a distributed processing unit that adds the identification information corresponding to the original data to each piece of distributed data corresponding to that original data and transmits the pieces to different information processing apparatuses.
  • Each of the information processing apparatuses includes a distributed storage unit that stores the distributed data transmitted from the data distribution apparatus in the storage device.
  • According to the present invention, data can be stored in a distributed manner without the information processing apparatus that is the data distribution source holding distributed management information, and regardless of which servers the distributed data is stored on.
  • The data distribution management system according to an embodiment of the present invention distributes a plurality of pieces of distributed data, handled collectively in correspondence with original data such as important data, for storage in the storage devices of other data centers, servers and the like.
  • The system does not have distributed management information, that is, information on the data center or server in which each piece of distributed data is stored.
  • Instead of such distributed management information, the data distribution apparatus that performs the distributed storage generates and holds identification information for identifying each piece of original data, and adds this identification information to the header information of each piece of distributed data. This makes it possible to collect the necessary distributed data without any information on the data center or server in which each piece is stored.
  • Here, distributed data "handled collectively in correspondence with the original data" means one or more pieces of data that are acquired together, or processed together (stored, displayed and so on), in response to a single processing request, such as storing, browsing or referring to the target original data.
  • In the present embodiment, a plurality of pieces of divided data generated from important data (the original data) by secret sharing are used as the distributed data, but the present invention is not limited to this.
  • For example, a series of related files generated by a business application, or a series of work files specified by the user for management units such as the user's projects or cases, may each be treated as distributed data and stored across servers.
  • There may also be only one piece of distributed data for the original data (for example, the target original data itself), as when the system is used for remote copy or backup.
  • When the data distribution apparatus collects the necessary distributed data from the data centers and servers, it broadcasts (or multicasts) to them a message that specifies all or part of the identification information of the original data and asks whether they hold distributed data corresponding to that original data. Each data center or server holding the target distributed data responds to the message by returning that distributed data to the data distribution apparatus. In this way, the necessary distributed data can be collected without any distributed management information.
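The broadcast-based collection described above can be sketched as follows. The in-memory `Server` objects and the loop in `broadcast_collect` stand in for real network servers and a real broadcast or multicast; the class names, the field names and the choice of matching on the current file ID alone are assumptions made for this sketch. The point illustrated is that the caller needs only the identification information, not any record of where the pieces were placed.

```python
from dataclasses import dataclass, field


@dataclass
class Piece:
    """One piece of distributed data plus its identification header."""
    original_fid: str
    current_fid: str
    user_id: str
    payload: bytes


@dataclass
class Server:
    """Hypothetical storage server; it knows nothing about other servers."""
    name: str
    pieces: list[Piece] = field(default_factory=list)

    def handle_inquiry(self, current_fid: str) -> list[Piece]:
        # Answer only with pieces whose header matches the queried ID.
        return [p for p in self.pieces if p.current_fid == current_fid]


def broadcast_collect(servers: list[Server], current_fid: str) -> list[Piece]:
    """Send the same inquiry to every server and gather all matches.

    The caller keeps no record of where the pieces were stored.
    """
    collected: list[Piece] = []
    for server in servers:  # stands in for a network broadcast/multicast
        collected.extend(server.handle_inquiry(current_fid))
    return collected
```

Servers that hold nothing relevant simply return an empty list, so adding, removing or replacing servers requires no change on the caller's side.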
  • This avoids the risk that, if the data distribution apparatus is stolen or lost, a third party obtains the distributed management information, learns where each piece of distributed data is stored, and gains access to the distributed data. In addition, because operation does not depend on which data center or server stores each piece of distributed data, the storage locations can be changed easily, which improves the availability and flexibility of the system.
  • The data distribution apparatus can also restore the identification information of each piece of data. For example, when the user supplies information identifying the user, such as a user ID, the data distribution apparatus broadcasts (or multicasts) a message inquiring about identification information related to that user. Each data center or server holding distributed data with matching identification information returns that identification information to the data distribution apparatus, which can thus acquire (restore) the identification information for every piece of data available to the user and collect the corresponding distributed data on the basis of it.
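Restoring identification information from a user ID works the same way as collecting distributed data, except that servers answer with identification information only. In the minimal sketch below, each server is modelled, hypothetically, as a list of headers; the class and function names are illustrative assumptions. A set removes the duplicates that arise because several servers hold pieces of the same data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdInfo:
    """Identification information carried in each piece's header."""
    original_fid: str
    current_fid: str
    user_id: str


def answer_user_query(headers: list[IdInfo], user_id: str) -> set[IdInfo]:
    """Server side: return only the identification info, never the payload."""
    return {h for h in headers if h.user_id == user_id}


def restore_id_info(servers: list[list[IdInfo]], user_id: str) -> set[IdInfo]:
    """Client side: broadcast the user ID and merge the answers.

    Several servers hold pieces of the same data, so identical answers
    collapse into one entry in the set.
    """
    restored: set[IdInfo] = set()
    for headers in servers:  # stands in for a broadcast/multicast
        restored |= answer_user_query(headers, user_id)
    return restored
```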
  • FIG. 1 is a diagram showing an outline of a configuration example of a data distribution management system according to an embodiment of the present invention.
  • The data distribution management system 1 consists of a data distribution apparatus 100 and one or more servers 200, connected via a network 300 such as the Internet so that they can communicate with each other.
  • A configuration with a plurality of data distribution apparatuses 100 is also possible.
  • The data distribution apparatus 100 consists of an information processing apparatus such as a PC or a portable terminal.
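  • The data distribution apparatus 100 has, for example, a distributed data processing unit 110, a pointer file processing unit 120, a distributed processing unit 130 and an interface unit 140, each implemented by a software program running on an operating system (OS) (not shown).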
  • It also has user information 160, held as a database, file, registry or similar, which contains information (for example, account information) on the users who can use the data distribution management service provided by the data distribution apparatus 100 or the data distribution management system 1.
  • In addition, a pointer file 150, which acts as a pointer to the distributed data 410 stored on the servers 200, is provided for each of the plurality of pieces of original data 400.
  • The distributed data processing unit 110 performs processing related to the association between the original data 400 and the one or more pieces of distributed data 410 handled collectively in correspondence with it.
  • In the present embodiment, it consists of, for example, a known secret sharing library that generates n pieces of divided data, serving as the distributed data 410, from the specified original data 400 by the (k, n) threshold secret sharing method and, conversely, restores the original data 400 from k or more pieces of distributed data 410 by the same method.
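The embodiment relies on a known secret sharing library without naming the scheme. As one standard (k, n) threshold scheme, Shamir's secret sharing can be sketched as follows, applied independently to each byte over the prime field GF(257). The function names and the per-byte layout are assumptions for illustration, not the library the embodiment uses; share values can reach 256, so shares are kept as lists of integers rather than bytes.

```python
import secrets

PRIME = 257  # smallest prime above 255, so every byte value is a field element


def _eval_poly(coeffs: list[int], x: int) -> int:
    """Evaluate a polynomial (constant term first) at x modulo PRIME (Horner)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc


def split(secret: bytes, k: int, n: int) -> list[tuple[int, list[int]]]:
    """Shamir (k, n) sharing, applied independently to each byte.

    Returns n shares (x, values); any k of them rebuild the secret.
    """
    if not 1 <= k <= n < PRIME:
        raise ValueError("require 1 <= k <= n < PRIME")
    shares: list[tuple[int, list[int]]] = [(x, []) for x in range(1, n + 1)]
    for byte in secret:
        # Random polynomial of degree k-1 whose constant term is the byte.
        coeffs = [byte] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
        for x, values in shares:
            values.append(_eval_poly(coeffs, x))
    return shares


def combine(shares: list[tuple[int, list[int]]]) -> bytes:
    """Lagrange interpolation at x = 0 for each byte position (needs >= k shares)."""
    xs = [x for x, _ in shares]
    out = bytearray()
    for i in range(len(shares[0][1])):
        total = 0
        for j, (xj, values) in enumerate(shares):
            num, den = 1, 1
            for m, xm in enumerate(xs):
                if m != j:
                    num = num * (-xm) % PRIME
                    den = den * (xj - xm) % PRIME
            total = (total + values[i] * num * pow(den, -1, PRIME)) % PRIME
        out.append(total)
    return bytes(out)
```

Passing fewer than k shares to `combine` does not recover the secret (it yields unrelated values or raises a `ValueError`), which is what makes the piece held by any single server useless on its own.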
  • As noted above, the distributed data 410 is not limited to data generated from, or on the basis of, the original data 400 as in the present embodiment; it may be any plurality of related pieces of data. There may also be only one piece of distributed data 410 (for example, the original data 400 itself).
  • The pointer file processing unit 120 generates, for each piece of original data 400, a pointer file 150 that acts as a pointer to the corresponding distributed data 410. It also processes the original data 400 (or the corresponding distributed data 410) in accordance with instructions the user gives to the pointer file 150 via the interface unit 140 described later.
  • The pointer file 150 points to the original data 400 (and the corresponding distributed data 410) but does not contain the original data 400 itself.
  • As described below, the pointer file 150 contains identification information that uniquely identifies the original data 400 (and the corresponding distributed data 410). The pointer file 150 is thus similar to a so-called shortcut, symbolic link or alias for the original data 400 (and the corresponding distributed data 410). The same identification information is also added, as header information or the like, to each piece of distributed data 410 generated by the distributed data processing unit 110.
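One way to add the identification information as header information, as described above, is a length-prefixed header in front of each piece of distributed data. The byte layout below (a 4-byte big-endian length, a UTF-8 JSON header, then the payload), the field names and the helper names are hypothetical; the description only states that the identification information is added as header information or the like.

```python
import json
import struct


def add_header(id_info: dict[str, str], payload: bytes) -> bytes:
    """Prefix a piece of distributed data with its identification info.

    Layout (hypothetical): 4-byte big-endian header length, UTF-8 JSON
    header, then the raw payload bytes.
    """
    header = json.dumps(id_info, sort_keys=True).encode("utf-8")
    return struct.pack(">I", len(header)) + header + payload


def read_header(blob: bytes) -> tuple[dict[str, str], bytes]:
    """Split a stored blob back into (identification info, payload)."""
    (length,) = struct.unpack_from(">I", blob, 0)
    header = json.loads(blob[4:4 + length].decode("utf-8"))
    return header, blob[4 + length:]
```

Because the header can be read without touching the payload, a server can answer identification-information inquiries without interpreting the distributed data itself.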
  • To generate this identification information, the pointer file processing unit 120 also has an identification information generation unit 121, and an ID generation unit 122 that generates the various ID values contained in the identification information.
  • The ID generation unit 122 consists of, for example, a known library that can generate unique IDs (universal IDs) that do not collide even across a plurality of different data distribution apparatuses 100.
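The description refers only to a known library for universal IDs. As an assumption, Python's standard `uuid` module can play that role: random (version 4) UUIDs can be generated independently on each data distribution apparatus without any coordination. The helper name below is hypothetical.

```python
import uuid


def new_universal_id() -> str:
    """Return a random (version 4) UUID as a 32-character hex string.

    With 122 random bits, collisions between IDs generated independently
    on different data distribution apparatuses are negligibly unlikely,
    so no central ID registry is needed.
    """
    return uuid.uuid4().hex
```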
  • The distributed processing unit 130 has, for example, a distribution unit 131 that distributes the distributed data 410, associated with the original data 400 by the distributed data processing unit 110 and given identification information, to the servers 200 for storage according to a predetermined rule, and a collection unit 132 that collects the distributed data 410 associated with the original data 400 from the servers 200. It may also have a server list 133 listing the servers 200 that can serve as storage destinations for the distributed data 410.
  • The distribution unit 131 stores the n pieces of distributed data 410, generated for example by the (k, n) threshold secret sharing method in the distributed data processing unit 110 and given identification information by the pointer file processing unit 120, on n different servers 200 selected from the server list 133.
  • The n servers 200 that store the distributed data 410 are selected from the listed servers by, for example, rotation or random selection.
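The two selection rules named above, rotation and random extraction, might look as follows. The helper names are hypothetical, and servers are represented simply by name strings taken from a server list; both rules return n distinct servers, so no two pieces of the same original data land on one server.

```python
import random
from typing import Optional


def select_random(servers: list[str], n: int, rng: Optional[random.Random] = None) -> list[str]:
    """Random extraction: n distinct servers chosen uniformly at random."""
    if n > len(servers):
        raise ValueError("not enough servers for n pieces")
    return (rng or random.Random()).sample(servers, n)


class RotatingSelector:
    """Rotation: pick n distinct servers, starting where the last call stopped."""

    def __init__(self, servers: list[str]) -> None:
        self._servers = list(servers)
        self._start = 0

    def select(self, n: int) -> list[str]:
        if n > len(self._servers):
            raise ValueError("not enough servers for n pieces")
        count = len(self._servers)
        chosen = [self._servers[(self._start + i) % count] for i in range(n)]
        self._start = (self._start + n) % count
        return chosen
```

Rotation spreads load evenly and deterministically; random extraction makes the placement of any given piece harder to predict.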
  • The collection unit 132 asks each server 200 whether it holds distributed data 410 associated with the original data 400, and collects the distributed data 410 sent back by the servers 200 that do.
  • In the present embodiment, it collects at least the k pieces of distributed data 410 that the distributed data processing unit 110 needs to restore the original data 400 by the (k, n) threshold secret sharing method.
  • To make the inquiry, a message containing all or part of the identification information in the pointer file 150 for the target original data 400 is broadcast to all servers 200 (or multicast to the servers 200 listed in the server list 133).
  • Any known technique can be used as the broadcast (multicast) protocol.
  • The interface unit 140 provides input/output functions, such as a user interface with screen display, on the data distribution apparatus 100.
  • The user can use the functions of the data distribution management system 1 through, for example, the file management screen or applications provided by a general-purpose OS.
  • For example, when the user moves the original data 400 into a specific folder by a simple operation such as drag and drop, the distributed data processing unit 110 generates the distributed data 410, and the distributed processing unit 130 stores it across the servers 200. The pointer file processing unit 120 then generates a pointer file 150 for the original data 400 and places it in the specific folder in place of the original data 400. From then on, the user's accesses to the original data 400, such as references, are made to the pointer file 150 in that folder.
  • When the user performs an operation on the pointer file 150, such as referring to the original data 400, the pointer file processing unit 120 has the distributed processing unit 130 collect from the servers 200 the distributed data 410 associated with the original data 400 identified by the pointer file 150. Where necessary, as in the present embodiment, the distributed data processing unit 110 then restores the original data 400 from the collected distributed data 410, after which the original data 400 or the distributed data 410 is displayed by an associated application program or the like. This gives the user an interface equivalent to storing and referring to the original data 400 directly, while hiding the processing related to the distributed data 410.
  • In the present embodiment, the data distribution apparatus 100 itself provides the user interface through the interface unit 140 and also performs the series of processes related to distributed storage, but these functions may be separated. For example, the data distribution apparatus 100 may be configured as a file server having the units other than the interface unit 140 and the pointer file 150, and a plurality of client terminals, each having the interface unit 140 and the pointer files 150 generated by the data distribution apparatus 100, may be connected to it.
  • The server 200 consists of an information processing apparatus, such as a file server or storage server, with a storage device such as an HDD (Hard Disk Drive) (not shown) that can store the distributed data 410 sent from the data distribution apparatus 100. It may also be a data center containing such information processing apparatuses, or a virtual server or virtual data center provided by a cloud computing service.
  • The server 200 has, for example, a distributed storage unit 210 implemented by a software program running on an OS (not shown).
  • The distributed storage unit 210 stores the distributed data 410 sent from the data distribution apparatus 100 in the storage device. In response to a broadcast (or multicast) message from the data distribution apparatus 100, it also searches for distributed data 410 containing identification information that matches the identification information in the message and, if it holds such data, returns the distributed data 410, or the identification information contained in its header, to the data distribution apparatus 100.
  • FIG. 2 is a diagram showing an example of the contents of identification information generated by the identification information generation unit 121 of the pointer file processing unit 120 and added to the pointer file 150 and the distributed data 410.
  • The identification information 170 contains, for example, an original file ID (FID) 171, a current file ID (FID) 172 and a user ID 173.
  • The original FID 171 uniquely identifies the original data 400 (the file consisting of the original data 400) as a whole, across all of its versions (generations).
  • The original FID 171 is assigned when the original data 400 is first stored in a distributed manner, that is, when distributed data 410 is first generated from the original data 400 and stored across the servers 200, to identify the original data 400 and the corresponding distributed data 410.
  • The current FID 172 uniquely identifies each version (generation) of the original data 400 (the file consisting of the original data 400).
  • The current FID 172 is assigned to the latest version (generation) of the original data 400 each time the original data 400 is edited or updated after its first distributed storage. Initially, the value of the current FID 172 is the same as that of the original FID 171. Thereafter, each time the distributed data 410 needed to edit the original data 400 is collected and the edited, latest original data 400 is again associated with distributed data 410 and stored across the servers 200, a new current FID 172 is assigned. The original FID 171 keeps the value it was initially assigned and is never updated.
  • The current FID 172 not only identifies the latest original data 400 and the corresponding distributed data 410 but also serves as version information for the original data 400. When the distributed storage unit 210 of each server 200 stores the distributed data 410 for the latest, edited original data 400, it keeps the distributed data 410 for earlier versions of the original data 400 (whose headers contain identification information 170 with a different current FID 172) as history. Because each server 200 thus holds distributed data 410 for several versions of the original data 400, the version of the original data 400 that the user specifies, and its distributed data 410, can be obtained.
  • Pieces of distributed data 410 with different current FIDs 172 but the same original FID 171 can be identified as belonging to different versions of the same original data 400.
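The relationship between the original FID 171 and the current FID 172 can be sketched as a small immutable record. The class and function names, the use of UUID hex strings and the decision to record the editing user in `user_id` on each new version are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IdentificationInfo:
    original_fid: str  # fixed for the life of the document
    current_fid: str   # changes with every stored version
    user_id: str


def first_version(user_id: str) -> IdentificationInfo:
    """On first distributed storage, the current FID starts equal to the original FID."""
    fid = uuid.uuid4().hex
    return IdentificationInfo(original_fid=fid, current_fid=fid, user_id=user_id)


def next_version(prev: IdentificationInfo, editor_id: str) -> IdentificationInfo:
    """After an edit, issue a new current FID but keep the original FID."""
    return replace(prev, current_fid=uuid.uuid4().hex, user_id=editor_id)


def same_document(a: IdentificationInfo, b: IdentificationInfo) -> bool:
    """Equal original FIDs mean two versions of the same original data."""
    return a.original_fid == b.original_fid
```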
  • The user ID 173 identifies the user associated with the identification information 170, that is, the user who created or edited the corresponding original data 400. It can be associated, for example, with the ID of each user registered in the user information 160.
  • Each ID in the identification information 170 must be unique within the data distribution management system 1. These IDs can therefore be universal IDs generated, for example, by the ID generation unit 122 of the pointer file processing unit 120.
  • For the user ID 173, the user ID in each user's account information stored in the user information 160 may be used, for example. It can be made unique within the data distribution management system 1 by adding information that identifies the organization or group the user belongs to, such as a department or company, and the contract unit of the data management service provided by the data distribution management system 1.
  • FIG. 3 is a diagram showing an outline of an example of processing when the original data 400 and a plurality of distributed data 410 are associated and stored in a distributed manner.
  • When an instruction to save the original data 400 is received from the user via the interface unit 140, the distributed data processing unit 110 first generates one or more pieces of distributed data 410 from the original data 400 (S01).
  • In the present embodiment, n pieces of distributed data 410 are generated from the original data 400 by the (k, n) threshold secret sharing method such that the original data cannot be restored unless k or more pieces are collected.
  • This associates the original data 400 with the n pieces of distributed data 410.
  • Next, the pointer file processing unit 120 generates identification information 170 for the original data 400 (S02) and then generates a pointer file 150 containing the identification information 170 (S03).
  • Specifically, the ID generation unit 122 or the like generates each ID in the identification information 170, and the identification information generation unit 121 generates the identification information 170 containing these IDs.
  • The pointer file processing unit 120 then generates a pointer file 150 containing the identification information 170. Giving the pointer file 150 the same file name (excluding the extension) as the original data 400, for example, lets the user easily identify the pointer file 150 that corresponds to the original data 400.
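Giving the pointer file the same base name as the original data can be sketched with `pathlib`. The `.ptr` extension, the JSON content format and the helper names are hypothetical; the description specifies only that the file name excluding the extension is shared and that the file contains the identification information 170. Note that the file holds no content and no server names.

```python
import json
from pathlib import Path

POINTER_SUFFIX = ".ptr"  # hypothetical extension for pointer files


def pointer_path_for(original: Path) -> Path:
    """Same base name and folder as the original data, pointer-file extension."""
    return original.with_suffix(POINTER_SUFFIX)


def write_pointer_file(original: Path, id_info: dict[str, str]) -> Path:
    """Write only the identification info; no content, no storage locations."""
    pointer = pointer_path_for(original)
    pointer.write_text(json.dumps(id_info, sort_keys=True, indent=2), encoding="utf-8")
    return pointer


def read_pointer_file(pointer: Path) -> dict[str, str]:
    """Load the identification info back from a pointer file."""
    return json.loads(pointer.read_text(encoding="utf-8"))
```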
  • The distribution unit 131 of the distributed processing unit 130 then sends the pieces of distributed data 410 to a plurality of different servers 200 (server A (200a) and server B (200b) in the example of FIG. 3) for distributed storage (S04).
  • These servers are selected from the servers 200 registered in the server list 133, for example by rotation or random selection; in the present embodiment, n servers 200 are selected to store the n pieces of distributed data 410 generated by the distributed data processing unit 110. At this point, each server 200 may first be asked whether it can store the distributed data 410.
  • On each server 200 that receives distributed data 410, the distributed storage unit 210 stores it in the storage device (S05). If distributed data 410 for a past version of the original data 400 already exists, that data may be kept rather than overwritten. In that case, the distributed data 410 for past versions is then pruned (S06), and the results of the series of processes are returned to the data distribution apparatus 100.
  • In step S06, the distributed storage unit 210 searches for distributed data 410 whose identification information 170 has the same original FID 171 as the identification information 170 in the header of the newly stored, latest distributed data 410 (that is, distributed data 410 for other versions of the same original data 400). If more pieces are found than a predetermined number (the number of generations that may be kept), pieces are deleted starting from the oldest until only that number of generations remains. Whether a piece of distributed data 410 is newer or older can be determined, for example, from the time stamp of the file containing it.
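The generation limit applied in step S06 can be sketched as follows. The record layout, the use of a numeric time stamp and the function name are assumptions; as in the description, versions are grouped by original FID, ordered by time stamp, and the oldest are deleted until only the permitted number of generations remains. Pieces belonging to other original data are left untouched.

```python
from dataclasses import dataclass


@dataclass
class StoredPiece:
    original_fid: str
    current_fid: str
    timestamp: float  # e.g. the modification time of the file holding the piece
    payload: bytes


def prune_generations(store: list[StoredPiece], original_fid: str, keep: int) -> list[StoredPiece]:
    """Keep at most `keep` newest versions of one original data; return the survivors."""
    versions = sorted(
        (p for p in store if p.original_fid == original_fid),
        key=lambda p: p.timestamp,
        reverse=True,  # newest first
    )
    doomed = {id(p) for p in versions[keep:]}  # everything past the newest `keep`
    return [p for p in store if id(p) not in doomed]
```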
  • The deletion of old distributed data 410 in step S06 may be performed each time distributed data 410 is stored in step S05, or all distributed data 410 may be processed together by a batch program or the like that is started periodically at a predetermined time on each server 200.
  • It is also possible to lock a specific version of the distributed data 410 (that is, distributed data 410 whose identification information 170 contains a specific current FID 172) so that it is not deleted, using a procedure similar to the ID locking procedure described later.
  • The data distribution apparatus 100 then uses the distribution unit 131 to determine whether the distributed storage has completed normally (S07). In the present embodiment, for example, it determines whether the n pieces of distributed data 410 have been stored correctly on n servers 200. If some distributed data 410 could not be stored, another server 200 may be selected and steps S04 to S06 retried until all the distributed data 410 is stored. If no server 200 remains that can store it, the distributed storage may be terminated with an error, and the processing already performed may be rolled back.
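The retry-and-rollback behaviour of steps S04 to S07 can be sketched as below. The `store` and `delete` callables stand in for a hypothetical transport to the servers 200, and the function and exception names are illustrative. Each server is tried at most once, so the pieces still end up on distinct servers; if a piece cannot be placed anywhere, the pieces already stored are deleted before an error is raised.

```python
from typing import Callable

# Hypothetical transport: store(server, piece) returns True on success.
StoreFn = Callable[[str, bytes], bool]
DeleteFn = Callable[[str, bytes], None]


class DistributionError(Exception):
    """Raised when some piece cannot be stored on any remaining server."""


def distribute_with_retry(
    pieces: list[bytes], servers: list[str], store: StoreFn, delete: DeleteFn
) -> dict[int, str]:
    """Store each piece on a different server, retrying on other servers.

    Returns {piece index: server}. On failure, pieces already stored are
    deleted (rollback) and DistributionError is raised.
    """
    remaining = list(servers)
    placed: dict[int, str] = {}
    for index, piece in enumerate(pieces):
        while remaining:
            server = remaining.pop(0)  # each server is tried at most once
            if store(server, piece):
                placed[index] = server
                break
        else:  # no server left for this piece: roll back and fail
            for done_index, done_server in placed.items():
                delete(done_server, pieces[done_index])
            raise DistributionError(f"no server left for piece {index}")
    return placed
```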
  • When the distributed storage has completed normally, the data distribution apparatus 100 deletes the original data 400 and the generated distributed data 410 held on it (S08) and ends the process. Deleting this data from the data distribution apparatus 100 prevents the original data 400 (and the corresponding distributed data 410) from leaking if the data distribution apparatus 100 itself is stolen or lost.
  • The pointer file 150 held on the data distribution apparatus 100 contains only file ID information identifying the original data 400 (and the corresponding distributed data 410); it contains neither information or data on the content itself nor information on the servers 200 on which the data is actually stored. Therefore, even if a third party learns the contents of the pointer file 150, the third party cannot collect the distributed data 410 and cannot restore the original data 400 (or obtain any information about it).
  • In this embodiment, the original data 400 and the distributed data 410 are deleted from the data distribution apparatus 100 for the security reasons described above.
  • However, when the distributed data 410 is used as a backup of the original data 400 on the data distribution apparatus 100, the original data 400 may be kept rather than deleted.
  • FIG. 4 is a diagram showing an outline of an example of processing when collecting a plurality of distributed data 410 and obtaining original data 400 from these.
  • Suppose the user operates on the pointer file 150 via the interface unit 140 and thereby issues an instruction to refer to the original data 400 (including referring to it for editing). The pointer file processing unit 120 first acquires the contents of the identification information 170 contained in the pointer file 150 (S11). Next, based on the current FID 172 in the identification information 170, the collection unit 132 of the distribution processing unit 130 asks each server 200 whether it holds the corresponding distributed data 410 (S12).
  • Specifically, an inquiry message for the distributed data 410, containing the current FID 172, is broadcast to each server 200.
  • Alternatively, the load on the network 300 may be reduced by multicasting the message only to the servers 200 listed in the server list 133.
  • Each server 200 that receives the broadcast inquiry uses its distributed storage unit 210 to extract the current FID 172 from the message. It then searches for the corresponding distributed data 410 (S13), that is, distributed data 410 whose header (or similar area) contains identification information 170 with a current FID 172 matching the one in the message. If no corresponding distributed data 410 is stored (for example, server B (200b) in FIG. 4), the server sends a response to that effect to the data distribution apparatus 100.
  • If the corresponding distributed data 410 is stored (for example, server A (200a) in FIG. 4), the server checks whether the identification information 170 in the header of that distributed data 410 is locked (S14). Specifically, it checks whether the value of any ID (original FID 171, current FID 172, or user ID 173) in the target identification information 170 is registered in a lock list (not shown) held on the server 200. If one is registered, use of the target distributed data 410 is locked, and a response to that effect is sent to the data distribution apparatus 100. If none is registered, the target distributed data 410 is transmitted to the data distribution apparatus 100 (S15). Registration of IDs in the lock list is described later.
  • The data distribution apparatus 100 uses the collection unit 132 to determine whether the original data 400 can be obtained from the collected distributed data 410, that is, the distributed data 410 transmitted from the servers 200 (S16). In this embodiment, for example, it determines whether at least k pieces of distributed data 410, enough to restore the original data 400, have been collected. If the original data 400 cannot be obtained (restored), the acquisition of the original data 400 may be terminated with an error. In this embodiment, that is the case when fewer than k pieces of distributed data 410 could be collected.
  • the distributed data processing unit 110 acquires (restores) the original data 400 from the collected distributed data 410 (S17), and the process ends.
  • the original data 400 is restored from the collected k or more pieces of distributed data 410 by the (k, n) threshold secret sharing method.
  • an application program associated therewith may be activated to display the restored original data 400.
  • Thus, the user operates on the pointer file 150 via the interface unit 140 in the same way as on the original data 400, and the data distribution apparatus 100 collects the necessary distributed data 410 and obtains the original data 400. The user can therefore access the original data 400 (or the corresponding distributed data 410) seamlessly, without being aware that the distributed data 410 is spread across multiple servers 200. The data distribution apparatus 100 can also collect the necessary distributed data 410 without holding any information on which server 200 stores each piece of distributed data 410.
  • In the example above, each server 200 is asked whether it holds the distributed data 410 based on the current FID 172 in the identification information 170.
  • The inquiry may instead be made using other ID information in place of the current FID 172. For example, when the user ID 173 is used, the distributed data 410 corresponding to the original data 400 created and edited by that user can be collected.
  • The pointer file 150 holds identification information 170 containing the file IDs of the original data 400 (and the corresponding distributed data 410) and the user ID. Because it exists on the data distribution apparatus 100, a third party can view it. If the data distribution apparatus 100 is stolen or lost, a third party could therefore use the IDs in the identification information 170 to obtain the distributed data 410 from the servers 200. To address this risk, the present embodiment can restrict use of the corresponding distributed data 410 by locking IDs in the identification information 170.
  • FIG. 5 is a diagram showing an outline of an example of processing when the use of the distributed data 410 is locked and acquisition of the original data 400 and the corresponding distributed data 410 is restricted.
  • the user specifies an ID value to be locked via the interface unit 140 (S21). Specifically, a value is specified for at least one of the original FID 171, the current FID 172, and the user ID 173 in the identification information 170.
  • the distribution unit 131 of the distribution processing unit 130 instructs each server 200 to lock the ID (S22). Specifically, a lock instruction message including a lock target ID value is broadcast (or multicast) to each server 200.
  • Each server 200 that has received the lock instruction broadcast message registers the ID information included in the message in a lock list (not shown) or the like (S23). After that, the success or failure of registration is returned to the data distribution apparatus 100.
  • The data distribution apparatus 100 determines whether registration of the ID in the lock list has completed normally on all target servers 200 (S24). The ID lock processing may be terminated with an error if any server 200 failed to register the ID, or if no response was received from a server 200 because of a timeout. In that case, the processing already performed may be rolled back.
  • the ID lock process is terminated.
  • the unlocking of the ID can also be realized by deleting the registration of the target ID from the lock list in each server 200 by the same process as described above.
  • FIG. 6 is a diagram showing an outline of an example of processing for restoring the pointer file 150 when it does not exist on the data distribution apparatus 100.
  • The pointer file 150 corresponding to the original data 400 is stored on the data distribution apparatus 100 originally used. It is therefore not available when, for example, that apparatus has been lost and another apparatus is used.
  • In such a case, the pointer file 150 (and the identification information 170 it contains) can be restored so that the original data 400 or the corresponding distributed data 410 can be accessed.
  • the user designates the information of the user ID 173 in the identification information 170, which is key information for restoring the pointer file 150, via the interface unit 140 (S31).
  • the distribution unit 131 of the distribution processing unit 130 inquires each server 200 about the identification information 170 (S32). Specifically, the inquiry message of the identification information 170 including the user ID 173 having a designated value is broadcast (or multicast) to each server 200.
  • Each server 200 that has received the broadcast message for inquiry about the identification information 170 acquires the information of the user ID 173 included in the message, and searches for the identification information 170 that matches the user ID 173 (S33). Specifically, the identification information 170 including the user ID 173 that matches the value of the user ID 173 included in the message is searched from the header of each distributed data 410 stored. If there is no corresponding identification information 170 (distributed data 410 having this in the header or the like) (for example, server A (200a) in FIG. 6), a response to that effect is sent to the data distribution apparatus 100.
  • Corresponding identification information 170 may exist, that is, distributed data 410 that has it in its header or similar area (for example, server B (200b) in FIG. 6). In that case, the server checks whether the value of any ID (original FID 171, current FID 172, or user ID 173) in each piece of corresponding identification information 170 is registered in the lock list of the server 200 (S34). If at least one piece of the corresponding identification information 170 is not locked, the unlocked identification information 170 is transmitted to the data distribution apparatus 100. If all of it is locked, a response indicating that there is no corresponding identification information 170 is returned to the data distribution apparatus 100 instead (S35).
  • The data distribution apparatus 100 uses the collection unit 132 to restore the pointer file 150 containing the identification information 170 from the collected identification information 170 (S36), and ends the processing.
  • With the collected identification information 170 alone, the restored pointer file 150 cannot be given the same file name as the original data 400. Therefore, either a dummy file name is set automatically, or the identification information 170 holds the file name of the original data 400 for each current FID 172, in addition to the ID information shown in FIG. 2.
  • In the latter case, the file name of the pointer file 150 may be set based on this information.
  • As described above, the data distribution apparatus 100 holds no distribution management information about where the distributed data 410 is stored. The original data 400 can also be distributed and stored regardless of which servers 200 store the distributed data 410.
  • When the data distribution apparatus 100 collects the necessary distributed data 410 from the servers 200, it designates all or part of the identification information 170 related to the original data 400. It then broadcasts to each server 200 a message asking whether the server holds distributed data 410 related to that original data 400.
  • Each server 200 holding the target distributed data 410 returns it to the data distribution apparatus 100. The data distribution apparatus 100 can therefore collect the necessary distributed data 410 without distribution management information on where each piece is stored.
  • This avoids the risk that a third party obtains distribution management information, learns where each piece of distributed data 410 is stored, and gains access to it. In addition, the system does not depend on which server 200 stores each piece of distributed data 410, so the server 200 storing each piece can easily be changed.
  • Even when the data distribution apparatus 100 does not hold the identification information 170 corresponding to each piece of original data 400, or the pointer file 150 containing it, these can be restored. For example, when the user supplies a user ID 173, the data distribution apparatus 100 broadcasts a message asking whether each server holds identification information 170 containing that user ID 173. Each server 200 holding distributed data 410 whose header (or similar area) contains the target identification information 170 returns that identification information 170 to the data distribution apparatus 100. The data distribution apparatus 100 can then acquire and restore the identification information 170 corresponding to the original data 400 available to the user, and the pointer file 150 containing it.
  • The present invention is applicable to data distribution management systems that distribute one or more pieces of data across different servers for storage.
  • DESCRIPTION OF SYMBOLS: 100 ... data distribution apparatus, 110 ... distributed data processing unit, 120 ... pointer file processing unit, 121 ... identification information generation unit, 122 ... ID generation unit, 130 ... distribution processing unit, 131 ... distribution unit, 132 ... collection unit, 133 ... server list, 140 ... interface unit, 150 ... pointer file, 160 ... user information, 170 ... identification information, 171 ... original file ID (FID), 172 ... current file ID (FID), 173 ... user ID, 200, 200a, 200b ... server, 210 ... distributed storage unit, 300 ... network, 400 ... original data, 410 ... distributed data.
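The server-side behaviour described above for FIGs. 4 to 6 can be modelled in a few lines. This covers search by current FID with a lock check (S13 to S15), lock registration (S23) and its reversal, and search by user ID (S33 to S35). The sketch below is an in-memory illustration only. The class and method names, the status strings, and the flat list used as storage are assumptions and are not part of the described system. It follows the text in treating identification information as locked when any of its three ID values appears in the lock list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ident:
    """Identification information carried in a share's header (hypothetical)."""
    original_fid: str
    current_fid: str
    user_id: str


class DistributedStorageUnit:
    """Hypothetical in-memory model of one server's distributed storage unit 210."""

    def __init__(self) -> None:
        self.items: list[tuple[Ident, bytes]] = []  # (header IDs, share payload)
        self.lock_list: set[str] = set()

    def save(self, ident: Ident, share: bytes) -> None:
        self.items.append((ident, share))

    def is_locked(self, ident: Ident) -> bool:
        # Locked if any of the three ID values is registered (S14, S34).
        ids = {ident.original_fid, ident.current_fid, ident.user_id}
        return not ids.isdisjoint(self.lock_list)

    def lock(self, id_value: str) -> None:
        self.lock_list.add(id_value)  # S23: register the ID in the lock list

    def unlock(self, id_value: str) -> bool:
        """Delete a registration; returns False if the ID was not locked."""
        if id_value in self.lock_list:
            self.lock_list.remove(id_value)
            return True
        return False

    def query_by_current_fid(self, current_fid: str) -> tuple[str, Optional[bytes]]:
        """S13 to S15: ("found", share), ("locked", None) or ("absent", None)."""
        for ident, share in self.items:
            if ident.current_fid == current_fid:
                if self.is_locked(ident):
                    return ("locked", None)
                return ("found", share)
        return ("absent", None)

    def query_idents_by_user(self, user_id: str) -> list[Ident]:
        """S33 to S35: unlocked identification info for this user; [] means none."""
        result: list[Ident] = []
        for ident, _ in self.items:
            if ident.user_id == user_id and not self.is_locked(ident) and ident not in result:
                result.append(ident)
        return result
```

A real distributed storage unit would persist shares to its storage device and answer over the network. The lock check sits on the server, so once the IDs in a stolen pointer file are locked, that pointer file is useless. This works without touching the stolen apparatus.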

Abstract

A data distribution management system that enables distributed storage of data without being affected by which server or the like stores the distributed data. It does not hold distribution management information on the information processing device that is the data distribution source. A representative embodiment of this invention comprises a data distribution device and information processing devices. Each information processing device has a distributed storage unit that stores, in a storage device, distributed data sent from the data distribution device. The data distribution device has three units. A distributed data processing unit performs processing for associating original data with at least one piece of distributed data. A pointer file processing unit generates identification information capable of identifying and specifying the original data, and generates a pointer file containing the identification information corresponding to the original data. A distribution processing unit sends each piece of distributed data corresponding to the original data, each with the original data's identification information added, to a different information processing device.

Description

Data distribution management system
 The present invention relates to a data storage technique. More particularly, it relates to a technique that is effective when applied to a data distribution management system that distributes and stores one or more pieces of data on different servers or the like.
 In recent years, from the viewpoint of information security, the handling of data such as files held and processed on information processing apparatuses used by users, such as PCs (Personal Computers), has been regarded as important. Business use of portable terminals such as so-called smartphones and tablet PCs is expanding alongside notebook PCs. For these terminals in particular, the risk of information leakage from theft or loss of the terminals themselves must be considered.
 One way to reduce the risk of information leakage from loss of a terminal is a so-called thin-client approach. In this approach, data on the terminal, including important data, is stored in an external data center or server where security measures are in place. It has also been proposed not to store the important data as-is on an external server, but to use a so-called secret sharing technique, such as that described in Non-Patent Document 1. The important data is divided into non-important data that are meaningless on their own (the important data cannot be restored or inferred from them), and these non-important data are distributed and stored on multiple external servers. This reduces the risk of information leakage even when, for example, the data is stored in virtual data centers or virtual servers in a cloud computing environment.
 Dividing important data into multiple pieces by secret sharing can also improve data availability. Even if some of the pieces are lost, the original important data can be restored as long as a predetermined number of pieces can be collected. For example, when n pieces of divided data are generated from important data by so-called (k, n) threshold secret sharing, the important data can be restored from any k or more pieces. In other words, the loss of up to (n - k) pieces can be tolerated. Taking advantage of this high availability, distributing the divided data across multiple remote sites as a backup of the original important data has also been considered.
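Shamir's secret sharing, one standard construction of a (k, n) threshold scheme, makes the threshold property concrete. The document itself does not name the scheme it uses. This minimal sketch assumes the secret is a single integer in the prime field GF(2^127 - 1), for illustration only; a real system would split files block by block.

```python
import secrets

# Assumption: arithmetic in the prime field GF(2**127 - 1); secrets are ints below it.
PRIME = 2**127 - 1


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares (x, y); any k of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0.

    With fewer than k shares this silently returns an unrelated value,
    so the caller must check the share count first.
    """
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `split(123456789, k=3, n=5)` yields five shares, and `combine` on any three of them returns 123456789. Up to n - k = 2 shares may therefore be lost.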
 Multiple pieces of data handled as a unit, such as divided data generated by secret sharing, may be distributed and stored on other servers or the like for security, backup, or similar reasons. In such cases, the information processing apparatus of each user that is the data distribution source, or a specific management server such as a file server, normally holds management information recording which data is stored on which server (hereinafter sometimes referred to as "distribution management information"). To collect the distributed data stored on the servers, the apparatus refers to this distribution management information to identify which server holds each needed piece. It then accesses the target servers directly to collect that data.
 For example, Japanese Patent Laid-Open No. 2007-213405 (Patent Document 1) describes a distributed information file management means built on an information management computer. The computer has tally folders A, B, ... that hold tally files, a restoration destination folder that holds restored files, a tally object folder that holds tally object files, and a tally engine folder that holds a restoration engine program and a division engine program. Tally object files A, B, ... store tally parameters, including decoding-boundary information indicating the range the tally application can read, together with the tally file names and storage locations and the object information of the restoration destination folder. The tally files are collected directly based on their storage locations and the decoding boundary. A restored file is then generated, stored in the restoration destination folder, and opened. In this way, files distributed by the secret sharing method are located efficiently and the original data is restored.
Patent Document 1: JP 2007-213405 A
 However, conventional distributed storage methods such as that of Patent Document 1 have a security problem. The data distribution source information processing apparatus, or a specific management server such as a file server, holds the distribution management information for the important data (specifically, for the one or more pieces of distributed data related to it). Suppose a portable terminal serving as a data distribution source is stolen or lost while holding such information. A third party who views it can learn where the related distributed data is located, that is, the information needed to access it, such as the host names, network addresses, and URLs (Uniform Resource Locators) of the servers storing it.
 A further problem arises when distribution management information is held on the data distribution source apparatus and the distribution destination server must be changed, for example because a storage server has become unavailable due to a failure. The distribution management information on each distribution source apparatus must then be individually rewritten with the information of the new storage destination server. For example, when the distribution destination servers are virtual servers provided by a cloud computing service, they must be operated without knowing when a virtual server may be stopped. Rewriting the distribution management information on each distribution source terminal every time the destination virtual server changes imposes a high operational load.
 A user may also need to access distributed data (that is, the original important data behind it) from an apparatus other than the usual one. This happens, for example, when the portable terminal serving as the data distribution source has been lost, or when the user works from another office or on a business trip. In such cases, the apparatus in use has no distribution management information for the target important data (the distributed data corresponding to the important data). The user therefore cannot determine which servers hold each piece of distributed data and cannot access it, which makes the approach inflexible.
 Accordingly, an object of the present invention is to provide a data distribution management system that enables distributed storage of data under two conditions. No distribution management information is held on the information processing apparatus that is the data distribution source, and storage is not affected by which server or the like stores the distributed data. The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.
 Representative inventions among those disclosed in the present application are briefly outlined as follows.
 A data distribution management system according to a representative embodiment of the present invention includes a plurality of information processing apparatuses, each having a storage device. It also includes a data distribution apparatus that is connected to each information processing apparatus via a network. The data distribution apparatus distributes and stores, in the storage devices of the information processing apparatuses, one or more pieces of distributed data handled collectively in correspondence with original data. The system has the following features.
 That is, the data distribution apparatus includes a distributed data processing unit, a pointer file processing unit, and a distribution processing unit. The distributed data processing unit performs processing for associating the original data with the one or more pieces of distributed data. The pointer file processing unit generates identification information capable of identifying and specifying the original data, and generates a pointer file corresponding to the original data and containing that identification information. The distribution processing unit adds the identification information corresponding to the original data to each piece of distributed data corresponding to that original data, and transmits each piece to a different information processing apparatus. Each information processing apparatus includes a distributed storage unit that stores the distributed data transmitted from the data distribution apparatus in its storage device.
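The distribution processing unit sends each piece of distributed data to a different information processing apparatus. The sketch below shows this together with the retry behaviour described for steps S04 to S07 of the embodiment: on a failed store, try another server, and stop with an error once no server remains. The `store` callback, the server names, and `DistributionError` are hypothetical stand-ins for whatever transport the system actually uses.

```python
from typing import Callable, Sequence


class DistributionError(Exception):
    """Hypothetical error: not every share could be stored."""

    def __init__(self, placed: list[str], total: int) -> None:
        super().__init__(f"only {len(placed)} of {total} shares stored")
        self.placed = placed  # servers already written, e.g. for a rollback


def distribute(
    shares: Sequence[bytes],
    servers: Sequence[str],
    store: Callable[[str, bytes], bool],
) -> None:
    """Send each share to a different server, falling back to another
    server when a store fails (cf. S04 to S07). Nothing about the chosen
    servers is returned or kept once distribution succeeds."""
    unused = list(servers)  # each server receives at most one share
    placed: list[str] = []
    for share in shares:
        while True:
            if not unused:
                raise DistributionError(placed, len(shares))
            server = unused.pop(0)
            if store(server, share):
                placed.append(server)
                break
            # The store failed: drop this server and try the next unused one.
```

The function deliberately returns nothing on success. This matches the design: the distributing side keeps no record of where each share went. The list of servers already written is exposed only on failure, so the caller can roll back.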
 The effects obtained by representative inventions among those disclosed in the present application are briefly described as follows.
 According to a representative embodiment of the present invention, data can be distributed and stored without holding distribution management information on the information processing apparatus that is the data distribution source. Storage is also not affected by which server or the like stores the distributed data.
FIG. 1 is a diagram outlining a configuration example of a data distribution management system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the contents of the identification information added to a pointer file and to distributed data in the embodiment.
FIG. 3 is a diagram outlining an example of processing for associating original data with multiple pieces of distributed data and storing them in a distributed manner in the embodiment.
FIG. 4 is a diagram outlining an example of processing for collecting multiple pieces of distributed data and obtaining the original data from them in the embodiment.
FIG. 5 is a diagram outlining an example of processing for locking use of distributed data to restrict acquisition of the original data and the corresponding distributed data in the embodiment.
FIG. 6 is a diagram outlining an example of processing for restoring a pointer file when it does not exist on the data distribution apparatus in the embodiment.
 Embodiments of the present invention are described in detail below with reference to the drawings. Throughout the drawings for describing the embodiments, the same components are in principle given the same reference signs, and repeated descriptions of them are omitted.
 A data distribution management system according to an embodiment of the present invention distributes and stores multiple pieces of distributed data in storage devices of other data centers, servers, and the like. These pieces are handled collectively in correspondence with original data such as important data. The system holds no distribution management information, that is, no information on which data center or server stores each piece of distributed data. Instead, the data distribution apparatus that performs the distributed storage generates and holds identification information identifying each piece of original data, and adds this identification information to the header of each piece of distributed data. This makes it possible to collect the necessary distributed data without information on the location of the data centers or servers storing it.
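The identification information that replaces distribution management information can be sketched as a small record. The same record is written both to the pointer file and to each share's header. The three field names mirror original FID 171, current FID 172 and user ID 173 of FIG. 2. The interpretation of the two FIDs in the comments, the UUID-based ID generation, and the JSON header layout are assumptions for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentificationInfo:
    original_fid: str  # assumed: shared by every version of one original data
    current_fid: str   # assumed: new value for each saved version
    user_id: str       # user who created or edited the data


def new_identification(user_id: str, original_fid: Optional[str] = None) -> IdentificationInfo:
    """Hypothetical ID generation: random UUIDs, keeping original_fid on re-save."""
    return IdentificationInfo(
        original_fid=original_fid or uuid.uuid4().hex,
        current_fid=uuid.uuid4().hex,
        user_id=user_id,
    )


def make_pointer_file(ident: IdentificationInfo) -> bytes:
    """Pointer file body: the IDs only, with no content and no server addresses."""
    return json.dumps(asdict(ident)).encode()


def attach_header(ident: IdentificationInfo, share: bytes) -> bytes:
    """Prefix a share with a one-line JSON header carrying the same IDs."""
    return json.dumps(asdict(ident)).encode() + b"\n" + share


def read_header(blob: bytes) -> tuple[IdentificationInfo, bytes]:
    """Split a stored blob back into its identification info and share."""
    header, _, share = blob.partition(b"\n")
    return IdentificationInfo(**json.loads(header)), share
```

The pointer file and each header carry the same IDs and nothing else. A third party holding the pointer file therefore learns only opaque identifiers, not which servers store the shares.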
 Here, "one or more pieces of distributed data handled collectively in correspondence with original data" means one or more pieces of data that are processed together in response to a single user request on the target original data. Such requests include saving, browsing, or referring to the data, and the pieces are acquired, saved, displayed, and so on as a group. This embodiment shows an example in which multiple pieces of divided data, generated by secret sharing from important data serving as the original data, are each treated as distributed data. The invention is not limited to this.
For example, in a business application, the management data for a project or case created by the user may be associated with a series of related files generated by the application, or with a series of working files specified by the user, and each of these files may be stored on servers as distributed data. There may also be only a single piece of distributed data for the original data (for example, the original data itself), in which case the system serves as a remote copy or backup.
When collecting the required distributed data from the data centers and servers, the data distribution apparatus specifies all or part of the identification information of the original data. It then broadcasts (or multicasts) a message asking each data center or server whether it holds distributed data corresponding to that original data. Each data center or server that holds the target distributed data returns it to the data distribution apparatus. The data distribution apparatus can therefore collect the distributed data it needs without distribution management information describing where each piece is stored.
This avoids a risk that is especially significant when the data distribution apparatus is a portable terminal. If such an apparatus held distribution management information and were stolen or lost, a third party could obtain that information, learn where each piece of distributed data is stored, and gain access to it. In addition, because the system does not depend on which data center or server stores each piece of distributed data, the storage location of each piece can easily be changed, which improves the availability and flexibility of the system.
Furthermore, in the present embodiment, the data distribution apparatus can restore the identification information of each piece of data even if it does not hold that identification information. For example, when the user supplies information identifying the user, such as a user ID, the data distribution apparatus broadcasts (or multicasts) a message requesting the identification information associated with that user. Each data center or server that holds distributed data carrying matching identification information returns that identification information to the data distribution apparatus. The data distribution apparatus can thus acquire and restore the identification information for each piece of data available to the user, and it can then collect the corresponding distributed data based on that identification information.
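The user-ID lookup can be sketched as follows. This is a minimal in-memory model, not the patent's implementation. Iterating over a list of per-server header lists stands in for the real broadcast or multicast. The `IdentInfo` record and both function names are hypothetical. Because several servers hold shares of the same version, the replies are merged into a set.

```python
from typing import NamedTuple


class IdentInfo(NamedTuple):
    """Hypothetical shape of identification information 170 in a share header."""
    original_fid: str
    current_fid: str
    user_id: str


def answer_user_query(held_headers, user_id):
    """Server side: identification info of every held share that belongs to user_id."""
    return [h for h in held_headers if h.user_id == user_id]


def restore_identification_info(servers, user_id):
    """Client side: send the user-ID query to every server and merge the replies.

    `servers` is a list of per-server header lists; looping over it stands in
    for a real broadcast/multicast. Several servers hold shares of the same
    version, so a set collapses duplicate replies to one entry per version.
    """
    restored = set()
    for held_headers in servers:
        restored.update(answer_user_query(held_headers, user_id))
    return restored


if __name__ == "__main__":
    a = IdentInfo("F1", "F1", "alice")
    b = IdentInfo("F2", "F3", "alice")
    c = IdentInfo("F4", "F4", "bob")
    servers = [[a, c], [a, b], [b, c]]
    print(sorted(i.current_fid for i in restore_identification_info(servers, "alice")))
    # ['F1', 'F3']
```

The restored entries can then be used exactly like the contents of a pointer file to collect the corresponding distributed data.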
As a result, an information processing apparatus other than the original data distribution apparatus may newly be used as the data distribution apparatus, for example because the original apparatus was stolen or lost, or because the user is working on another terminal during a business trip. Even in that case, the identification information can easily be restored, the distributed data accessed, and work continued.
<System configuration>
FIG. 1 is a diagram outlining an example configuration of the data distribution management system according to an embodiment of the present invention. In the data distribution management system 1, a data distribution apparatus 100 and one or more servers 200 are connected via a network 300, such as the Internet, and can communicate with one another. The system may also include a plurality of data distribution apparatuses 100.
The data distribution apparatus 100 is an information processing apparatus such as a PC or a portable terminal. It has a distributed data processing unit 110, a pointer file processing unit 120, a distributed processing unit 130, and an interface unit 140, which are implemented, for example, by software programs running on an OS (Operating System, not shown). It also has user information 160: data such as a database, file, or registry that holds information (for example, account information) on the users permitted to use the distributed data management service provided by the data distribution apparatus 100 or the data distribution management system 1. In addition, for each of a plurality of pieces of original data 400, it has a pointer file 150 that functions as a pointer to the distributed data 410 stored on the servers 200.
The distributed data processing unit 110 associates the original data 400 with the one or more pieces of distributed data 410 that are handled collectively in correspondence with it. In the present embodiment, it is, for example, a known secret sharing library that works in both directions using a (k, n) threshold secret sharing scheme. From specified original data 400, it generates n pieces of divided data that serve as the distributed data 410. Conversely, it restores the original data 400 from any k or more specified pieces of distributed data 410, treating them as divided data.
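The source names only "a known secret sharing library". Shamir's scheme is the classic (k, n) threshold construction, so the sketch below uses it to show what such a library does. It hides the secret as the constant term of a random polynomial of degree k-1 over a prime field, and it recovers the secret by Lagrange interpolation at x = 0. This is an illustrative sketch, not the library used in the patent. The choice of the Mersenne prime 2^521 - 1 and the function names `split` and `combine` are assumptions. The sketch treats the whole secret as one field element, so it only handles data shorter than about 65 bytes. A production library would process arbitrary-length data in blocks and record the length.

```python
import secrets

# Assumption: the field modulus is the Mersenne prime 2**521 - 1; any prime
# larger than the secret would do.
PRIME = 2**521 - 1


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares (x, y); any k of them reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be smaller than PRIME")
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0 (x values must be distinct)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    data = "important data".encode("utf-8")
    shares = split(int.from_bytes(data, "big"), k=3, n=5)
    restored = combine([shares[0], shares[2], shares[4]])  # any 3 of the 5
    print(restored.to_bytes(len(data), "big").decode("utf-8"))  # important data
```

With fewer than k shares, interpolation yields an unrelated field element, which is the property that makes a single stolen share useless.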
As noted above, the distributed data 410 is not limited to data generated from, or based on, the original data 400 as in the present embodiment. It may instead be a plurality of existing pieces of data associated with the original data 400. There may also be only one piece of distributed data 410 (for example, the original data 400 itself).
For each of the plurality of pieces of original data 400, the pointer file processing unit 120 generates a pointer file 150 that functions as a pointer to the corresponding distributed data 410. It also processes the original data 400 (or the corresponding distributed data 410) based on user instructions directed at the pointer file 150 via the interface unit 140, described later.
The pointer file 150 points to the original data 400 (and hence to the corresponding distributed data 410) but does not contain the original data 400 itself. Its content is the identification information, described below, that identifies the original data 400 (and hence the corresponding distributed data 410). In other words, the pointer file 150 is analogous to a shortcut, symbolic link, or alias for the original data 400 (and the corresponding distributed data 410). The same identification information is also added, as header information or the like, to each piece of distributed data 410 generated by the distributed data processing unit 110.
To generate this identification information, the pointer file processing unit 120 further has an identification information generation unit 121. To generate the values of the various IDs contained in the identification information, it has an ID generation unit 122. The ID generation unit 122 is, for example, a library with a known function for generating unique IDs (universal IDs) that do not collide even across a plurality of different data distribution apparatuses 100.
The distributed processing unit 130 has a distribution unit 131 and a collection unit 132. The distribution unit 131 adds the identification information to the distributed data 410 that the distributed data processing unit 110 has associated with the original data 400, and it distributes that data among the servers 200 according to a predetermined rule. The collection unit 132 collects the distributed data 410 associated with the original data 400 from the servers 200. The distributed processing unit 130 may also have a server list 133 that lists the servers 200 that can serve as storage destinations for the distributed data 410.
In the present embodiment, the distribution unit 131 stores, for example, n pieces of distributed data 410 on n different servers 200 selected from the server list 133. These pieces are generated by the distributed data processing unit 110 using the (k, n) threshold secret sharing scheme and are tagged with identification information by the pointer file processing unit 120. When there are more than n servers 200, the n servers 200 that will store the distributed data 410 are chosen from among them, for example by rotation or random selection.
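The two selection rules mentioned above, rotation and random selection, can be sketched in a few lines. The sketch assumes that the server list 133 is a plain Python list and that the caller keeps a rotation offset `start` between saves. Both function names are hypothetical.

```python
import random


def select_by_rotation(server_list, n, start):
    """Pick n distinct servers, starting at index `start` and wrapping around."""
    if n > len(server_list):
        raise ValueError("fewer servers than shares")
    m = len(server_list)
    return [server_list[(start + i) % m] for i in range(n)]


def select_at_random(server_list, n, rng=random):
    """Pick n distinct servers uniformly at random (ValueError if n is too large)."""
    return rng.sample(server_list, n)
```

For example, `select_by_rotation(["a", "b", "c", "d"], 3, start=2)` returns `["c", "d", "a"]`. Advancing `start` by n after each save spreads the load across the list, while `select_at_random` makes the placement of any given file harder to predict.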
The collection unit 132, on the other hand, asks each server 200 whether it holds distributed data 410 associated with the original data 400, and it collects the distributed data 410 sent by the servers 200 that do. In the present embodiment, for example, it collects the k or more pieces of distributed data 410 that the distributed data processing unit 110 needs in order to restore the original data 400 with the (k, n) threshold secret sharing scheme.
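The collection step only needs to gather k distinct shares, not all n. A minimal sketch, assuming each server reply has been reduced to a `(share_index, share_bytes)` pair and the replies arrive as an iterable, could stop as soon as the threshold is reached. The reply format and the function name are hypothetical.

```python
def collect_shares(responses, k):
    """Gather replies until k distinct share indices are held.

    `responses` yields (share_index, share_bytes) pairs in arrival order.
    Duplicates (the same share reported twice) are ignored. Returns a dict
    of the first k distinct shares, or raises LookupError if too few arrive.
    """
    collected = {}
    for index, payload in responses:
        if index not in collected:
            collected[index] = payload
            if len(collected) == k:
                return collected
    raise LookupError(f"only {len(collected)} of the {k} required shares were found")
```

Returning early lets the caller stop waiting for slow servers once restoration is already possible.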
To query the servers 200, the collection unit 132 broadcasts a message containing all or part of the identification information in the pointer file 150 for the target original data 400 to all servers 200. Alternatively, it multicasts the message to the servers 200 listed in the server list 133. Any known broadcast (multicast) protocol can be used.
The interface unit 140 provides the input/output functions of the data distribution apparatus 100, such as a user interface with screen display. The user can access the functions of the data distribution management system 1 through, for example, the file management screens or applications of a general-purpose OS.
For example, the user moves the original data 400 into a specific folder in a file management application with a simple operation such as drag and drop. This operation triggers the distributed data processing unit 110 to generate the distributed data 410, which the distributed processing unit 130 then distributes among the servers 200. The pointer file processing unit 120 then generates a pointer file 150 for the original data 400 and replaces the original data 400 in the specific folder with it. From then on, user accesses to the original data 400, such as references, are made to the pointer file 150 in that folder.
When the user issues an instruction, such as a reference, to a pointer file 150 in the specific folder, the pointer file processing unit 120 has the distributed processing unit 130 collect, from the servers 200, the distributed data 410 associated with the original data 400 identified by that pointer file 150. Where necessary, as in the present embodiment, the distributed data processing unit 110 then restores the original data 400 from the collected distributed data 410. The original data 400 or the distributed data 410 is then displayed by an associated application program or the like. The user thus sees an interface equivalent to saving and referencing the original data 400 directly, and the processing of the distributed data 410 remains hidden.
In the example of FIG. 1, the data distribution apparatus 100 itself provides the user interface through the interface unit 140 and also performs the entire distributed storage process. The functions may instead be split across a plurality of information processing apparatuses. For example, the data distribution apparatus 100 may be configured as a file server that has every unit except the interface unit 140 and the pointer files 150. A plurality of client terminals, each having the interface unit 140 and the pointer files 150 generated by the data distribution apparatus 100, may then connect to it.
The server 200 is an information processing apparatus, such as a file server or storage server, that has a storage device (not shown), such as an HDD (Hard Disk Drive), capable of storing the distributed data 410 sent from the data distribution apparatus 100. It may also be a data center containing such information processing apparatuses, or a virtual server or virtual data center provided by a cloud computing service.
The server 200 has a distributed storage unit 210, implemented, for example, by a software program running on an OS (not shown). The distributed storage unit 210 stores the distributed data 410 sent from the data distribution apparatus 100 in the storage device. When it receives a broadcast (or multicast) message from the data distribution apparatus 100, it searches for distributed data 410 whose header or the like contains identification information matching the identification information in the message. If it holds any such data, it returns that distributed data 410, or the identification information in its header or the like, to the data distribution apparatus 100.
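On the server side, "matching" a message that carries all or part of the identification information can be read as: every field present in the query equals the corresponding header field. The toy class below models the distributed storage unit 210 under that reading. Headers are plain dicts, shares are kept in memory, and the class and method names are hypothetical. An empty query is refused because it would match every stored share.

```python
def matches(query, header):
    """True if every field present in the (possibly partial) query equals the header's."""
    return all(header.get(field) == value for field, value in query.items())


class DistributedStorageUnit:
    """Toy in-memory stand-in for distributed storage unit 210 (hypothetical API)."""

    def __init__(self):
        self._shares = []  # (header dict, payload bytes) pairs

    def store(self, header, payload):
        self._shares.append((dict(header), payload))

    def handle_query(self, query, want_payload=True):
        """Answer a broadcast query with matching shares, or only their headers."""
        if not query:
            # An empty query would match every share; refuse it instead.
            raise ValueError("query must contain at least one identification field")
        hits = [(h, p) for h, p in self._shares if matches(query, h)]
        return hits if want_payload else [h for h, _ in hits]
```

A query such as `{"current_fid": ...}` retrieves one version's share, `{"original_fid": ...}` retrieves all versions, and `{"user_id": ...}` with `want_payload=False` returns only identification information, which corresponds to the restoration case described earlier.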
FIG. 2 is a diagram showing an example of the identification information that the identification information generation unit 121 of the pointer file processing unit 120 generates and adds to the pointer file 150 and the distributed data 410. The identification information 170 contains, for example, an original file ID (FID) 171, a current file ID (FID) 172, and a user ID 173. The original FID 171 uniquely identifies the original data 400 (the file consisting of the original data 400) as a whole, across all of its versions (generations). It is assigned to identify the original data 400 and its corresponding distributed data 410 when the original data 400 is first stored in distributed form, that is, when distributed data 410 is first generated from the original data 400 and stored across the servers 200.
The current FID 172 uniquely identifies each version (generation) of the original data 400 (the file consisting of the original data 400). After the original data 400 is first stored in distributed form, a new current FID 172 is assigned to the latest version (generation) each time the original data 400 is edited or updated. Initially, the current FID 172 has the same value as the original FID 171. A new value is assigned each time the edit cycle completes: the distributed data 410 needed to edit the original data 400 is collected, the edited original data 400 is again associated with distributed data 410, and that distributed data 410 is stored across the servers 200. The original FID 171 keeps its initially assigned value and is never updated.
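The ID rules above can be captured in a small immutable record. A new file gets one fresh ID used for both FIDs, and each re-save regenerates only the current FID. The sketch uses random UUIDs (`uuid.uuid4`) as the "universal IDs". That is an assumption, since the source only calls for a known library that generates non-colliding IDs. The class and method names are hypothetical.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IdentificationInfo:
    original_fid: str  # fixed for the life of the file (all versions)
    current_fid: str   # regenerated on every re-save (one per version)
    user_id: str

    @classmethod
    def for_new_file(cls, user_id):
        fid = str(uuid.uuid4())
        # On first distributed storage the current FID equals the original FID.
        return cls(original_fid=fid, current_fid=fid, user_id=user_id)

    def next_version(self):
        # Only the current FID changes; the original FID is never updated.
        return replace(self, current_fid=str(uuid.uuid4()))
```

Making the record frozen means each version's identification information is a separate value, so the previous current FID can be kept as version history simply by retaining the old object.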
The current FID 172 therefore not only identifies the latest original data 400 and its corresponding distributed data 410 but also serves as version information for the original data 400. When the distributed storage unit 210 of each server 200 stores the distributed data 410 for the latest, edited original data 400, it keeps the distributed data 410 for previous versions of that original data 400 as history. These older pieces can be told apart because the identification information 170 in their headers or the like carries a different current FID 172. Because each server 200 thus holds distributed data 410 for multiple versions of the original data 400, the user can obtain any specified version of the original data 400 and its corresponding distributed data 410.
Pieces of distributed data 410 that share the same original FID 171 but have different current FIDs 172 can be identified as belonging to different versions of the same original data 400.
The user ID 173 identifies the user associated with the identification information 170, that is, the user who created or edited the original data 400 corresponding to that identification information 170. This ID can be mapped, for example, to the ID of a user registered in the user information 160.
Each ID in the identification information 170 must be unique within the data distribution management system 1. These IDs can therefore be, for example, IDs (universal IDs) generated by the ID generation unit 122 of the pointer file processing unit 120. For the user ID 173, one option is to use the user ID from the user's account information stored in the user information 160. Another option is to make the ID unique within the data distribution management system 1 by combining it with information identifying the organization or group the user belongs to (such as a department or company), or with the contract unit for the data management service provided by the data distribution management system 1.
<Processing flow (distributed storage)>
FIG. 3 is a diagram outlining an example of the process for associating original data 400 with a plurality of pieces of distributed data 410 and storing them in distributed form. When the data distribution apparatus 100 receives an instruction from the user via the interface unit 140 to save the original data 400, the distributed data processing unit 110 first generates one or more pieces of distributed data 410 from the original data 400 (S01). In the present embodiment, as described above, the (k, n) threshold secret sharing scheme is used, for example, to generate n pieces of distributed data 410 from the original data 400, such that the original data 400 cannot be restored unless at least k of those pieces are collected. This associates the original data 400 with the n pieces of distributed data 410.
Next, the pointer file processing unit 120 generates identification information 170 for the original data 400 (S02) and then generates a pointer file 150 containing that identification information 170 (S03). As described above, the ID generation unit 122, for example, generates each ID in the identification information 170, and the identification information generation unit 121 assembles these IDs into the identification information 170. The pointer file processing unit 120 then generates the pointer file 150 containing the identification information 170. The pointer file 150 can, for example, be given the same file name as the original data 400 (excluding the extension), so that the user can easily tell which pointer file 150 corresponds to which original data 400.
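Steps S02 and S03 end with a pointer file that carries only the identification information, named after the original file. The sketch below stores that information as JSON in a file with the same base name and a hypothetical `.ptr` extension. Both the JSON encoding and the extension are assumptions; the source specifies neither.

```python
import json
import tempfile
from pathlib import Path

POINTER_SUFFIX = ".ptr"  # hypothetical extension; the source does not specify one


def pointer_path_for(original_path):
    """Same base name as the original file, with the pointer-file extension."""
    return Path(original_path).with_suffix(POINTER_SUFFIX)


def write_pointer_file(original_path, ident):
    """Write the identification info (and nothing else) as the pointer file."""
    pointer = pointer_path_for(original_path)
    pointer.write_text(json.dumps(ident), encoding="utf-8")
    return pointer


def read_pointer_file(pointer_path):
    return json.loads(Path(pointer_path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ident = {"original_fid": "F1", "current_fid": "F1", "user_id": "alice"}
        ptr = write_pointer_file(Path(d) / "report.docx", ident)
        print(ptr.name)                         # report.ptr
        print(read_pointer_file(ptr) == ident)  # True
```

Nothing about the file's content or server locations is written, which is what makes the pointer file harmless on a lost device.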
The original data 400 may already have been stored in distributed form and may already have a corresponding pointer file 150 and identification information 170, as happens when it is stored again after being edited. In that case, step S02 may generate and update only the current FID 172 in the existing identification information 170, leaving the original FID 171 unchanged. The previous current FID 172 may also be kept as version history alongside the newly generated one.
Next, the identification information 170 generated or updated in step S02 is added to (or updated in) the header or the like of each piece of distributed data 410 generated in step S01. The distribution unit 131 of the distributed processing unit 130 then sends the pieces of distributed data 410 to a plurality of different servers 200 for distributed storage (S04); in the example of FIG. 3, these are server A (200a) and server B (200b). As described above, the servers 200 are chosen, for example, by rotation or random selection from the servers 200 registered in the server list 133. In the present embodiment, n servers 200 are chosen to store the n pieces of distributed data 410 generated by the distributed data processing unit 110. Each server 200 may also be asked beforehand whether it can store the distributed data 410.
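The source says only that the identification information goes into "the header or the like" of each share. One simple concrete layout, offered purely as a hypothetical example, is a 4-byte big-endian header length, followed by the identification information as JSON, followed by the share bytes. The sketch packs and unpacks that layout.

```python
import json
import struct

# Hypothetical share layout:
#   4-byte big-endian header length | JSON identification info | share payload


def pack_share(ident, payload):
    header = json.dumps(ident, sort_keys=True).encode("utf-8")
    return struct.pack(">I", len(header)) + header + payload


def unpack_share(blob):
    (header_len,) = struct.unpack_from(">I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    return header, blob[4 + header_len:]
```

Updating the header on re-save, as the text allows, amounts to unpacking, replacing the current FID, and packing again. A server can answer queries by reading only the length-prefixed header without parsing the payload.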
Each server 200 that receives distributed data 410 stores it in its storage device through its distributed storage unit 210 (S05). If distributed data 410 for a previous version of the original data 400 already exists, the server may keep it rather than overwrite it. In that case, the server additionally deletes distributed data 410 for older versions of the original data 400 to tidy up storage (S06), and it then returns the results of these operations to the data distribution apparatus 100.
In step S06, for example, the distributed storage unit 210 searches for distributed data 410 whose identification information 170 has the same original FID 171 as the identification information 170 in the header or the like of the newly stored distributed data 410. These are the pieces of distributed data 410 for different versions of the same original data 400. If more pieces are found than a predetermined number (the number of generations that may be retained), the oldest pieces are deleted in order until only that number of generations remains. The relative age of the pieces can be determined, for example, from the timestamps of the files that contain them.
As described above, the deletion of old distributed data 410 in step S06 may be performed each time distributed data 410 is stored in step S05. Alternatively, it may be performed for all distributed data 410 at once, for example by a batch program that runs periodically at a predetermined time on each server 200. A specific version of the distributed data 410 (that is, distributed data 410 whose identification information 170 contains a specific current FID 172) can also be locked against deletion, using a procedure similar to the ID locking procedure described later.
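Steps S05 and S06, together with the locking option, can be sketched as a pure function over a server's share records. The sketch assumes that each record carries its two FIDs and a timestamp, and it assumes that locked versions still count toward the generation limit. The source does not settle that second point. The record type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredShare:
    """Hypothetical record of one share held by a server."""
    original_fid: str
    current_fid: str
    timestamp: float  # e.g. modification time of the share file


def prune_generations(shares, original_fid, keep, locked=frozenset()):
    """Trim the versions of one original file down to `keep` generations.

    Oldest versions are deleted first; versions whose current FID is in
    `locked` are never deleted (but still count toward `keep`).
    Returns (kept, deleted).
    """
    versions = sorted((s for s in shares if s.original_fid == original_fid),
                      key=lambda s: s.timestamp)  # oldest first
    excess = len(versions) - keep
    deleted = []
    for share in versions:
        if excess <= 0:
            break
        if share.current_fid in locked:
            continue  # locked versions survive even if they are the oldest
        deleted.append(share)
        excess -= 1
    # Current FIDs are unique system-wide, so they identify the deleted shares.
    gone = {s.current_fid for s in deleted}
    kept = [s for s in shares if s.current_fid not in gone]
    return kept, deleted
```

Because the function only decides what to delete, the same logic can run after every store or inside a periodic batch job, matching the two options described above.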
When distributed storage on the servers 200 is complete, the distribution unit 131 of the data distribution apparatus 100 determines whether the distributed storage process completed successfully (S07). In the present embodiment, for example, it checks whether the n pieces of distributed data 410 were successfully stored on n servers 200. If some distributed data 410 could not be stored, another server 200 may be selected and steps S04 to S06 retried until all of the distributed data 410 has been stored. If no server 200 capable of storing the data remains, the distributed storage process may be terminated with an error, and any processing already performed may be rolled back.
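The retry-then-roll-back behaviour of step S07 can be sketched independently of any network code. The sketch assumes two caller-supplied callables: a `store(server, share)` upload that returns True on success, and a `delete(server, share)` used for rollback. Both are hypothetical stand-ins for real server calls. Each candidate server is tried at most once, so successfully stored shares always land on distinct servers.

```python
class DistributionError(Exception):
    """Raised when not every share could be stored."""


def store_all(shares, candidates, store, delete):
    """Place each share on a distinct server, falling back to other servers on failure.

    Returns {server: share}. If the candidates run out, every share already
    stored is deleted again (rollback) and DistributionError is raised.
    """
    placed = {}               # server -> share successfully stored there
    remaining = list(candidates)
    for share in shares:
        while True:
            if not remaining:
                for server, stored in placed.items():
                    delete(server, stored)  # roll back partial work
                raise DistributionError("no server left to store a share")
            server = remaining.pop(0)       # each server is tried at most once
            if store(server, share):
                placed[server] = share
                break
    return placed
```

Dropping a server after a failed attempt is a deliberate simplification; a real implementation might retry transient errors before giving up on a server.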
When the distributed storage process completes successfully, the data distribution apparatus 100 deletes the original data 400 and the generated distributed data 410 that it holds (S08) and ends the process. Deleting this data from the data distribution apparatus 100 prevents the original data 400 (and the corresponding distributed data 410) from leaking if the data distribution apparatus 100 itself is stolen or lost.
Moreover, the pointer file 150 held on the data distribution apparatus 100 contains only the file IDs that identify the original data 400 (and the corresponding distributed data 410). It contains no information about the content of the data and no information about the servers 200 where the data is actually stored. Therefore, even if a third party learns the content of the pointer file 150, that party cannot collect the distributed data 410 and cannot restore the original data 400 (that is, cannot obtain any information about the original data 400).
In the present embodiment, the original data 400 and the distributed data 410 are deleted from the data distribution apparatus 100 for the security reasons described above. However, when the distributed storage service is used as a backup of the original data 400 on the data distribution apparatus 100, the original data 400 may be kept rather than deleted.
<Processing flow (original data acquisition)>
FIG. 4 outlines an example of the processing for collecting multiple pieces of distributed data 410 and obtaining the original data 400 from them. The flow starts when the data distribution apparatus 100 receives an instruction to reference the original data 400 (including referencing it for editing) through a user operation on the pointer file 150 via the interface unit 140. The pointer file processing unit 120 first acquires the contents of the identification information 170 contained in that pointer file 150 (S11). Next, based on the current FID 172 in the identification information 170, the collection unit 132 of the distributed processing unit 130 asks each server 200 whether it holds the corresponding distributed data 410 (S12).
Specifically, as described above, an inquiry message for the distributed data 410 containing the current FID 172 is, for example, broadcast to every server 200. When there are many servers 200, the message may instead be multicast to the servers 200 listed in the server list 133 to reduce the load on the network 300.
Each server 200 that receives the broadcast inquiry message uses its distributed storage unit 210 to extract the current FID 172 from the message. It then searches for the distributed data 410 corresponding to that current FID 172 (S13). Specifically, it looks for distributed data 410 whose header (or similar area) carries identification information 170 with a current FID 172 matching the one in the message. If the server does not store matching distributed data 410 (for example, server B (200b) in FIG. 4), it reports this to the data distribution apparatus 100.
If, on the other hand, the server stores matching distributed data 410 (for example, server A (200a) in FIG. 4), it checks whether the identification information 170 in the header of that distributed data 410 is locked (S14). Specifically, it checks whether any ID in the identification information 170 (the original FID 171, the current FID 172 or the user ID 173) is registered in a lock list (not shown) held on the server 200. If one is registered, use of that distributed data 410 is locked, and the server reports this to the data distribution apparatus 100. If none is registered, the server transmits the distributed data 410 to the data distribution apparatus 100 (S15). Registration of IDs in the lock list is described later.
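Server-side handling of the inquiry (S13 to S15) might look like the following minimal sketch. The `Share` record, the status strings and the assumption that a server holds at most one share per current FID are illustrative and are not taken from the embodiment. As in the text, a share counts as locked when any of its three IDs appears in the server's lock list.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Share:
    """A stored piece of distributed data 410 with its identification header."""
    original_fid: str
    current_fid: str
    user_id: str
    payload: bytes


def handle_inquiry(
    current_fid: str, stored: List[Share], lock_list: Set[str]
) -> Tuple[str, Optional[Share]]:
    """Answer one inquiry; returns (status, share) with a hypothetical status string."""
    for share in stored:
        if share.current_fid != current_fid:  # S13: look for a matching header
            continue
        ids = {share.original_fid, share.current_fid, share.user_id}
        if ids & lock_list:                   # S14: any registered ID locks the share
            return ("locked", None)
        return ("found", share)               # S15: send the share back
    return ("not_found", None)
```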
When every server 200 has finished processing the broadcast message, the collection unit 132 of the data distribution apparatus 100 determines whether the original data 400 can be obtained from the collected distributed data 410, that is, the distributed data 410 sent by the servers 200 (S16). In the present embodiment, for example, it determines whether at least k pieces of distributed data 410, enough to restore the original data 400, have been collected. If the original data 400 cannot be obtained (restored), that is, if fewer than k pieces of distributed data 410 were collected, the acquisition process may be terminated with an error.
If step S16 determines that the original data 400 can be obtained, the distributed data processing unit 110 obtains (restores) the original data 400 from the collected distributed data 410 (S17), and the process ends. In the present embodiment, the original data 400 is restored from the k or more collected pieces of distributed data 410 using the (k, n) threshold secret sharing scheme. At this point, the application program associated with the type of the restored original data 400 may be launched to display it.
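The text does not specify which (k, n) threshold secret sharing construction is used. The sketch below uses Shamir's scheme, the best-known instance, over the prime field of size 2**521 - 1. It covers the S16 check (at least k shares) and the S17 reconstruction by Lagrange interpolation at x = 0. It is only a minimal sketch: it handles secrets of up to 64 bytes, and the `split`/`combine` names and the `(x, y)` share format are hypothetical. Randomness comes from Python's `secrets` module.

```python
import secrets
from typing import List, Tuple

PRIME = 2**521 - 1  # a Mersenne prime; all arithmetic is modulo PRIME
MAX_SECRET_BYTES = 64


def split(secret: bytes, k: int, n: int) -> List[Tuple[int, int]]:
    """Split `secret` into n shares (x, y), any k of which recover it."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if len(secret) > MAX_SECRET_BYTES:
        raise ValueError("secret too long for this sketch")
    # A 0x01 prefix preserves leading zero bytes through the int conversion.
    value = int.from_bytes(b"\x01" + secret, "big")
    coeffs = [value] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluates the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: List[Tuple[int, int]], k: int) -> bytes:
    """Recover the secret by Lagrange interpolation at x = 0."""
    if len(shares) < k:  # S16: not enough shares were collected
        raise ValueError(f"need at least {k} shares, got {len(shares)}")
    points = shares[:k]
    value = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        value = (value + yi * num * pow(den, -1, PRIME)) % PRIME
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return raw[1:]  # strip the 0x01 prefix added in split
```

Any k of the n shares reconstruct the secret. With fewer than k, `combine` raises `ValueError`, mirroring the error branch of S16.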
In this way, the user performs the same operations on the pointer file 150 via the interface unit 140 as on the original data 400 itself. The data distribution apparatus 100 then collects the necessary distributed data 410, obtains (restores) the original data 400 and displays it. The user can therefore access the original data 400 (or the corresponding distributed data 410) seamlessly, without being aware that the distributed data 410 is stored across multiple servers 200. The data distribution apparatus 100, for its part, can collect the necessary distributed data 410 without keeping any record of which server 200 stores each piece of distributed data 410.
In the example of FIG. 4 above, each server 200 is asked whether it holds the distributed data 410 based on the current FID 172 in the identification information 170. The inquiry may instead use other ID information in the identification information 170. For example, specifying the original FID 171 in the inquiry, according to a user instruction, collects the distributed data 410 for multiple versions (different current FIDs 172) of the original data 400. Likewise, specifying the user ID 173 collects all the distributed data 410 for the original data 400 that the corresponding user created or edited.
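Querying by any subset of the three IDs amounts to a field-by-field equality match on each share's identification information. In the minimal sketch below, identification information is modelled as a dictionary, and the key names (`original_fid`, `current_fid`, `user_id`) are hypothetical. An inquiry on the original FID returns every version, and one on the user ID returns everything that user created or edited. Rejecting an empty query is a design choice of this sketch, not of the text.

```python
from typing import Dict, List

ID_FIELDS = ("original_fid", "current_fid", "user_id")


def matches(query: Dict[str, str], ident: Dict[str, str]) -> bool:
    """True if every ID given in the query equals the corresponding ID in ident."""
    if not query:
        raise ValueError("query must name at least one ID")
    unknown = set(query) - set(ID_FIELDS)
    if unknown:
        raise ValueError(f"unknown ID fields: {sorted(unknown)}")
    return all(ident.get(name) == value for name, value in query.items())


def select(query: Dict[str, str], idents: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the identification info entries that satisfy the query."""
    return [ident for ident in idents if matches(query, ident)]
```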
<Processing flow (ID lock)>
In the present embodiment, as described above, the data distribution apparatus 100 does not hold the original data 400. Nor does it hold distributed management information identifying where (on which server 200) each piece of distributed data 410 is stored. This reduces the risk of the original data 400 leaking if, for example, a portable terminal serving as the data distribution apparatus 100 is stolen or lost.
However, the pointer file 150 remains on the data distribution apparatus 100 and could be read by a third party. Its identification information 170 includes the file IDs of the original data 400 (and the corresponding distributed data 410) and the user ID. A third party could therefore use these IDs to obtain distributed data 410 from the servers 200 after the apparatus has been stolen or lost. To minimize this risk, the present embodiment allows each ID in the identification information 170 to be locked, which restricts use of the corresponding distributed data 410.
FIG. 5 outlines an example of the processing for locking the use of distributed data 410 so as to restrict acquisition of the original data 400 and the corresponding distributed data 410. First, on the data distribution apparatus 100, the user specifies the value of the ID to be locked via the interface unit 140 (S21). Specifically, the user specifies a value for at least one of the original FID 171, the current FID 172 and the user ID 173 in the identification information 170. Next, based on the specified ID, the distribution unit 131 of the distributed processing unit 130 instructs each server 200 to lock the ID (S22). Specifically, a lock instruction message containing the value of the ID to be locked is broadcast (or multicast) to each server 200.
Each server 200 that receives the broadcast lock instruction message registers the ID contained in the message in a lock list (not shown) or similar structure (S23). It then reports to the data distribution apparatus 100 whether the registration succeeded. Once the servers 200 have finished registering the ID, the data distribution apparatus 100 determines whether the ID was successfully registered in the lock list on every target server 200 (S24). If any server 200 failed to register the ID, or did not respond before a timeout, the ID lock process may be terminated with an error. In that case, the processing already performed may be rolled back.
When the ID has been successfully registered in the lock list on every target server 200, the ID lock process ends. An ID can be unlocked by the same procedure, with each server 200 deleting the ID from its lock list.
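The all-or-nothing lock procedure (S22 to S24) with rollback can be sketched as below. The sketch assumes the following:
- Servers are contacted one after another rather than by broadcast.
- An unresponsive server is modelled as raising `TimeoutError`.
- `LockServer`, `lock_everywhere` and `LockError` are hypothetical names.

Each server reports which IDs it newly added. A rollback therefore removes only the locks created by this call and leaves any pre-existing locks in place.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


class LockError(Exception):
    """Raised when some server could not register the lock (error branch of S24)."""


@dataclass
class LockServer:
    """Hypothetical in-memory stand-in for a server 200 and its lock list."""
    name: str
    reachable: bool = True
    lock_list: Set[str] = field(default_factory=set)

    def lock(self, ids: Set[str]) -> Set[str]:
        """S23: register ids; return only the ones that were not already locked."""
        if not self.reachable:
            raise TimeoutError(f"{self.name} did not respond")
        added = ids - self.lock_list
        self.lock_list |= added
        return added

    def unlock(self, ids: Set[str]) -> None:
        self.lock_list -= ids


def lock_everywhere(ids: Set[str], servers: List[LockServer]) -> None:
    """S22-S24: lock ids on every server, or roll back and raise LockError."""
    done: List[Tuple[LockServer, Set[str]]] = []
    for server in servers:
        try:
            added = server.lock(ids)
        except TimeoutError as exc:
            for earlier, newly_added in done:
                earlier.unlock(newly_added)  # undo only what this call added
            raise LockError(str(exc)) from exc
        done.append((server, added))
```

Unlocking would follow the same pattern, calling `unlock` on every server.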
<Processing flow (pointer file restoration)>
FIG. 6 outlines an example of the processing for restoring the pointer file 150 when it is not present on the data distribution apparatus 100. This applies when an information processing apparatus other than the original data distribution apparatus 100 (the one holding the pointer file 150 for the original data 400) is newly used as the data distribution apparatus 100. Typical cases are when the original apparatus has been stolen or lost, or when the user is working on another terminal during a business trip. In the present embodiment, the pointer file 150 (and the identification information 170 it contains) is then restored so that the original data 400 or the corresponding distributed data 410 can be accessed.
First, on the data distribution apparatus 100, the user specifies the user ID 173 of the identification information 170 via the interface unit 140 (S31). This user ID 173 serves as the key for restoring the pointer file 150. Next, based on the specified user ID 173, the distribution unit 131 of the distributed processing unit 130 queries each server 200 for identification information 170 (S32). Specifically, an inquiry message for identification information 170 containing the specified user ID 173 is broadcast (or multicast) to each server 200.
Each server 200 that receives the broadcast inquiry message extracts the user ID 173 from the message and searches for identification information 170 matching that user ID 173 (S33). Specifically, it searches the headers (or similar areas) of its stored distributed data 410 for identification information 170 whose user ID 173 matches the value in the message. If there is no matching identification information 170 (that is, no distributed data 410 carrying it in its header; for example, server A (200a) in FIG. 6), the server reports this to the data distribution apparatus 100.
If, on the other hand, the server has matching identification information 170 (for example, server B (200b) in FIG. 6), it checks whether each matching piece of identification information 170 is locked (S34). Specifically, it checks whether any ID in each matching piece (the original FID 171, the current FID 172 or the user ID 173) is registered in the server 200's lock list. If at least one matching piece is unlocked, the server sends the unlocked pieces to the data distribution apparatus 100. If all of them are locked, the server responds that there is no matching identification information 170 (S35).
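A minimal sketch of the server-side restoration inquiry (S33 to S35) follows. Each share's identification information is modelled as a dictionary, and the key names are hypothetical. Returning an empty list covers both cases the text describes: no matching identification information, and matching information that is entirely locked. The reply is therefore the same in both cases.

```python
from typing import Dict, List, Set


def handle_restore_inquiry(
    user_id: str, headers: List[Dict[str, str]], lock_list: Set[str]
) -> List[Dict[str, str]]:
    """Return the unlocked identification info for user_id; [] means 'none'."""
    matching = [h for h in headers if h["user_id"] == user_id]  # S33
    return [
        h for h in matching
        # S34: skip entries in which any of the three IDs is on the lock list
        if not {h["original_fid"], h["current_fid"], h["user_id"]} & lock_list
    ]
```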
When every server 200 has finished processing the broadcast message, the collection unit 132 of the data distribution apparatus 100 restores the pointer file 150 from the collected identification information 170 (S36), and the process ends. Several servers 200 may send identical identification information 170 for the same original data 400. In that case, the duplicates are eliminated so that a single piece of identification information 170 remains.
The identification information 170 shown in FIG. 2 yields only ID information, so the pointer file 150 cannot be restored under the same file name as the original data 400. There are two options. A dummy file name may be set automatically. Alternatively, the identification information 170 may also store the file name of the original data 400 for each current FID 172, alongside the ID information shown in FIG. 2, and the file name of the pointer file 150 may be set from it.
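On the data distribution apparatus, step S36 then merges the replies, drops duplicates and chooses a file name for each restored pointer file. In this sketch, the optional `file_name` field stands for the per-version file name that the text suggests storing in the identification information 170. When it is absent, a dummy name of the form `restored_NNN` is generated. The field, the naming scheme and the ` (2)` suffix used when two versions share a file name are all hypothetical.

```python
from typing import Dict, Iterable, List

ID_FIELDS = ("original_fid", "current_fid", "user_id")


def restore_pointer_files(replies: Iterable[List[Dict[str, str]]]) -> Dict[str, Dict[str, str]]:
    """S36: merge replies from all servers into {file name: pointer file IDs}."""
    unique: Dict[tuple, Dict[str, str]] = {}
    for reply in replies:
        for ident in reply:
            # Several servers hold shares of the same data, so identical
            # identification info arrives more than once; keep one copy.
            unique.setdefault(tuple(sorted(ident.items())), ident)

    files: Dict[str, Dict[str, str]] = {}
    for number, (_, ident) in enumerate(sorted(unique.items()), start=1):
        base = ident.get("file_name") or f"restored_{number:03d}"  # dummy name
        name, suffix = base, 2
        while name in files:  # two versions of one file may share a name
            name = f"{base} ({suffix})"
            suffix += 1
        files[name] = {key: ident[key] for key in ID_FIELDS}
    return files
```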
As described above, the data distribution management system 1 according to an embodiment of the present invention allows the original data 400 to be stored in a distributed manner without the data distribution apparatus 100 holding distributed management information about where the distributed data 410 is stored. The system is also unaffected by which servers 200 hold the distributed data 410.
That is, to collect the necessary distributed data 410 from the servers 200, the data distribution apparatus 100 specifies all or part of the identification information 170 for the original data 400. It then broadcasts (or otherwise sends) a message asking each server 200 whether it holds distributed data 410 for that original data 400. Each server 200 holding the target distributed data 410 responds by returning it to the data distribution apparatus 100. The data distribution apparatus 100 can thus collect the necessary distributed data 410 without any distributed management information about storage locations.
This avoids a risk that is especially relevant when the data distribution apparatus 100 is a portable terminal: a third party who obtained distributed management information would learn where each piece of distributed data 410 is stored and could then access it. In addition, because nothing depends on which server 200 stores each piece of distributed data 410, the server 200 holding a given piece can easily be changed.
Furthermore, the data distribution apparatus 100 can restore the identification information 170 for each original data 400, and the pointer file 150 containing it, even when it holds neither. For example, when the user supplies a user ID 173, the data distribution apparatus 100 broadcasts (or otherwise sends) a message asking whether any server holds identification information 170 containing that user ID 173. Each server 200 holding distributed data 410 whose header contains the target identification information 170 returns that identification information 170 to the data distribution apparatus 100. The data distribution apparatus 100 can then obtain and restore the identification information 170 for the original data 400 available to that user, together with the pointer files 150 that contain it.
As a result, an information processing apparatus other than the original data distribution apparatus 100 can be newly used as the data distribution apparatus 100, for example because the original was stolen or lost, or because the user is working on another terminal during a business trip. Even then, the pointer file 150 can easily be restored, the original data 400 or the corresponding distributed data 410 can be accessed, and work can continue.
The invention made by the present inventors has been described specifically on the basis of an embodiment. However, the present invention is not limited to that embodiment, and it goes without saying that various modifications can be made without departing from its gist.
The present invention is applicable to data distribution management systems that store one or more pieces of data in a distributed manner across different servers or similar devices.
1 ... data distribution management system,
100 ... data distribution apparatus, 110 ... distributed data processing unit, 120 ... pointer file processing unit, 121 ... identification information generation unit, 122 ... ID generation unit, 130 ... distributed processing unit, 131 ... distribution unit, 132 ... collection unit, 133 ... server list, 140 ... interface unit, 150 ... pointer file, 160 ... user information, 170 ... identification information, 171 ... original file ID (FID), 172 ... current file ID (FID), 173 ... user ID,
200, 200a, 200b ... server, 210 ... distributed storage unit,
300 ... network,
400 ... original data, 410 ... distributed data.

Claims (9)

  1.  A data distribution management system comprising a plurality of information processing devices, each having a storage device, and a data distribution apparatus that is connected to each of the information processing devices via a network and stores, in a distributed manner in the storage devices of the respective information processing devices, one or more pieces of distributed data that correspond to original data and are handled collectively, wherein
     the data distribution apparatus comprises:
     a distributed data processing unit that performs processing for associating the original data with the one or more pieces of distributed data;
     a pointer file processing unit that generates identification information by which the original data can be identified, and generates a pointer file that corresponds to the original data and contains the identification information; and
     a distributed processing unit that transmits each piece of distributed data corresponding to the original data, each with the identification information corresponding to the original data attached, to a different one of the information processing devices; and
     each of the information processing devices comprises:
     a distributed storage unit that stores, in its storage device, the distributed data transmitted from the data distribution apparatus.
  2.  The data distribution management system according to claim 1, wherein
     the distributed processing unit of the data distribution apparatus
     specifies all or part of the identification information contained in a pointer file designated by a user, and broadcasts to each of the information processing devices a first message asking whether it holds the distributed data corresponding to the specified part of the identification information;
     the distributed storage unit of each of the information processing devices
     searches whether distributed data containing identification information that matches the specified part of the identification information in the first message is stored in its own storage device and, if so, transmits the matching distributed data to the data distribution apparatus; and
     the distributed data processing unit of the data distribution apparatus
     obtains the corresponding original data based on the distributed data transmitted from the information processing devices.
  3.  The data distribution management system according to claim 2, wherein
     the distributed processing unit of the data distribution apparatus
     specifies all or some of the values of the identification information designated by a user, and broadcasts to each of the information processing devices a second message instructing that use of the corresponding distributed data be restricted; and
     the distributed storage unit of each of the information processing devices
     registers the specified part of the identification information in the second message in a lock list and, when searching for distributed data containing identification information that matches the specified part of the identification information in the first message, restricts use of that distributed data if the identification information it contains includes content registered in the lock list.
  4.  The data distribution management system according to any one of claims 1 to 3, wherein
     the distributed processing unit of the data distribution apparatus
     specifies, within the identification information designated by a user, the value that identifies the user, and broadcasts to each of the information processing devices a third message asking whether it holds the corresponding identification information;
     the distributed storage unit of each of the information processing devices
     searches whether distributed data containing identification information that matches the user-identifying value in the third message is stored in its own storage device and, if so, transmits the identification information contained in that distributed data to the data distribution apparatus; and
     the pointer file processing unit of the data distribution apparatus
     restores the corresponding pointer file based on the identification information transmitted from the information processing devices.
  5.  The data distribution management system according to any one of claims 1 to 4, wherein
     the identification information includes ID information identifying the original data as a whole, ID information identifying each version of the original data as it is edited, and ID information identifying the user who created or edited the original data.
  6.  The data distribution management system according to any one of claims 1 to 5, wherein
     the distributed data processing unit of the data distribution apparatus
     generates a plurality of pieces of distributed data from the original data by a secret sharing scheme, and restores the original data from a plurality of pieces of distributed data by the secret sharing scheme.
  7.  The data distribution management system according to any one of claims 1 to 6, wherein
     the distributed storage unit of each of the information processing devices,
     when storing in the storage device distributed data that corresponds to the original data and was transmitted from the data distribution apparatus, stores it while retaining any earlier distributed data corresponding to that original data.
  8.  The data distribution management system according to claim 7, wherein
     the distributed storage unit of each information processing apparatus deletes, at a predetermined timing, the distributed data corresponding to original data older than a predetermined number of generations.
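Claim 8 can be read as a periodic cleanup job that keeps only the newest N versions of each document. This sketch assumes each document's history is kept oldest first, as in a store that appends new versions. The function prune_store and its parameters are hypothetical. The "predetermined timing" would be whatever scheduler the deployment uses, for example a nightly job.

```python
def prune_store(store: dict[str, list[tuple[str, bytes]]], max_generations: int) -> int:
    """Delete shares older than the newest max_generations versions of each document.

    store maps a document ID to its (version ID, share) history, oldest first.
    Returns the number of shares deleted.
    """
    if max_generations < 1:
        raise ValueError("max_generations must be at least 1")
    deleted = 0
    for history in store.values():
        excess = len(history) - max_generations
        if excess > 0:
            del history[:excess]  # drop the oldest entries in place
            deleted += excess
    return deleted
```

The job is idempotent: running it a second time with the same limit deletes nothing more.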
  9.  The data distribution management system according to claim 8, wherein
     the distributed processing unit of the data distribution apparatus broadcasts to each of the information processing apparatuses a fourth message that designates information identifying the version of the original data that the user has designated for preservation, and that requests that deletion of the distributed data corresponding to that original data be restricted; and
     the distributed storage unit of each information processing apparatus registers, in a list, the information identifying the version of the original data designated in the fourth message and, when deleting distributed data corresponding to original data older than the predetermined number of generations, restricts deletion of that distributed data if the identification information contained in it includes information identifying original data registered in the list.
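The receiving side of claim 9 can be sketched by combining generation-based pruning with a protected list. The sketch makes three assumptions. First, version IDs are globally unique. Second, the broadcast transport of the fourth message is out of scope, so only the node's handler is shown. Third, protected versions outside the N-generation window are simply skipped rather than extending the window. StorageNode and its method names are hypothetical.

```python
class StorageNode:
    """Hypothetical storage node: generation-based pruning plus a protected-version list."""

    def __init__(self, max_generations: int) -> None:
        if max_generations < 1:
            raise ValueError("max_generations must be at least 1")
        self.max_generations = max_generations
        # document ID -> list of (version ID, share), oldest first
        self.store: dict[str, list[tuple[str, bytes]]] = {}
        # version IDs registered from received "fourth messages"
        self.protected: set[str] = set()

    def put(self, document_id: str, version_id: str, share: bytes) -> None:
        # Append so that earlier versions are retained until pruning.
        self.store.setdefault(document_id, []).append((version_id, share))

    def on_protect_message(self, version_id: str) -> None:
        # Handler for the broadcast fourth message: register the version in the list.
        self.protected.add(version_id)

    def prune(self) -> int:
        """Delete shares outside the newest max_generations versions unless protected.

        Returns the number of shares deleted.
        """
        deleted = 0
        for history in self.store.values():
            excess = len(history) - self.max_generations
            if excess <= 0:
                continue
            old, recent = history[:excess], history[excess:]
            # Keep only the out-of-window entries whose version ID is on the protected list.
            kept = [entry for entry in old if entry[0] in self.protected]
            deleted += len(old) - len(kept)
            history[:] = kept + recent
        return deleted
```

Suppose a node keeps two generations and holds versions v1 to v4, and v1 has been protected. The first prune then deletes only v2, leaving v1, v3 and v4.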

PCT/JP2011/075211 2011-11-01 2011-11-01 Data distribution management system WO2013065134A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2011/075211 WO2013065134A1 (en) 2011-11-01 2011-11-01 Data distribution management system
PCT/JP2012/077460 WO2013065544A1 (en) 2011-11-01 2012-10-24 Data distribution management system
JP2013541726A JP5667702B2 (en) 2011-11-01 2012-10-24 Data distribution management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2011/075211 WO2013065134A1 (en) 2011-11-01 2011-11-01 Data distribution management system

Publications (1)

Publication Number Publication Date
WO2013065134A1 true WO2013065134A1 (en) 2013-05-10

Family

ID=48191528

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2011/075211 WO2013065134A1 (en) 2011-11-01 2011-11-01 Data distribution management system
PCT/JP2012/077460 WO2013065544A1 (en) 2011-11-01 2012-10-24 Data distribution management system

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/077460 WO2013065544A1 (en) 2011-11-01 2012-10-24 Data distribution management system

Country Status (1)

Country Link
WO (2) WO2013065134A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9946894B2 (en) * 2014-06-27 2018-04-17 Panasonic Intellectual Property Management Co., Ltd. Data processing method and data processing device
JP7277624B2 (en) * 2017-09-14 2023-05-19 株式会社日立システムズ Secret sharing management system, secret sharing management device and program
JP2019054363A (en) * 2017-09-14 2019-04-04 株式会社日立システムズ Server device, secret dispersion management system and secret dispersion management device
JP2020194462A (en) * 2019-05-29 2020-12-03 株式会社ミウラ Virus-free/restoration system, virus-free/restoration method, virus-free/restoration program and recording medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09128380A (en) * 1995-10-30 1997-05-16 Matsushita Electric Ind Co Ltd Document storing and managing system
JP2007334417A (en) * 2006-06-12 2007-12-27 Nippon Telegr & Teleph Corp <Ntt> Distributed information sharing method and terminal equipment
JP2008046860A (en) * 2006-08-16 2008-02-28 Fuji Xerox Co Ltd File management system and file management method
JP2011198325A (en) * 2010-03-24 2011-10-06 Hitachi Solutions Ltd Method and system for performing safe bringing-out of file data to outside

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4729683B2 (en) * 2004-03-26 2011-07-20 株式会社エヌ・ティ・ティ ネオメイト Data distribution storage device, data configuration management server, client terminal, and business consignment system including data distribution storage device
JP4594078B2 (en) * 2004-12-28 2010-12-08 株式会社オリコム Personal information management system and personal information management program

Also Published As

Publication number Publication date
WO2013065544A1 (en) 2013-05-10

Similar Documents

Publication Publication Date Title
US10983868B2 (en) Epoch based snapshot summary
EP2803006B1 (en) Cloud-based distributed data system
US11074132B2 (en) Post backup catalogs
JP4446738B2 (en) System and method for efficiently backing up computer files
US9195685B2 (en) Multi-tier recovery
US9021264B2 (en) Method and system for cloud based storage
US9542280B2 (en) Optimized recovery
US20110307451A1 (en) System and method for distributed objects storage, management, archival, searching, retrieval and mining in private and public clouds and deep invisible webs
US20160292045A1 (en) Disaster recovery as a service
US9858152B2 (en) Collaborative information source recovery
WO2013065544A1 (en) Data distribution management system
US7827145B1 (en) Leveraging client redundancy on restore
WO2013065545A1 (en) Data sharing system
JP5667702B2 (en) Data distribution management system
US9165019B2 (en) Self recovery
US20240020207A1 (en) Intelligent destination target selection for remote backups with awareness of temporary backup target for data restores
US11438295B1 (en) Efficient backup and recovery of electronic mail objects
US20240020200A1 (en) Restoring from a temporary backup target in an intelligent destination target selection system for remote backups
US9195549B1 (en) Unified recovery
US8713364B1 (en) Unified recovery
US8949661B1 (en) Federation of indices
JP2019079281A (en) Synchronization processor, file synchronization system and program

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11875138

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: Public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14/08/2014)

NENP Non-entry into the national phase

Ref country code: JP

122 EP: PCT application non-entry in European phase

Ref document number: 11875138

Country of ref document: EP

Kind code of ref document: A1