WO2013018808A1 - Distributed storage system and method - Google Patents
Distributed storage system and method
- Publication number
- WO2013018808A1 (PCT/JP2012/069499)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- information
- access
- unit
- node
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/184—Distributed file systems implemented as replicated file system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0617—Improving the reliability of storage systems in relation to availability
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2094—Redundant storage or storage space
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Definitions
- The present invention claims the benefit of priority of Japanese Patent Application No. 2011-169588 (filed on August 2, 2011), the entire disclosure of which is incorporated herein by reference.
- the present invention relates to distributed storage, and more particularly to a distributed storage system, method and apparatus capable of controlling a data structure.
- A distributed storage system is a system in which multiple computers (data nodes, or simply "nodes") are connected to a network, and data is stored in and used from a data storage unit (HDD (Hard Disk Drive), memory, etc.) of each computer.
- Since data is distributed to a plurality of nodes, a client that wants to access data first needs to know which node holds the data. Further, when a plurality of nodes hold the data, the client needs to know which node (or nodes) to access.
- In a distributed storage system, file management generally stores a file body and the file's metadata (storage location, file size, owner, etc.) separately.
- The meta server method is known as one technique by which a client can find the node holding data. In this method, a meta server configured from one or a plurality of (but a small number of) computers manages the data location information.
- When the number of nodes managed per meta server becomes enormous, the processing performance of the meta server that resolves the location of the node holding the data cannot keep up, and the introduced meta server may become a bottleneck in access performance.
- <Distributed KVS> As another method (technique) for knowing the location of the node holding data, there is a technique for obtaining the location of the data using a distribution function (for example, a hash function). This type of technique is used in, for example, distributed KVS (Key-Value Store).
- Distributed KVS is a type of distributed storage system (also referred to as a distributed KVS system) that realizes, with a plurality of nodes, the storage function of a simple data model consisting of pairs of "Key" and "Value", like an associative array.
- All clients share the distribution function and a list of the nodes participating in the system (node list).
- the stored data is divided into fixed-length or arbitrary-length data fragments (Value).
- Each data fragment is given an identifier that can uniquely identify the data fragment, and the location of the data fragment is determined using the identifier and the distribution function.
- The storage destination node (server) determined from the same key is always the same, so the accessing client can easily determine the data access destination.
- a key is used as an identifier, and a value corresponding to the key is used as a unit of stored data, thereby realizing a data access function based on the key and value.
- When accessing data, each client uses the key as the input value of the distribution function, and arithmetically calculates the location of the node storing the data from the output value of the distribution function and the node list.
- the distribution function basically does not change over time (time invariant).
- The contents of the node list change as needed due to node failures or additions. For this reason, the client must be able to access this information by some means.
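The lookup just described can be pictured with a minimal sketch. This is illustrative only (not from the patent): it assumes an MD5-based distribution function and a simple modulo mapping onto the node list; real distributed KVS systems often use consistent hashing instead.

```python
import hashlib

# Node list shared by all clients; its contents change on node failure/addition.
NODE_LIST = ["node1:7000", "node2:7000", "node3:7000", "node4:7000"]

def distribution_function(key: str) -> int:
    """Time-invariant distribution function: hashes the key to an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def locate_node(key: str, node_list=NODE_LIST) -> str:
    """Arithmetically derive the storage destination node for a key."""
    return node_list[distribution_function(key) % len(node_list)]

# The same key always maps to the same node while the node list is unchanged.
assert locate_node("user:42") == locate_node("user:42")
```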
- <Replication> In a distributed storage system, in order to ensure availability (the ability of the system to operate continuously), replicas of data are generally held at multiple nodes, and the replicas are also used for load balancing.
- Patent Document 1 discloses a technique for realizing load distribution using created copies of data.
- Patent Document 2 discloses a configuration in which a server defines an information structure definition body in an information structure definition section, and a registration client constructs a database using the information structure definition body, generates a database access tool, and registers information in the database using this tool.
- Patent Document 3 discloses a configuration in which a distributed storage system includes storage nodes that store copies of objects accessible through unique locator values, and a key map instance that stores a key map entry for each object, each key map entry including, for a given object, the corresponding key value and the locators of the object's copies.
- Patent Document 4, which names one of the present inventors as a joint inventor, discloses a CDP (Continuous Data Protection) storage system: every time data is updated, the changes are saved in time series; data writes to the storage are tracked and captured, and journaling the changes to a secondary storage allows the data at any point in the past to be reproduced (Any Point In Time (APIT) Recovery).
- That is, Patent Document 4 is a storage system equipped with a data protection function that allows data at a past time to be restored by recording changes in time series as a log whenever a data update occurs. In this system, a predetermined trigger regarding data access is extracted, data corresponding to the extracted trigger is created from the data stored and held in the storage and from the log information, and the created data is stored in the storage as data corresponding to the trigger.
- JP 2006-12005 A (Japanese Patent No. 4528039)
- JP-A-11-195044 (Japanese Patent No. 3911810)
- In a distributed storage system, data replicas are held at multiple nodes to maintain availability, and the same physical structure is held at the multiple nodes. As a result, access response performance and availability are guaranteed in the distributed storage system.
- However, since the replicated data is held in the same physical structure at the plurality of nodes, an application that, for example, reads and analyzes the data may require a data structure different from the data structure of the retained replicas.
- The inventors of the present application have found that a marked improvement in performance can be expected by applying special contrivances to, for example, the writing (write, update) of data and the execution of the conversion of the data into the target data structure, and propose the following on that basis.
- An object of the present invention is to provide a distributed storage system and method that ensure availability in data replication in a distributed storage and improve both write performance and read-side processing performance.
- According to the present invention, the following configuration is generally adopted (though the invention is not limited to the following).
- There is provided a distributed storage system comprising a plurality of network-coupled data nodes, each having a data storage unit, wherein the data node that is the replication destination of data temporarily holds the data to be updated, and an access history recording unit stores history information of access to the data node.
- There is also provided a data replication method for distributed storage in which the trigger information that triggers execution of the conversion into the target data structure, performed asynchronously at the data node, is varied based on the access history information of the data node.
- FIG. 7 is a diagram schematically illustrating a write process and an analysis system process in FIG. 6.
- FIGS. 13 and 14 are diagrams (parts 1 and 2) illustrating the operation sequence of a write process according to an exemplary embodiment of the present invention.
- In an exemplary embodiment of the present invention, a plurality of data nodes, each including a data storage unit, are network-coupled, and the data node that is the replication destination temporarily stores the data to be updated.
- The data node stores write data in an intermediate structure for holding it (a queue, a FIFO (First In First Out) buffer, a log, etc.) and, asynchronously with respect to update requests, converts the data into the target data structure and stores it in the data storage unit (12).
- the data node includes an access history recording unit (71) for storing a history of access frequency to the data node.
- The trigger information that triggers execution of the conversion into the target data structure, performed asynchronously in the data node, is made variable based on the access history information (access frequency) stored in the access history recording unit (71).
- Each of the replication destination data nodes may be configured to hold the data in the intermediate structure, return a response, and then, when the time prescribed by the trigger information has elapsed, asynchronously convert the data held in the intermediate structure into the target data structure and store it in the data storage unit.
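As a rough sketch of this write path (the class, its names, and the timer mechanics are assumptions made here for illustration, not the patent's implementation), a data node could append updates to a FIFO intermediate structure, acknowledge immediately, and convert into the target structure only when the update trigger time elapses:

```python
import threading
from collections import deque

class ReplicaDataNode:
    """Holds writes in a FIFO intermediate structure, converts asynchronously."""

    def __init__(self, update_trigger_sec: float, convert):
        self.intermediate = deque()        # Write intermediate structure (FIFO)
        self.data_store = []               # structure-specific data storage unit
        self.update_trigger_sec = update_trigger_sec
        self.convert = convert             # conversion into the target structure
        self._lock = threading.Lock()
        self._timer = None

    def write(self, record) -> str:
        with self._lock:
            self.intermediate.append(record)          # hold in the intermediate
            if self._timer is None:                   # arm the asynchronous timer
                self._timer = threading.Timer(self.update_trigger_sec, self._flush)
                self._timer.start()
        return "OK"                                   # respond before converting

    def _flush(self):
        """Runs when the update trigger time elapses (asynchronous conversion)."""
        with self._lock:
            while self.intermediate:
                self.data_store.append(self.convert(self.intermediate.popleft()))
            self._timer = None
```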
- the data arrangement destination data node and the target data structure in the arrangement destination data node may be controlled in a predetermined table unit.
- The structure information includes data structure management information (921 in FIG. 2; see FIG. 3) that includes, in association with a table identifier identifying the data to be stored, entries corresponding to the number of data structure types, each consisting of a replica identifier identifying a replica, data structure information identifying the type of data structure corresponding to the replica identifier, and update trigger information, which is timer information until the data is converted into the specified data structure and stored, and data arrangement specifying information (922 in FIG. 2; see FIG. 5) that associates, with the table identifier, each replica identifier and the data node(s) where that replica is arranged.
- The system includes a structure information management device (9) having a structure information holding unit (92) for storing and managing the data structure management information and the data arrangement specifying information; a client function realization unit (61) having a data access unit that specifies the access destination of update processing and reference processing with reference to the data structure management information and the data arrangement specifying information; and a plurality of data nodes (1 to 4), each having a data storage unit (12), connected to the structure information management device (9) and the client function realization unit (61).
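The two kinds of structure information described above can be pictured as follows. This is a hypothetical in-memory layout invented for illustration (the field names are not from the patent), using the example values that appear later for the "Stocks" table:

```python
# Data structure management information (921): one entry per replica, per table.
DATA_STRUCTURE_MANAGEMENT = {
    "Stocks": [
        {"replica_id": 0, "structure": "B (row store)",    "update_trigger_sec": 30},
        {"replica_id": 1, "structure": "B (row store)",    "update_trigger_sec": 60},
        {"replica_id": 2, "structure": "C (column store)", "update_trigger_sec": 60},
    ],
}

# Data arrangement specifying information (922): replica -> arrangement node.
DATA_ARRANGEMENT = {
    ("Stocks", 0): "data-node-1",
    ("Stocks", 1): "data-node-2",
    ("Stocks", 2): "data-node-3",
}
```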
- When the data node performs update processing based on an access request from the client function realization unit (61), the data node temporarily holds the data in an intermediate structure and then returns a response to the client function realization unit (61).
- The data management/processing unit (11) may include a data structure conversion unit (113) that performs the processing of converting the data into the target structure.
- A change determination unit (72) determines, using the access information recorded in the access history recording unit (71) or other access information obtained by processing it, whether the update trigger information of the data structure management information (921) in the structure information holding unit should be changed, and notifies the structure information management device when it should be changed.
- The structure information management device (9) is provided with a structure information change unit (91) that receives the notification of the change of the update trigger information from the change determination unit (72) and changes the update trigger information of the data structure management information (921).
- the access frequency may be recorded as access information in the access history recording unit (71).
- The access information recorded in the access history recording unit (71) may be frequency information of read accesses from the data storage unit and of data write accesses to the intermediate structure, or information indicating an access occurrence pattern, a tendency of access occurrence, or the like.
- the data node includes an access reception unit (111), an access processing unit (112), and a data structure conversion unit (113).
- The data storage unit (12) of the data node includes structure-specific data storage units (121 to 123). The access reception unit (111) receives an update request from the client function realization unit, forwards the update request to the data nodes specified in the data arrangement specifying information in correspondence with the replica identifiers, and logs the access request in the access history recording unit.
- The access processing unit (112) of the data node receives and processes the update request, executing the update with reference to the data structure management information. When the update trigger is zero, the update data is converted into the data structure specified in the data structure management information and stored in the structure-specific data storage unit; otherwise, the update data is first written into the intermediate structure and completion of the processing is returned as a response.
- The access receiving unit (111) returns the response after receiving a completion notification from the access processing unit (FIG. 14), or after receiving the completion notification from the access processing unit together with completion notifications from each replica data node (FIG. 13).
- The data structure conversion unit (113) may convert the data held in the intermediate structure into the data structure specified in the data structure management information and store it in the structure-specific data storage units (121 to 123).
- FIG. 1 is a diagram illustrating an example of a system configuration according to an exemplary embodiment of the present invention.
- The system includes data nodes 1 to 4, a network 5, a client node 6, and structure information management means (structure information management apparatus) 9.
- Data nodes 1 to 4 are data storage nodes constituting the distributed storage; any number (one or more) of them may be provided.
- the network 5 realizes communication between network nodes including the data nodes 1 to 4.
- the client node 6 is a computer node that accesses the distributed storage. The client node 6 does not necessarily exist independently. An example in which the data nodes 1 to 4 also serve as client computers will be described later with reference to FIG.
- The data nodes 1 to 4 include data management/processing means (data management/processing units) 11, 21, 31, 41, data storage units 12, 22, 32, 42, and access history recording units 71-1 to 71-4, respectively.
- the client node 6 includes client function realization means (client function realization unit) 61.
- the client function realization means 61 accesses the distributed storage constituted by the data nodes 1 to 4.
- the client function implementation unit 61 includes a data access unit (data access unit) 611.
- the data access means (data access unit) 611 acquires structure information (data structure management information and data arrangement specifying information) from the structure information management means 9, and uses the structure information to specify an access destination data node.
- The structure information stored in the structure information holding unit 92 of the structure information management means 9 may also be held in each of the data nodes 1 to 4 themselves, or in an arbitrary device (switch, intermediate node) in the network 5.
- the access to the structure information stored in the structure information holding unit 92 may be made to access a cache (not shown) provided in the device itself or at a predetermined location.
- Well-known distributed system techniques can be applied to synchronize the structure information stored in such caches (not shown), and the details are omitted here. As is well known, storage performance can be increased by using a cache.
- the structure information management means (structure information management apparatus) 9 includes a structure information change means 91 for changing structure information, and a structure information holding unit 92 for holding structure information.
- The structure information holding unit 92 includes data structure management information 921 (see FIG. 3) and data arrangement specifying information 922 (see FIG. 5).
- As will be described later with reference to FIG. 3, the data structure management information 921 has, for each table identifier, entries corresponding to the number of data replicas, each consisting of a replica identifier identifying a replica, data structure information identifying the type of data structure for that replica, and an update trigger, which is time information until the data is stored in that data structure.
- As will be described later with reference to FIG. 5, the data arrangement specifying information 922 associates, with each table identifier, the replica identifiers and one or more data nodes where each replica is arranged.
- The access history recording units 71-1 to 71-4 record log information of the Read accesses and Write accesses of the data nodes 1 to 4.
- As the access log information, frequency information corresponding to the number of accesses within a predetermined period may be stored.
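A minimal sketch of such an access history recording unit, assuming timestamped in-memory log entries and a fixed aggregation window (both assumptions made here for illustration):

```python
import time
from collections import deque

class AccessHistoryRecorder:
    """Records read/write access events and derives their frequency in a window."""

    def __init__(self, window_sec: float = 60.0):
        self.window_sec = window_sec
        self.events = deque()       # (timestamp, kind), kind in {"read", "write"}

    def record(self, kind: str) -> None:
        """Log one access together with its reception time."""
        self.events.append((time.time(), kind))

    def frequency(self, kind: str) -> int:
        """Number of accesses of the given kind within the most recent window."""
        cutoff = time.time() - self.window_sec
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()   # discard entries older than the window
        return sum(1 for _, k in self.events if k == kind)
```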
- In FIG. 1, the client node 6 is provided independently (separately) from the data nodes 1 to 4, but this is not mandatory. That is, as described below as a modification, the client function realization unit 61 may be provided in any one or more of the data nodes 1 to 4.
- FIG. 2 is a diagram for explaining the configuration example of FIG. 1 in detail.
- FIG. 2 shows a configuration centered on the data nodes of FIG. 1. Since the data nodes 1 to 4 basically have the same configuration, FIG. 2 shows the data management/processing unit 11, the data storage unit 12, and the access history recording unit 71 (71-1) of the data node 1 as a representative.
- the structure information stored in the structure information holding unit 92 may be referred to by the reference numeral 92 for simplification.
- the data management / processing unit 11 of the data node 1 includes an access receiving unit (access receiving unit) 111, an access processing unit (access processing unit) 112, and a data structure converting unit (data structure converting unit) 113.
- the data management / processing means 21, 31, 41 of the other data nodes 2 to 4 have the same configuration.
- the access receiving unit 111 receives an access request from the data access unit 611 and returns a response to the data access unit 611 after the processing is completed.
- the access receiving means 111 records the information of the access request (access command) in the access history recording unit 71 together with the reception time information, for example.
- the data storage unit 12 includes a plurality of types of structure-specific data storage units. Although not particularly limited, FIG. 2 includes a structure-specific data storage unit 121 (data structure A), a structure-specific data storage unit 122 (data structure B), and a structure-specific data storage unit 123 (data structure C).
- The structure-specific data storage unit 121 (for example, data structure A) has a structure specialized for response performance of processing involving data writes (data addition or update). Specifically, software is implemented that queues the data change contents (for example, in a FIFO (First In First Out) buffer) on a high-speed memory (dual-port RAM (Random Access Memory) etc.), or appends the processing contents of access requests to an arbitrary storage medium as a log.
- the data structure B and the data structure C are data structures different from the data structure A and have different data access characteristics. Note that the data storage unit 12 is not necessarily a single storage medium.
- Note that the data storage unit 12 may itself be realized as a distributed storage system including a plurality of data placement nodes, and the structure-specific data storage units 12X may be stored in a distributed manner.
- The data arrangement specifying information 922 is information (together with the means for storing and acquiring it) for specifying the storage location of the data or data fragments stored in the distributed storage. As described above, for example, a meta server method or a distributed KVS method is used as the data distribution and arrangement method.
- In the meta server method, the information that manages the data location (for example, a block address and the corresponding data node address) is the data arrangement specifying information 922; by querying the meta server, the location of the necessary data can be known.
- In the distributed KVS method, the list of nodes participating in the system corresponds to this data arrangement specifying information; from the key and the node list, the data node of the data storage destination can be determined.
- The data access means 611 specifies the data node 1 to 4 to be accessed using the data arrangement specifying information 922 in the structure information managing means 9, or cache information of the data arrangement specifying information 922 stored in a predetermined location, and issues an access request to the access accepting means 111 of that data node.
- the data structure management information 921 in FIG. 2 is parameter information for specifying a data storage method for each data set.
- FIG. 3 is a diagram showing an example of the data structure management information 921 in FIG.
- The unit for controlling the data storage method is a table. Then, for each table (for each table identifier), a replica identifier, a data structure type, and update trigger information are prepared for each data replica.
- each table holds three replicas for availability (retention) (however, the number of replicas is not limited to three).
- the replica identifier is information for identifying each replica, and is given as 0, 1, and 2 in FIG.
- the data structure is information indicating a data storage method.
- different types of data structures (A, B, C) are designated for each replica identifier.
- FIG. 3B shows an example of a data storage system of the data structures A, B, and C (however, the storage system is not limited to these storage systems).
- the replica identifier 0 of the table identifier “Stocks” is stored as a data structure B (row store).
- Each data structure is a method for storing data: A (queue) is, for example, a linked list; B (row store) stores the records of the table in row (ROW) order; C (column store) stores them in column order.
- FIG. 4 is a diagram schematically illustrating an example of the data holding structure of the table.
- the table in FIG. 4A includes a Key column and three Value columns, and each row includes a set of Key and three Value.
- In the row store and the column store, the storage order on the storage medium is row (row) based and column (column) based, respectively.
- the data of the replica identifiers 0 and 1 is held in the data structure B (row store) (see (B) and (C) of FIG. 4),
- the data of the replica identifier 2 is held as a data structure C (column store) (see FIG. 4D).
- the update trigger in the data structure management information 921 is a time trigger until data is stored as a designated data structure.
- For example, "30 sec" is specified for the replica identifier 0 of Stocks. Therefore, in the data node storing the data structure B (row store) of the replica identifier 0 of Stocks, updates of data are reflected into the structure-specific data storage unit 122 of the row store format every 30 sec. Until a data update is reflected, the data is held in an intermediate structure such as a queue, and the response to the client is returned upon storing into the intermediate structure. In the present embodiment, the conversion into the designated data structure is performed asynchronously with respect to the update request.
- the update target data is transferred between data nodes in a synchronous manner, and the data structure is converted to the target structure asynchronously.
- An example in which a timer is used as update opportunity information for asynchronously converting the data structure will be described (however, the present invention is not limited to the following implementation).
- FIG. 5 is a diagram showing an example of the data arrangement specifying information 922 of FIG.
- The arrangement node is the data storage destination data node.
- the data arrangement specifying information 922 corresponds to node list information (not shown) participating in the distributed storage.
- the placement node can be specified by a consistent hashing method using “table identifier” + “replica identifier” as key information. Further, it can be stored in an adjacent node in the consistent hashing method as a replica placement destination.
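A minimal consistent-hashing sketch of this placement rule, using "table identifier" + "replica identifier" as the key information; the ring construction and hash choice are assumptions for illustration, not taken from the patent:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes):
        self._ring = sorted((_h(n), n) for n in nodes)
        self._keys = [k for k, _ in self._ring]

    def placement_node(self, table_id: str, replica_id: int) -> str:
        """Place using "table identifier" + "replica identifier" as the key."""
        i = bisect.bisect(self._keys, _h(f"{table_id}:{replica_id}"))
        return self._ring[i % len(self._ring)][1]

ring = ConsistentHashRing(["node1", "node2", "node3", "node4"])
# Each replica of the same table is mapped to its own position on the ring.
print([ring.placement_node("Stocks", r) for r in range(3)])
```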
- FIG. 6 is a diagram schematically illustrating the basic format of table data retention and asynchronous update.
- FIG. 6 is a diagram for explaining a problem to be solved by the present invention, and therefore also a diagram for explaining a comparative example of the present invention.
- When the value of the update trigger information is larger than 0, each data node accepts updated content into an intermediate structure (also referred to as a "Write priority structure" or "Write intermediate structure") that excels in the response speed of writes (update requests). Upon writing to the Write intermediate structure, a response indicating completion of processing is returned to the update-request source client.
- Update data written to the Write intermediate structure of each data node is asynchronously updated to the conversion target data structure in each data node.
- In FIG. 6, upon a Write, the data is stored and held as data structure A in the Write intermediate structure of the data node with the replica identifier 0, and is replicated synchronously (Synchronous) to the data nodes with the replica identifiers 1 and 2.
- In the data nodes with the replica identifiers 1 and 2, the Write intermediate structure temporarily stores and holds the data of data structure A transferred from the data node with the replica identifier 0 or 1, respectively.
- The conversion into the target data structures B and C is specified by the update trigger information of the data structure management information 921, as shown in FIG. 3.
- In the data node with the replica identifier 0, a timer is started upon the Write of the data structure A, and when 30 sec (seconds) elapse (timeout: occurrence of the update trigger), the data is converted from the data structure A into the data structure B (Row Store).
- In the data node with the replica identifier 1, the timer is started upon receiving the data structure A transferred synchronously (Sync) from the data node with the replica identifier 0, and when 60 seconds elapse (timeout: occurrence of the update trigger), the data is converted into the data structure B (Row Store).
- In the data node with the replica identifier 2, the timer is started upon receiving the data structure A transferred synchronously (Sync) from the data node with the replica identifier 1, and when 60 seconds elapse (timeout: occurrence of the update trigger), the data is converted into the data structure C (Column Store).
- In this way, replication of the update data (data structure A) written to the Write intermediate structure of one data node to the other data nodes is performed synchronously with the write (update).
- At the time of a READ access, the data has already been converted into the data structure required for the READ access, so processing can be accelerated by handling the READ access with the converted data structure. Further, depending on the type of READ access, an appropriate data structure can be selected and the corresponding access destination node used accordingly.
- the number of data structure types is three, A, B, and C, for the sake of simplicity of explanation.
- the number of data structure types is not limited to three.
- any plural types having different characteristics may be used.
- Although three types of data structure (queue, column store, and row store) are illustrated here, the data structure is of course not limited to these examples. For example, there may be variations such as whether the row store structure has an index, differences in which columns are indexed, and a row store format that stores updates in an appending structure.
- Since synchronously (Sync) replicating into different data structures would incur a large overhead, a data structure such as a first-in first-out (FIFO) queue or log is used as the Write intermediate structure; by storing the data once in the intermediate structure and reflecting it into the target structure later, the conversion processing becomes efficient and the influence on the access performance of the system is small.
- However, the trigger for performing the data structure conversion asynchronously (the setting value of the asynchronous timer (Async timer) in FIG. 6) is not necessarily always optimal for the data usage status.
- If the setting value of the asynchronous timer in FIG. 6 is short, frequent data structure conversion may adversely affect the write performance of the system. Conversely, when the setting value (timeout time: update trigger information) of the asynchronous timer is long and the frequency of data structure conversion is low, the system (analysis system) that uses the converted data structure is not guaranteed to be analyzing the latest data, and problems may arise in the reliability of the analysis results.
- In FIG. 7, the data node stores write data in the Write intermediate structure, and the time until conversion into the target data structure (column store format in FIG. 7) is made long.
- That is, in the data node, conversion into the target data structure and storage into the data storage unit are hardly performed, and the data accumulates mostly in the Write intermediate structure. In this case, the write-system performance is favored.
- Also, since the conversion into the target data structure (for example, column store format) is performed in large batches, the conversion processing by the data structure conversion means becomes efficient.
- On the other hand, a batch-processing client that operates at a predetermined time or time zone and analyzes the data converted into the target data structure will be analyzing old data whose structure was converted some time ago. If the client needs to analyze newer data, the data still stored in the Write intermediate structure of the data node (awaiting data structure conversion) must be read out in addition to the data already converted into the column store format, which is the target data structure, and the difference between the old and new data must be reflected in the analysis; as a result, the load on the client side increases.
- Conversely, when the setting value (timeout time: update trigger information) of the asynchronous timer (Async timer) that defines the trigger of the data structure conversion is relatively small, the data node must convert into the target data structure little by little at short time intervals. For this reason, when the setting value of the asynchronous timer is small, the write performance of the data node is at a disadvantage compared with when the setting value is large.
- On the other hand, a client that analyzes data by batch processing can always refer to fresh data.
- Since the data whose structure has been converted asynchronously is relatively recent, even when the client refers to newer data, the amount of data to be read out from the Write intermediate structure is small, and the load on the client side is also small.
- Thus, the appropriate trigger for the asynchronous data structure conversion in each data node depends on, for example, the pattern of data references (Read accesses) from the client side.
- Therefore, in the present embodiment, the data structure conversion trigger (the update trigger information in FIG. 3A) is adjusted in association with the access frequency. If the access frequency (Read access frequency) is less than or equal to a predetermined threshold, the setting value (timeout time) of the asynchronous timer (Async timer) is increased or decreased; that is, the value of the update trigger information of the data structure management information 921 (FIG. 2) is adjusted according to the access frequency.
- For example, when the frequency of Write accesses is larger than the frequency of Read accesses, the setting value (timeout time) of the asynchronous timer is increased.
- Alternatively, the reference timing (Read access date/time, time zone, etc.) may be taken into account: immediately before a Read access is expected, the data accumulated in the Write intermediate structure may be converted into the target data structure and stored, and after the conversion the setting value (timeout time) of the asynchronous timer (Async) may be increased.
- Increasing the setting value (timeout time) of the asynchronous timer (Async timer) reduces the number of data structure conversions.
- the value of the update trigger information (timeout time of the asynchronous timer) of the data structure management information 921 (FIG. 2) may be adjusted in synchronization (interlocking) with this change.
- In the present embodiment, the performance balance between the online-processing write system and the batch-processing analysis system can thus be optimized simply by adjusting the value of the update trigger information (asynchronous timer).
- In the drawing, the access frequency is shown in order to clearly show its relationship to the change of the setting value of the asynchronous (Async) timer, and a configuration in which the access frequency information is stored and held in the data node is shown.
- However, the access frequency information of a data node may also be held outside the data node; for example, it may be stored and managed in a storage common to a plurality of data nodes.
- The data node may take an access history (log), calculate the access frequency based on the access history information, and change the setting value (update trigger information) of the asynchronous (Async) timer based on the access frequency.
- Alternatively, instead of the access frequency (the number of accesses in a unit period), an access pattern indicating the tendency or characteristics of accesses may be used to change the setting value (update trigger information) of the asynchronous (Async) timer.
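One way to picture such an adjustment rule is the sketch below; the thresholds, step sizes, and function name are invented here for illustration and are not taken from the patent:

```python
def adjust_update_trigger(current_sec: float,
                          read_freq: int,
                          write_freq: int,
                          read_threshold: int = 10) -> float:
    """Heuristic adjustment of the update trigger (asynchronous timer value).

    When reads are rare or writes dominate, lengthen the timeout so that
    conversions run less often (favoring write performance); otherwise
    shorten it so that analysis clients see fresher converted data.
    """
    if read_freq <= read_threshold or write_freq > read_freq:
        return current_sec * 2.0            # fewer, larger conversions
    return max(1.0, current_sec / 2.0)      # fresher converted data
```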
- FIG. 9 is a diagram illustrating an example of a configuration for adjusting the update trigger information of the data structure management information 921.
- In FIG. 9, a change determination means (change determination unit) 72 is provided that determines, based on the access information in the access history recording unit 71, whether or not to update the update trigger information of the data structure management information 921.
- the access receiving unit 111 of each data node records the received access request in the access history recording unit 71.
- the access history recording unit 71 records the access request (including the table identifier of FIG. 3A, the replica identification value of the data node, etc.) in association with the time information (date / time information) when the access request is received.
- the access history recording unit 71 is provided for each data node. However, the access history recording unit 71 may be provided for each data node group including a plurality of data nodes, or may be provided for the entire system. Alternatively, the access history recording unit 71 may be provided in each data node, and a mechanism for collecting the access frequency information individually collected in each data node by an arbitrary method may be provided.
- The change determination means (change determination unit) 72 may decide whether to change the update trigger information relating to the corresponding data node using the access history information stored in the access history recording unit 71, for example according to the magnitude of the access frequency within a most recent period of predetermined length (the result of comparison with a threshold value). Alternatively, the access frequency within the most recent period of the predetermined length may be calculated, and the update trigger information relating to the corresponding data node changed according to the magnitude of the variation (the result of comparison with a threshold value) from the access frequency in the immediately preceding period of the same length.
- When it is necessary to change the update trigger information (the setting value of the timeout time of the asynchronous timer) for the asynchronous conversion in the related data node, the change determination unit 72 issues a change request for the asynchronous timer value to the structure information change unit 91.
- The change determination unit 72 may derive the changed value of the asynchronous timer setting and set it in the change request, with the structure information change unit 91 replacing the current setting value of the asynchronous timer with the changed value.
- Since the relationship between the table identifier information, the replica identifier, and the data node information (arrangement node number) is defined in the data arrangement specifying information 922, the structure information change unit 91, in response to the change request from the change determination unit 72, changes the update trigger information of the corresponding table identifier and replica identifier in the data structure management information 921, identified from the data node information (ID), the replica identifier, and the table identifier information.
- In FIG. 9, the change determination means 72 is provided separately from the data management/processing means 11 of the data node 1, but the change determination means 72 may instead be mounted in the data management/processing means of each data node.
- the access history recording unit 71 may hold the access frequency information calculated by the change determination unit 72.
- The access frequency information is not necessarily limited to the number of read access requests and write access requests per unit period; it may also be information on the occurrence patterns of read and write access requests (for example, a timetable of reads and writes).
- FIG. 10 is a flowchart for explaining the operation of the client function realization means 61 in FIG. 1, which issues a command to the update destination data node and waits.
- The client access flow will be described with reference to FIG. 10.
- The client function realization means 61 acquires the information in the structure information holding unit 92 by accessing the master data (master file) or a cache at an arbitrary location (a cache memory storing a copy of part of the master data) (step S101 in FIG. 10).
- the client function realizing unit 61 identifies whether the command issued by the client is a WRITE process or a reference process (Read) (step S102).
- For example, an INSERT command (a SQL command to add a record to the table) is a WRITE process, and a SELECT command (a SQL command for referring to or retrieving records from the table) is a reference process.
- Alternatively, the caller may explicitly specify the type when invoking the instruction through the client function realization means 61 (an API (Application Program Interface) for this purpose may be prepared).
- If the result of step S102 is a WRITE process, the process proceeds to step S103 and subsequent steps.
- the client function realizing unit 61 specifies a node that needs to be updated using the information of the data arrangement specifying information 922.
- the client function realization means 61 issues a command execution request (update request) to the identified data node (step S103).
- the client function realization means 61 waits for a response notification from the data node to which the update request is issued, and confirms that the update request is held in each data node (step S104).
- If the result of step S102 is a reference process, the process proceeds to step S105.
- In step S105, the client function realization means 61 identifies (recognizes) the characteristics of the processing content.
- the client function realization means 61 performs a process of selecting an access target data node and issuing a command request based on the specified processing characteristics and other system conditions (step S106).
- the client function realization means 61 then receives the access processing result from the data node (step S107).
- the client function realization means 61 can know the type of the data structure holding the access target data from the information stored in the data structure management information 921.
- For example, for the table with the table identifier "WORKERS", the replica identifiers 0 and 1 hold the data structure B and the replica identifier 2 holds the data structure C, and accesses to the WORKERS table are recorded in association with the replica identifier of the data node.
- The client function realization means 61 determines which data structure is suitable for the data access to be performed on the data node, and selects a suitable data structure. More specifically, for example, the client function realization means 61 analyzes the SQL statement of the access request; if the access sums up a certain column of the table whose table identifier is "WORKERS", it selects the data structure C (column store). When the SQL statement is an access for retrieving a specific record, the client function realization means 61 determines that the data structure B (row store) is suitable.
- When the data structure B is selected, the client function realization means 61 may select either of the replica identifiers 0 and 1. When it is not strictly necessary to process the latest data, it is desirable to use the replica identifier 1, for which the update trigger information is set to a large value.
- the command passed to the client function implementing means 61 may be in a format that explicitly specifies the data structure to be used and the information specifying the required data freshness (data freshness).
- After specifying the replica identifier (data structure) to be accessed, the client function realization means 61 calculates the data node to be accessed. At this time, the selection of the access node may be changed according to the situation of the distributed storage system. For example, when a certain table is stored in the data nodes 1 and 2 as the same data structure B and the access load of the data node 1 is large, the client function realization means 61 may change to an operation of selecting the data node 2.
- Also, when the access load of the data node 3 is smaller than that of the data nodes 1 and 2, the client function realization means 61 may issue the access request to the data node 3 (data structure C) even if the access content to be processed is better suited to the data structure B.
- the client function realization means 61 issues an access request to the data node calculated and selected in this way (S106), and receives an access processing result from the data node (S107).
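A sketch of this selection logic, assuming a crude SQL classification, the management-information entry layout sketched earlier, and a hypothetical load_of callback that reports each replica node's current load (all assumptions, not the patent's implementation):

```python
def choose_replica(sql: str, replicas, load_of):
    """Pick a replica whose data structure suits the query, preferring a
    lightly loaded node; `replicas` follows the entry layout sketched earlier."""
    if "SUM(" in sql.upper() or "AVG(" in sql.upper():
        wanted = "C (column store)"     # column aggregation -> column store
    else:
        wanted = "B (row store)"        # specific-record retrieval -> row store
    candidates = [r for r in replicas if r["structure"] == wanted] or replicas
    return min(candidates, key=lambda r: load_of(r["replica_id"]))
```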
- FIG. 11 is a flowchart for explaining access processing in the data node of FIG. 2. The operation of the data node will be described in detail with reference to FIGS. 2 and 11.
- the access receiving unit 111 of the data management / processing unit 11 of the data node receives an access processing request (step S201 in FIG. 11).
- the access receiving unit 111 of the data management / processing unit 11 of the data node determines whether the content of the received processing request is a write process or a read (reference) process (step S202).
- In step S203, the access processing unit 112 of the data management/processing unit 11 of the data node acquires the information of the data structure management information 921 in the structure information holding unit 92.
- the information acquisition of the data structure management information 921 may be performed by accessing the master data or by accessing cache data (data in a cache memory storing a copy of a part of the master data) at an arbitrary location.
- Alternatively, the client function realization means 61 in FIG. 1 may attach such information (whether to access the master data or cache data) to the request issued to the data node, and the access processing means 112 may perform the access using that information.
- Next, the access processing means 112 determines, from the information of the data structure management information 921, whether or not the update trigger of the processing for this data node is "0" (zero) (step S204).
- If the update trigger is "0", the update data is converted into the specified data structure and stored in the structure-specific data storage unit (step S205); otherwise, the access processing unit 112 stores the update data in the Write intermediate structure (structure-specific data storage unit 121) (step S206).
- After the processing of step S205 or S206 is completed, the access receiving unit 111 returns a processing completion notification to the requesting client function realization unit 61 (step S207).
- If the result of step S202 is a data reference process, the reference process is executed (step S208).
- the execution method of the Read (reference) processing is not particularly limited, but representatively, the following three types of methods can be exemplified.
- In the first method, processing is performed using the data in the data storage unit having the data structure specified in the data structure management information 921.
- This has the best performance, but when the update trigger time (cycle) is large, the data in the Write intermediate structure may not yet be reflected in the reference processing, so data inconsistency may occur.
- There is no particular problem if the application developer recognizes this in advance and uses it accordingly, if it is known that data reads do not occur within the update trigger interval after a Write, or if it is decided that accesses requiring fresh data are directed to the replica whose update trigger is "0".
- the second method is a method of performing processing after waiting for the application of conversion processing performed separately. This is easy to implement, but the response performance deteriorates. For applications that do not require response performance, there is no problem.
- the third method reads and processes both the data structure specified in the data structure management information 921 and the data held in the Write intermediate structure. In this case, the latest data can always be responded, but the performance is deteriorated as compared with the first method.
- Any of the first to third methods may be used. A plurality of methods may also be implemented, and the method to execute may be specified in the processing command issued from the client function realization unit 61 or described in a system setting file.
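The third method can be pictured as the sketch below; the storage interfaces are invented here for illustration, and the merge is simplified to concatenation:

```python
def read_latest(converted_store, intermediate, convert):
    """Third method: combine the already-converted data with the records still
    waiting in the Write intermediate structure, so the newest data is seen."""
    pending = [convert(rec) for rec in intermediate]   # not yet reflected
    return list(converted_store) + pending             # freshest view, but slower
```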
- FIG. 12 is a flowchart showing the operation of the data conversion process in the data structure conversion means 113 of FIG. 2. The data conversion process will be described with reference to FIGS. 2 and 12.
- the data structure conversion means 113 waits for a call due to the occurrence of a timeout in a timer (not shown) in the data node in order to periodically determine whether or not conversion processing is necessary (step S301 in FIG. 12).
- This timer may be provided in the data structure conversion means 113 as a dedicated timer.
- The timeout time of the timer corresponds to the setting value of the update trigger information (sec) in FIG. 3A (the Async timer timeout time in FIG. 6).
- When called, the data structure conversion means 113 acquires the structure information from the structure information holding unit 92 (step S302) and determines whether there is a data structure that needs conversion (step S303). For example, when the determination runs every 10 seconds on the timer, a data structure whose update trigger is 20 seconds executes the conversion process every 20 seconds, so no conversion is needed at the 10-second point. If no conversion processing is necessary, the process returns to waiting for the timer call due to the occurrence of a timeout (step S301).
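A sketch of this periodic check (steps S301 to S303), assuming a fixed 10-second tick as in the example above and a hypothetical flush_intermediate method that performs the actual conversion and storage:

```python
import time

def conversion_loop(node, get_structure_info, tick_sec: float = 10.0):
    """Wake on a periodic tick and convert only the structures whose update
    trigger has elapsed (e.g., a 20-sec trigger fires on every other 10-sec
    tick, so nothing is converted at the 10-second point)."""
    last_converted = {}
    while True:
        time.sleep(tick_sec)                             # S301: wait for timer
        for entry in get_structure_info():               # S302: acquire info
            key = entry["replica_id"]
            due = last_converted.get(key, 0.0) + entry["update_trigger_sec"]
            if time.time() >= due:                       # S303: conversion needed?
                node.flush_intermediate(key)             # convert and store
                last_converted[key] = time.time()
```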
- FIG. 13 is a diagram illustrating a sequence of a write process (a process involving data update).
- First, the client function realization means 61 (client computer) of the client node 6 acquires the data arrangement specifying information 922 (FIG. 2) held in the structure information holding unit 92 of the structure information management means 9 (or acquires the information from a cache memory at an arbitrary location).
- the client computer uses the acquired information to issue a write access command to the data node (data node 1 with replica identifier 0) where data is to be written.
- the access accepting means 111 of the data node 1 accepts the write access request and transfers the write access to the data nodes 2 and 3 designated by the replica identifiers 1 and 2.
- For the data structure management information 921, the data node 1 may access the structure information holding unit 92 (or an appropriate cache), or all or part of the data structure management information 921 may be delivered together with the write access command issued by the client function realization means 61.
- the access processing means 112 of each data node processes the received write access request.
- the access processing means 112 refers to the information of the data structure management information 921 and executes the write process.
- the write processing content is stored in the structure-specific data storage unit 121 of the data structure A.
- the access processing unit 112 issues a completion notification to the access receiving unit 111 after the completion of the write process, and returns a completion response to the client computer.
- the replica destination data node (2, 3) returns a Write completion response to the access receiving means 111 of the replica source data node 1.
- the access receiving unit 111 waits for the completion notification from the access processing unit 112 of the data node 1 and the completion notifications of the data nodes 2 and 3 of each replica destination, and after receiving all of them, returns a response to the client computer.
- The data structure conversion means 113 (see FIG. 2) of the data node 1, upon timeout of the asynchronous timer, converts the data stored in the Write intermediate structure (structure-specific data storage unit 121 (data structure A)) into the final storage destination data structure specified in the data structure management information 921 and stores it in the structure-specific data storage unit 12X.
- The data nodes 2 and 3 likewise perform the conversion to the target data structure in response to the timeout of the asynchronous timer.
- <Write sequence 2> In the sequence of FIG. 13, the data node 1 transfers the Write request to the data nodes 2 and 3 that are the replica destinations; however, as shown in FIG. 14, the client computer may issue the Write request to all of the storage destination data nodes.
- In this case, the client computer issues a Write request to each of the storage destination data nodes 0, 1, and 2 and waits for a completion response from each of them.
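The FIG. 14 variant, in which the client itself fans out the Write request, could look like the following sketch (Python; `send_write` is a hypothetical RPC stub, not an API defined by the disclosure):

```python
from concurrent.futures import ThreadPoolExecutor

def send_write(node: str, key: str, value: bytes) -> bool:
    """Hypothetical RPC stub: deliver one Write request to one data node
    and return True once that node reports completion."""
    return True

def replicated_write(nodes: list, key: str, value: bytes) -> bool:
    """Issue the Write request to every storage destination data node and
    wait for all completion responses, as in the sequence of FIG. 14."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(lambda n: send_write(n, key, value), nodes))
    return all(results)

# The client reports success only after data nodes 0, 1 and 2 all respond.
print(replicated_write(["node-0", "node-1", "node-2"], "k1", b"v1"))
```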
- FIG. 15 is a diagram illustrating a modification of the configuration of FIG. 8.
- The data node 3 in the column store (Column Store) format of FIG. 8 is composed of two data nodes 3A and 3B. While one data node 3A converts data from the Write intermediate structure into the column store format, the analysis client performs analysis by referring to the data of the other data node 3B (the pre-conversion data stored in the Write intermediate structure and the data already converted to the column store format). The asynchronous timer setting in the data nodes 3A and 3B is 20 seconds (Async (20 seconds)), but the data structure conversion in the data node 3B runs 10 seconds behind the data structure conversion in the data node 3A. For example, in the data node 3A, the data structure conversion is performed in the time interval of 0 to 20 seconds, and data analysis by a client performing Read access is performed in the subsequent time interval of 20 to 40 seconds.
- In the data node 3B, data analysis by a client performing Read access is performed in the time interval of 10 to 30 seconds, and the data structure conversion is performed in the subsequent time interval of 30 to 50 seconds.
- Therefore, at the 15-second point, for example, the data node 3A performs data structure conversion while the data node 3B performs data analysis.
- The asynchronous timer settings in the data nodes 3A and 3B are set based on the access history information (access frequency) of the data nodes 3A and 3B.
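The alternation between the two nodes can be sketched as a phase function (Python; the equal-length convert/serve windows and the numeric offset are assumptions drawn from the example above):

```python
def phase(now_sec: float, period: float = 20.0, offset: float = 0.0) -> str:
    """Return the phase of a data node at time now_sec, assuming the
    asynchronous timer alternates convert and serve windows of `period`
    seconds each, with the node's schedule shifted by `offset` seconds.
    For simplicity this sketch staggers the two nodes by exactly half
    the 40-second cycle, so the convert windows never overlap."""
    t = (now_sec - offset) % (2 * period)
    return "convert" if t < period else "serve-analysis"

# At the 15-second point, data node 3A converts while data node 3B
# serves Read access for analysis, as in the example of FIG. 15.
assert phase(15, offset=0.0) == "convert"          # data node 3A
assert phase(15, offset=20.0) == "serve-analysis"  # data node 3B
```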
- FIG. 16 shows an example in which an ETL (Extract/Transform/Load) is arranged between online processing (an online processing system that performs Write processing) and an analysis system (data warehouse) operated by batch processing or the like.
- A data warehouse system includes a large-scale database for information analysis and decision making, built by extracting and reconstructing data (for example, transaction data) from the core system. Data migration from the database of the mission-critical system to the data warehouse database is necessary, and this process is called ETL (Extract/Transform/Load). "Extract" extracts data from the department's information source, "Transform" converts and processes the extracted data as the business requires, and "Load" represents loading the processed data into the final target (that is, the data warehouse).
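Purely as an illustration of the three steps (Python; the row layout and function names are assumptions, not the patented method), an ETL pass from a row-oriented source into a column-oriented warehouse might read:

```python
# Hypothetical mission-critical rows: (Key, Value1, Value2, Value3),
# echoing the table layout of FIG. 4(A).
source_rows = [("k1", 10, "a", 1.5), ("k2", 20, "b", 2.5)]

def extract(rows):
    """Extract: pull the data from the department's information source."""
    return list(rows)

def transform(rows):
    """Transform: convert/process as the business requires; here the
    row-oriented records are pivoted into a column-oriented layout."""
    keys, v1, v2, v3 = zip(*rows)
    return {"Key": list(keys), "Value1": list(v1),
            "Value2": list(v2), "Value3": list(v3)}

def load(columns, warehouse):
    """Load: store the processed data into the final target
    (the data warehouse)."""
    warehouse.update(columns)

warehouse = {}
load(transform(extract(source_rows)), warehouse)
print(warehouse["Value1"])  # -> [10, 20]
```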
- In FIG. 16, the above-described embodiment is applied to the ETL data conversion; that is, the asynchronous data conversion by the ETL in FIG. 16 corresponds to the asynchronous conversion into the target data structure described in the above embodiment.
- The ETL asynchronously converts the data (replicated data) of the active system (online processing) into the column store (Column-Store) format for the analysis system (data warehouse).
- By varying, based on the access frequency (access history information), the timer that triggers the asynchronous ETL conversion, the data structure conversion bottleneck is eliminated and the storage usage efficiency can be increased.
Description
This application is based on and claims the priority of Japanese Patent Application No. 2011-169588 (filed on August 2, 2011), the entire disclosure of which is incorporated herein by reference.
The present invention relates to distributed storage, and in particular to a distributed storage system, method, and apparatus capable of controlling data structures.
- on which computer (node) to place data, and
- on which computer (node) to perform processing
are realized by software, special dedicated hardware, or the like. In a distributed storage system, by dynamically changing the behavior according to the state of the system, the resource usage within the system is adjusted and the performance provided to system users (client computers) is improved.
As another technique for knowing the location of the node holding data, there is a method that obtains the location of data using a distribution function (for example, a hash function). This kind of technique is used, for example, in distributed KVS (Key Value Store). A distributed KVS is a kind of distributed storage system that realizes, over a plurality of nodes, the storage function of a simple data model consisting of pairs of a "Key" and a "Value", like an associative array. In a distributed storage system based on the distributed KVS technique (also called a distributed KVS system), all clients share the distribution function and the list of nodes participating in the system (node list). Stored data is divided into fixed-length or arbitrary-length data fragments (Values). Each data fragment is given an identifier that uniquely identifies it, and the placement location of the data fragment is determined using the identifier and the distribution function. For example, since the storage destination node (server) differs according to the value of the key under a hash function, data can be distributed and stored across a plurality of nodes. Moreover, as long as the distribution function is the same, the storage destination for the same key is always the same, so an accessing client can easily find the data access destination. A simple distributed KVS system realizes a data access function based on Key and Value by using the Key as the identifier and the Value corresponding to the Key as the unit of stored data.
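A minimal sketch of such hash-based placement (Python; the hash choice, node list, and in-memory stores are illustrative assumptions):

```python
import hashlib

# The node list shared by every client of the distributed KVS.
node_list = ["node-1", "node-2", "node-3", "node-4"]

def node_for_key(key: str) -> str:
    """Determine the storage destination from the key alone: the same
    distribution function and node list give every client the same
    answer, so no lookup service is needed."""
    digest = hashlib.md5(key.encode()).digest()
    return node_list[int.from_bytes(digest[:8], "big") % len(node_list)]

store = {n: {} for n in node_list}  # stand-in for the nodes' local stores

def put(key: str, value: str) -> None:
    store[node_for_key(key)][key] = value

def get(key: str) -> str:
    return store[node_for_key(key)][key]

put("user:42", "Alice")
print(node_for_key("user:42"), get("user:42"))
```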
In distributed storage systems, to secure availability (the ability of the system to operate continuously), it is common to hold replicas of data on a plurality of nodes and to exploit the replicas for load distribution.
An access history recording unit that stores history information of accesses to the data node is provided, and
means is provided for varying, based on the access history information recorded in the access history recording unit, trigger information that triggers the execution of the conversion, performed asynchronously at the data node, into the target data structure; such a distributed storage system is provided.
In replicating data corresponding to a data update request, the replication destination data node
stores the data to be updated temporarily in an intermediate structure for holding write data and, asynchronously with the update request, converts it into the respective target data structure and stores it in the data storage unit, and
trigger information that triggers the execution of the conversion, performed asynchronously at the data node, into the target data structure is varied based on the access history information of the data node; such a distributed-storage data replication method is provided.
A structure information management apparatus (9) having a structure information holding unit (92) that stores and manages the data structure management information and data placement specifying information (922 in FIG. 2; FIG. 5) comprising, in correspondence with the table identifier, the replica identifier and data node information of one or more data placement destinations corresponding to the replica identifier; a client function realization unit (61) comprising a data access unit that identifies the access destinations of update processing and reference processing by referring to the data structure management information and the data placement specifying information; and a plurality of the data nodes (1 to 4), each comprising the data storage unit (12) and connected to the structure information management apparatus (9) and the client function realization unit (61), are provided. The data node may be configured with a data management and processing unit (11) comprising: an access reception and processing unit (111, 112) that, when performing update processing based on an access request from the client function realization unit (61), once holds the data in an intermediate structure and then returns a response to the client function realization unit (61); and a data structure conversion unit (113) that refers to the data structure management information and, in response to a specified update trigger, performs processing to convert the data held in the intermediate structure into the data structure specified in the data structure management information.
The access reception unit (111), upon receiving
a completion notification from the access processing unit (FIG. 14), or
a completion notification from the access processing unit together with completion notifications from each replica destination data node (FIG. 13),
responds to the client function realization unit (61), and
the data structure conversion unit (113) may convert the data held in the intermediate structure into the data structure specified in the data structure management information and store it in the conversion destination structure-specific data storage unit (121 to 123).
FIG. 1 shows an example of the system configuration of an exemplary embodiment of the present invention. The system includes data nodes 1 to 4, a network 5, a client node 6, and structure information management means (structure information management apparatus) 9.
FIG. 2 is a diagram explaining in detail the configuration example of FIG. 1. FIG. 2 shows the configuration centered on the data nodes 1 to 4 of FIG. 1. Since the data nodes 1 to 4 of FIG. 1 have basically the same configuration, FIG. 2 shows the data management and processing means 11, the data storage unit 12, and the access history recording unit 71 (corresponding to 71-1 in FIG. 1) of the data node 1. In the drawings such as FIG. 2, for simplicity, the structure information stored in the structure information holding unit 92 is sometimes referred to by reference numeral 92.
The data structure management information 921 of FIG. 2 is parameter information for specifying the storage scheme of data for each set of data. FIG. 3 shows an example of the data structure management information 921 of FIG. 2. Although not particularly limited thereto, in the example shown in FIG. 3 the unit for controlling the data storage scheme is a table. For each table (each table identifier), the replica identifier, the type of data structure, and the update trigger information are prepared for each of the data replicas.
A: queue,
B: row store,
C: column store
are specified as data structure types. In the example of FIG. 3(B), replica identifier 0 of the table identifier "Stocks" is stored as data structure B (row store).
A: the queue is a linked list.
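For illustration only (Python; the literal layout and the trigger value for replica 2 are assumptions mirroring the example of FIG. 3), the data structure management information can be pictured as a mapping from table identifier to per-replica entries:

```python
# Per-table entries: one (data structure, update trigger) per replica,
# as in the example of FIG. 3(A)/(B). The 60-sec trigger for replica 2
# is a made-up value for illustration.
data_structure_management_info = {
    "Stocks": [
        {"replica_id": 0, "structure": "B", "trigger_sec": 30},  # row store
        {"replica_id": 1, "structure": "B", "trigger_sec": 30},  # row store
        {"replica_id": 2, "structure": "C", "trigger_sec": 60},  # column store
    ],
}

def target_structure(table: str, replica_id: int) -> str:
    """Look up the data structure type for one replica of one table."""
    for entry in data_structure_management_info[table]:
        if entry["replica_id"] == replica_id:
            return entry["structure"]
    raise KeyError((table, replica_id))

print(target_structure("Stocks", 0))  # -> "B"
```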
FIG. 4 schematically shows an example of the data holding structure of a table. The table in FIG. 4(A) has a Key column and three Value columns, and each row consists of a set of a Key and three Values.
The data of replica identifiers 0 and 1 is held in data structure B (row store) (see FIGS. 4(B) and 4(C)), and
the data of replica identifier 2 is held as data structure C (column store) (see FIG. 4(D)).
Referring again to FIG. 3(A), the update trigger in the data structure management information 921 (see FIG. 2) is the time trigger until data is stored as the specified data structure. In the example of replica identifier 0 of Stocks, it is specified as 30 sec. This indicates that, in the data node storing data structure B (row store) of replica identifier 0 of Stocks, data updates are reflected in the row-store structure-specific data storage unit 122 at a 30-sec trigger. Until a data update is reflected, the data is held in an intermediate structure such as a queue. The data node also responds to requests from clients by storing the data in the intermediate structure. In this embodiment, the conversion into the specified data structure is performed asynchronously with respect to the update request.
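This buffering behaviour can be sketched as follows (Python; the queue and row-store representations are simplified assumptions):

```python
from collections import deque

write_queue = deque()  # data structure A: queue (linked list), the
                       # intermediate structure for holding write data
row_store = {}         # data structure B: row store (the target structure)

def handle_update(key, values) -> str:
    """On an update request, append to the intermediate structure only
    and answer the client immediately; the row store is untouched."""
    write_queue.append((key, values))
    return "OK"

def convert_on_trigger() -> None:
    """Invoked at the update trigger (e.g., every 30 sec for replica
    identifier 0 of Stocks): drain the queue and reflect the updates in
    the row store, asynchronously with the update requests."""
    while write_queue:
        key, values = write_queue.popleft()
        row_store[key] = values

handle_update("k1", (10, "a", 1.5))   # answered before conversion
convert_on_trigger()                  # fired by the asynchronous timer
print(row_store["k1"])
```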
FIG. 5 shows an example of the data placement specifying information 922 of FIG. 2. For each of the replica identifiers 0, 1, and 2 (see FIG. 3) of each table identifier, a placement node (data node of the data storage destination) is specified. This corresponds to the meta-server scheme described above. In the case of the distributed KVS scheme, the data placement specifying information 922 corresponds to node list information (not shown) of the nodes participating in the distributed storage. By sharing this node list information among the data nodes, the placement node can be identified by the consistent hashing scheme using, for example, "table identifier" + "replica identifier" as key information. In addition, a replica can be stored in an adjacent node in the consistent hashing scheme as its placement destination.
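Placement by consistent hashing with "table identifier" + "replica identifier" as the key can be sketched as below (Python; the hash function, ring layout, and node names are assumptions):

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

nodes = ["node-1", "node-2", "node-3", "node-4"]
ring = sorted((h(n), n) for n in nodes)  # node positions on the hash ring
points = [p for p, _ in ring]

def placement_nodes(table_id: str, replica_id: int, count: int = 1) -> list:
    """Locate the placement node for 'table identifier' + 'replica
    identifier' on the ring; additional replicas go to the adjacent
    (successor) nodes, as described above."""
    start = bisect.bisect(points, h(f"{table_id}:{replica_id}")) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Placement node for replica identifier 2 of table "Stocks", plus the
# adjacent node that would hold a further replica.
print(placement_nodes("Stocks", 2, count=2))
```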
FIG. 6 schematically explains the basic form of table data holding and asynchronous updating. FIG. 6 is a diagram for explaining the problem to be solved by the present invention, and therefore also a diagram for explaining a comparative example of the present invention.
- the presence or absence of an index in a row store structure,
- differences in the kinds of columns for which an index is created,
- a row store format that stores updates in an append-only structure,
and the like.
Therefore, in this embodiment, as shown in FIG. 8, the trigger for data structure conversion (the update trigger information of FIG. 3(A)) is adjusted, for example, in association with the access frequency. If the access frequency (frequency of Read accesses) is below/above a predetermined threshold, the set value (timeout time) of the asynchronous timer (Async (timer)) is enlarged/reduced. That is, the value of the update trigger information of the data structure management information 921 (FIG. 2) (asynchronous timer: the update trigger information of FIG. 3(A)) is adjusted to the access frequency.
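The threshold-based adjustment can be sketched as follows (Python; the thresholds, step size, and bounds are illustrative assumptions, not values from the disclosure):

```python
def adjust_trigger(read_freq: float, trigger_sec: float,
                   low: float = 10.0, high: float = 100.0,
                   step: float = 10.0,
                   min_sec: float = 10.0, max_sec: float = 120.0) -> float:
    """Adapt the asynchronous-timer value (the update trigger information
    of FIG. 3(A)) to the observed Read access frequency: a rarely read
    replica is converted less often, a frequently read one sooner."""
    if read_freq <= low:
        return min(trigger_sec + step, max_sec)   # enlarge the timeout
    if read_freq >= high:
        return max(trigger_sec - step, min_sec)   # reduce the timeout
    return trigger_sec

# A replica read only 5 times per interval gets a longer trigger:
print(adjust_trigger(read_freq=5.0, trigger_sec=30.0))  # -> 40.0
```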
FIG. 9 shows an example of a configuration for adjusting the update trigger information of the data structure management information 921. As shown in FIG. 9, change determination means (change determination unit) 72 is provided which determines, based on the access information of the access history recording unit 71, whether or not to change the update trigger information of the data structure management information 921.
FIG. 10 is a flowchart for explaining the operation of the client function realization means 61 of FIG. 1, in which the client function realization means 61 issues commands to the update destination data nodes and waits for them to complete. The client access flow will be described with reference to FIG. 10.
- For an INSERT command (an SQL command that adds a record to a table): WRITE processing;
- for a SELECT command (an SQL command that refers to and searches records in a table): reference (read) processing.
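This branching of the client access flow can be sketched as follows (Python; the command parsing and return labels are simplified assumptions):

```python
def dispatch(sql: str) -> str:
    """Route an SQL command to the write path or the reference (read)
    path, as in the access flow of FIG. 10."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb == "INSERT":   # adds a record to a table
        return "WRITE processing"
    if verb == "SELECT":   # refers to / searches records in a table
        return "reference (read) processing"
    return "unsupported command"

print(dispatch("INSERT INTO Stocks VALUES ('k1', 1, 'a', 0.5)"))
print(dispatch("SELECT * FROM Stocks"))
```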
FIG. 11 is a flowchart explaining the access processing in the data node of FIG. 2. The operation of the data node will be described in detail with reference to FIGS. 11 and 2.
FIG. 12 is a flowchart showing the operation of the data conversion processing in the data structure conversion means 113 of FIG. 2. The data conversion processing will be described with reference to FIGS. 12 and 2.
FIG. 13 shows the sequence of Write processing (processing involving a data update).
In the example of FIG. 13, the data node 1 transfers the Write request to the replica destination data nodes 2 and 3; however, as shown in FIG. 14, the client computer may issue the Write request to all of the storage destination data nodes.
FIG. 15 explains a modification of the configuration of FIG. 8. Referring to FIG. 15, the data node 3 in the column store (Column Store) format of FIG. 8 is composed of two data nodes 3A and 3B. While one data node 3A is converting data from the Write intermediate structure into the column store data structure, the analysis client refers to the data of the other data node 3B (the pre-conversion data stored in the Write intermediate structure and the data already converted to the column store format) to perform analysis. The asynchronous timer setting in the data nodes 3A and 3B is 20 seconds (Async (20 seconds)), but the data structure conversion in the data node 3B runs 10 seconds behind that in the data node 3A. For example, in the data node 3A, the data structure conversion is performed in the time interval of 0 to 20 seconds, and data analysis by a client performing Read access is performed in the subsequent interval of 20 to 40 seconds. In the data node 3B, data analysis by a client performing Read access is performed in the time interval of 10 to 30 seconds, and the data structure conversion is performed in the subsequent interval of 30 to 50 seconds. Therefore, at the 15-second point, midway between 10 and 20 seconds, for example, the data node 3A performs data structure conversion while the data node 3B performs data analysis. The asynchronous timer settings in the data nodes 3A and 3B are set based on the access history information (access frequency) of the data nodes 3A and 3B.
FIG. 16 shows an example in which an ETL (Extract/Transform/Load) is arranged between online processing (an online processing system that performs Write processing) and an analysis system (data warehouse) operated by batch processing or the like.
5 network
6 client node
9 structure information management means (structure information management apparatus)
11, 21, 31, 41 data management and processing means (data management and processing unit)
12, 22, 32, 42 data storage unit
61 client function realization means (client function realization unit)
71 access history recording unit
72 change determination means (change determination unit)
91 structure information change means (structure information change unit)
92 structure information holding unit
111 access reception means (access reception unit)
112 access processing means (access processing unit)
113 data structure conversion means (data structure conversion unit)
121, 122, 123, 12X structure-specific data storage unit
611 data access means (data access unit)
612 structure information cache holding unit
921 data structure management information
922 data placement specifying information
Claims (14)
- A distributed storage system comprising a plurality of network-coupled data nodes each comprising a data storage unit, wherein, in response to a data update request, each replication destination data node of the data stores the data to be updated temporarily in an intermediate structure for holding write data and, asynchronously with the received update request, converts it into a respective target data structure and stores it in the data storage unit, the system further comprising an access history recording unit that stores a history of the frequency of accesses to the data node, and means for varying, based on the access information recorded in the access history recording unit, trigger information that triggers the conversion into the target data structure performed asynchronously at the data node.
- The distributed storage system according to claim 1, wherein each replication destination data node holds the data in the intermediate structure and returns a response, and converts the data structure held in the intermediate structure asynchronously into the target data structure when the time defined by the trigger information has elapsed since reception of the data to be updated, and then stores it in the data storage unit.
- The distributed storage system according to claim 1 or 2, comprising means for controlling, in predetermined table units, the data node of a data placement destination and the target data structure at the placement destination data node.
- The distributed storage system according to any one of claims 1 to 3, comprising: a structure information management apparatus having a structure information holding unit that stores and manages data structure management information comprising, in correspondence with a table identifier which is an identifier identifying data to be stored, a replica identifier identifying a replica, data structure information identifying the type of data structure corresponding to the replica identifier, and trigger information which is timer information until conversion into the specified data structure and storage, provided in correspondence with the number of types of the data structures, and data placement specifying information comprising, in correspondence with the table identifier, the replica identifier and data node information of one or more data placement destinations corresponding to the replica identifier; a client function realization unit comprising a data access unit that identifies the access destinations of update processing and reference processing by referring to the data structure management information and the data placement specifying information; and a plurality of the data nodes, each comprising the data storage unit and connected to the structure information management apparatus and the client function realization unit; wherein the data node has a data management and processing unit comprising an access reception and processing unit that, when performing update processing based on an access request from the client function realization unit, holds data in an intermediate structure and returns a response to the client function realization unit, and a data structure conversion unit that refers to the data structure management information and, in response to a specified update trigger, performs processing to convert the data held in the intermediate structure into the data structure specified in the data structure management information.
- The distributed storage system according to claim 4, comprising a change determination unit that determines, using the access information recorded in the access history recording unit or other access information obtained by processing that access information, whether or not to change the update trigger information of the data structure management information in the structure information holding unit, and notifies the structure information management apparatus when the update trigger information of the data structure management information is to be changed, wherein the structure information management apparatus comprises a structure information change unit that receives the notification of the change of the update trigger information from the change determination unit and changes the update trigger information of the data structure management information.
- The distributed storage system according to claim 1 or 5, wherein the access information recorded in the access history recording unit includes frequency information of read accesses from the data storage unit and of write accesses of data to the intermediate structure.
- The distributed storage system according to claim 5, wherein, in the data node, the access reception and processing unit comprises an access reception unit and an access processing unit, and the data storage unit of the data node comprises structure-specific data storage units; the access reception unit receives an update request from the client function realization unit, transfers the update request to the data nodes specified in correspondence with the replica identifiers in the data placement specifying information, and further records the access request in the access history recording unit; the access processing unit of the data node processes the received update request and executes the update processing by referring to the data structure management information, and in doing so, when the update trigger information for the data node is zero according to the data structure management information, converts the update data into the data structure specified in the data structure management information and stores it in the structure-specific data storage unit, and when the update trigger is not zero, writes the update data temporarily into the intermediate structure and responds with processing completion; the access reception unit, upon receiving a completion notification from the access processing unit, or a completion notification from the access processing unit together with completion notifications from each replica destination data node, responds to the client function realization unit; and the data structure conversion unit converts the data of the intermediate structure into the data structure specified in the data structure management information and stores it in the conversion destination structure-specific data storage unit.
- The distributed storage system according to claim 1, comprising at least two data nodes whose target data structure is the same, wherein the two data nodes perform the conversion from the data held in the intermediate structure for holding write data into the target data structure, based on the set trigger information, at timings that do not overlap in time, and while one data node is converting the data held in the intermediate structure for holding write data into the target data structure, the other data node serves reading of the data converted into the target data structure.
- A data replication method for distributed storage comprising a plurality of network-coupled data nodes each comprising a data storage unit, wherein, in replicating data corresponding to a data update request, each replication destination data node stores the data to be updated temporarily in an intermediate structure for holding write data and, asynchronously with the update request, converts it into a respective target data structure and stores it in the data storage unit, and trigger information that triggers execution of the conversion into the target data structure performed asynchronously at the data node is varied based on history information of accesses to the data node.
- The data replication method according to claim 9, wherein the replication destination data node holds the data in the intermediate structure and returns a response, and converts the data structure held in the intermediate structure asynchronously into the target data structure when the time defined by the trigger information has elapsed since reception of the data to be updated.
- The data replication method according to claim 9 or 10, wherein the data node of a data placement destination and the target data structure at the placement destination data node are controlled in predetermined table units.
- The data replication method according to any one of claims 9 to 11, wherein data structure management information that manages, in correspondence with a table identifier which is an identifier identifying data to be stored, a replica identifier identifying a replica, data structure information identifying the type of data structure corresponding to the replica identifier, and trigger information which is time information until conversion into the specified data structure and storage, in correspondence with the number of types of the data structures, and data placement specifying information comprising, in correspondence with the table identifier, the replica identifier and data node information of one or more data placement destinations corresponding to the replica identifier, are stored in a structure information holding unit of a structure information management apparatus; a data access unit identifies the access destinations of update processing and reference processing by referring to the data structure management information and the data placement specifying information; and the data node, when performing update processing based on an access request from the client, holds the data in an intermediate structure and returns a response, and refers to the data structure management information and, in response to a specified update trigger, converts the data held in the intermediate structure into the data structure specified in the data structure management information.
- The data replication method according to claim 9 or 12, wherein the access history information includes frequency information of read accesses from the data storage unit and of write accesses of data to the intermediate structure.
- The data replication method according to claim 9, wherein at least two data nodes whose target data structure is the same are prepared, and the two data nodes perform the conversion from the data held in the intermediate structure for holding write data into the target data structure, based on the set trigger information, at timings that do not overlap in time, and while one data node is converting the data held in the intermediate structure for holding write data into the target data structure, the other data node serves reading of the data converted into the target data structure.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013526936A JP6044539B2 (ja) | 2011-08-02 | 2012-07-31 | Distributed storage system and method |
US14/236,666 US9609060B2 (en) | 2011-08-02 | 2012-07-31 | Distributed storage system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011-169588 | 2011-08-02 | ||
JP2011169588 | 2011-08-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013018808A1 true WO2013018808A1 (ja) | 2013-02-07 |
Family
ID=47629329
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/069499 WO2013018808A1 (ja) | 2012-07-31 | Distributed storage system and method |
Country Status (3)
Country | Link |
---|---|
US (1) | US9609060B2 (ja) |
JP (1) | JP6044539B2 (ja) |
WO (1) | WO2013018808A1 (ja) |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10706021B2 (en) | 2012-01-17 | 2020-07-07 | Oracle International Corporation | System and method for supporting persistence partition discovery in a distributed data grid |
US10229221B1 (en) * | 2012-03-01 | 2019-03-12 | EMC IP Holding Company LLC | Techniques for cache updates based on quality of service |
US9542433B2 (en) | 2012-12-20 | 2017-01-10 | Bank Of America Corporation | Quality assurance checks of access rights in a computing system |
US9483488B2 (en) | 2012-12-20 | 2016-11-01 | Bank Of America Corporation | Verifying separation-of-duties at IAM system implementing IAM data model |
US9189644B2 (en) | 2012-12-20 | 2015-11-17 | Bank Of America Corporation | Access requests at IAM system implementing IAM data model |
US9537892B2 (en) | 2012-12-20 | 2017-01-03 | Bank Of America Corporation | Facilitating separation-of-duties when provisioning access rights in a computing system |
US9529629B2 (en) * | 2012-12-20 | 2016-12-27 | Bank Of America Corporation | Computing resource inventory system |
US9639594B2 (en) | 2012-12-20 | 2017-05-02 | Bank Of America Corporation | Common data model for identity access management data |
US9489390B2 (en) | 2012-12-20 | 2016-11-08 | Bank Of America Corporation | Reconciling access rights at IAM system implementing IAM data model |
US9495380B2 (en) | 2012-12-20 | 2016-11-15 | Bank Of America Corporation | Access reviews at IAM system implementing IAM data model |
US9477838B2 (en) | 2012-12-20 | 2016-10-25 | Bank Of America Corporation | Reconciliation of access rights in a computing system |
JP6152704B2 (ja) * | 2013-05-28 | 2017-06-28 | Fujitsu Limited | Storage system, control program for information processing apparatus, and storage system control method |
GB2522459B (en) * | 2014-01-24 | 2021-02-17 | Metaswitch Networks Ltd | Timer services |
JP6549704B2 (ja) | 2014-09-25 | 2019-07-24 | オラクル・インターナショナル・コーポレイション | 分散コンピューティング環境内でゼロコピー2進基数木をサポートするためのシステムおよび方法 |
US10664495B2 (en) | 2014-09-25 | 2020-05-26 | Oracle International Corporation | System and method for supporting data grid snapshot and federation |
US9781225B1 (en) * | 2014-12-09 | 2017-10-03 | Parallel Machines Ltd. | Systems and methods for cache streams |
CN105739909A (zh) | 2014-12-11 | 2016-07-06 | International Business Machines Corporation | Time-based data placement method and apparatus in a distributed storage system |
US10096065B2 (en) * | 2015-01-16 | 2018-10-09 | Red Hat, Inc. | Distributed transactions with extended locks |
US11163498B2 (en) * | 2015-07-01 | 2021-11-02 | Oracle International Corporation | System and method for rare copy-on-write in a distributed computing environment |
US10860378B2 (en) | 2015-07-01 | 2020-12-08 | Oracle International Corporation | System and method for association aware executor service in a distributed computing environment |
US10585599B2 (en) | 2015-07-01 | 2020-03-10 | Oracle International Corporation | System and method for distributed persistent store archival and retrieval in a distributed computing environment |
US11567972B1 (en) * | 2016-06-30 | 2023-01-31 | Amazon Technologies, Inc. | Tree-based format for data storage |
US11550820B2 (en) | 2017-04-28 | 2023-01-10 | Oracle International Corporation | System and method for partition-scoped snapshot creation in a distributed data computing environment |
US10769019B2 (en) | 2017-07-19 | 2020-09-08 | Oracle International Corporation | System and method for data recovery in a distributed data computing environment implementing active persistence |
US10721095B2 (en) | 2017-09-26 | 2020-07-21 | Oracle International Corporation | Virtual interface system and method for multi-tenant cloud networking |
US10862965B2 (en) | 2017-10-01 | 2020-12-08 | Oracle International Corporation | System and method for topics implementation in a distributed data computing environment |
CN108038199A (zh) * | 2017-12-12 | 2018-05-15 | Tsinghua University | Hierarchical sensor time-series data storage method and system |
CN108255429B (zh) * | 2018-01-10 | 2021-07-02 | Zhengzhou Yunhai Information Technology Co., Ltd. | Write operation control method, system, apparatus, and computer-readable storage medium |
US20200051147A1 (en) * | 2018-08-10 | 2020-02-13 | Digital River, Inc. | Deriving and Presenting Real Time Marketable Content by Efficiently Deciphering Complex Data of Large Dynamic E-Commerce Catalogs |
US10796276B1 (en) * | 2019-04-11 | 2020-10-06 | Caastle, Inc. | Systems and methods for electronic platform for transactions of wearable items |
WO2021046750A1 (zh) * | 2019-09-11 | 2021-03-18 | Huawei Technologies Co., Ltd. | Data redistribution method, apparatus, and system |
CN111125101B (zh) * | 2019-12-16 | 2023-10-13 | Hangzhou Tuya Information Technology Co., Ltd. | Data center table structure consistency monitoring method and system |
US11803568B1 (en) * | 2020-03-25 | 2023-10-31 | Amazon Technologies, Inc. | Replicating changes from a database to a destination and modifying replication capacity |
US11947838B2 (en) * | 2020-11-30 | 2024-04-02 | International Business Machines Corporation | Utilizing statuses to preserve a state of data during procedures such as testing without causing functional interruptions |
CN117895546A (zh) * | 2024-03-15 | 2024-04-16 | State Grid Shandong Electric Power Co. Dongying Power Supply Company | New-energy integrated station configuration method based on the electrification retrofit of agricultural machinery |
Family Cites Families (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5119465A (en) * | 1989-06-19 | 1992-06-02 | Digital Equipment Corporation | System for selectively converting plurality of source data structures through corresponding source intermediate structures, and target intermediate structures into selected target structure |
JP3911810B2 (ja) | 1998-01-07 | 2007-05-09 | Fuji Xerox Co., Ltd. | Information distribution system and portable storage medium |
JP3756708B2 (ja) * | 1999-09-30 | 2006-03-15 | Toshiba Corporation | Information processing terminal device and file management method thereof |
US6658540B1 (en) * | 2000-03-31 | 2003-12-02 | Hewlett-Packard Development Company, L.P. | Method for transaction command ordering in a remote data replication system |
JP4528039B2 (ja) | 2004-06-29 | 2010-08-18 | Tokyo Institute of Technology | Autonomous storage device, autonomous storage system, network load distribution program, and network load distribution method |
US7936691B2 (en) * | 2005-02-28 | 2011-05-03 | Network Equipment Technologies, Inc. | Replication of static and dynamic databases in network devices |
JP4615337B2 (ja) * | 2005-03-16 | 2011-01-19 | Hitachi, Ltd. | Storage system |
US7689602B1 (en) * | 2005-07-20 | 2010-03-30 | Bakbone Software, Inc. | Method of creating hierarchical indices for a distributed object system |
JP2007128335A (ja) * | 2005-11-04 | 2007-05-24 | Nec Corp | Replication arbitration apparatus, method, and program |
US7617253B2 (en) * | 2005-12-19 | 2009-11-10 | Commvault Systems, Inc. | Destination systems and methods for performing data replication |
US7651593B2 (en) * | 2005-12-19 | 2010-01-26 | Commvault Systems, Inc. | Systems and methods for performing data replication |
US7716180B2 (en) | 2005-12-29 | 2010-05-11 | Amazon Technologies, Inc. | Distributed storage system with web services client interface |
US7783956B2 (en) * | 2006-07-12 | 2010-08-24 | Cronera Systems Incorporated | Data recorder |
US8433730B2 (en) * | 2006-10-31 | 2013-04-30 | Ariba, Inc. | Dynamic data access and storage |
US7925749B1 (en) * | 2007-04-24 | 2011-04-12 | Netapp, Inc. | System and method for transparent data replication over migrating virtual servers |
US8006111B1 (en) * | 2007-09-21 | 2011-08-23 | Emc Corporation | Intelligent file system based power management for shared storage that migrates groups of files based on inactivity threshold |
JP5199003B2 (ja) * | 2008-09-25 | 2013-05-15 | Hitachi, Ltd. | Management apparatus and computer system |
WO2011121869A1 (ja) * | 2010-03-29 | 2011-10-06 | NEC Corporation | Data access location selection system, method, and program |
US9342574B2 (en) * | 2011-03-08 | 2016-05-17 | Nec Corporation | Distributed storage system and distributed storage method |
US8626799B2 (en) * | 2011-10-03 | 2014-01-07 | International Business Machines Corporation | Mapping data structures |
US9069835B2 (en) * | 2012-05-21 | 2015-06-30 | Google Inc. | Organizing data in a distributed storage system |
US9635109B2 (en) * | 2014-01-02 | 2017-04-25 | International Business Machines Corporation | Enhancing reliability of a storage system by strategic replica placement and migration |
US10061628B2 (en) * | 2014-03-13 | 2018-08-28 | Open Text Sa Ulc | System and method for data access and replication in a distributed environment utilizing data derived from data access within the distributed environment |
JP6361199B2 (ja) * | 2014-03-20 | 2018-07-25 | NEC Corporation | Information storage system |
US9569108B2 (en) * | 2014-05-06 | 2017-02-14 | International Business Machines Corporation | Dataset replica migration |
-
2012
- 2012-07-31 WO PCT/JP2012/069499 patent/WO2013018808A1/ja active Application Filing
- 2012-07-31 JP JP2013526936A patent/JP6044539B2/ja active Active
- 2012-07-31 US US14/236,666 patent/US9609060B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002197781A (ja) * | 2000-12-22 | 2002-07-12 | Ricoh Co Ltd | Information recording/reproducing apparatus and information recording/reproducing method |
JP2007317017A (ja) * | 2006-05-26 | 2007-12-06 | Nec Corp | Storage system, data protection method, and program |
JP2010128752A (ja) * | 2008-11-27 | 2010-06-10 | Internatl Business Mach Corp <Ibm> | Database system, server, update method, and program |
WO2010101189A1 (ja) * | 2009-03-06 | 2010-09-10 | NEC Corporation | Information processing system and method |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015156111A (ja) * | 2014-02-20 | 2015-08-27 | NEC Corporation | Placement destination determination device, placement destination determination method, and placement destination determination program |
JP2015184685A (ja) * | 2014-03-20 | 2015-10-22 | NEC Corporation | Information storage system |
US10095737B2 (en) | 2014-03-20 | 2018-10-09 | Nec Corporation | Information storage system |
WO2016194159A1 (ja) * | 2015-06-03 | 2016-12-08 | Hitachi, Ltd. | Computer, database management method, and database management system |
CN114489464A (zh) * | 2020-10-27 | 2022-05-13 | Beijing Kingsoft Cloud Network Technology Co., Ltd. | Data writing method, apparatus, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
JPWO2013018808A1 (ja) | 2015-03-05 |
JP6044539B2 (ja) | 2016-12-14 |
US20140173035A1 (en) | 2014-06-19 |
US9609060B2 (en) | 2017-03-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6044539B2 (ja) | Distributed storage system and method | |
JP5765416B2 (ja) | Distributed storage system and method | |
US11816063B2 (en) | Automatic archiving of data store log data | |
US8271455B2 (en) | Storing replication requests for objects in a distributed storage system | |
US8918392B1 (en) | Data storage mapping and management | |
CN102265277B (zh) | Method and apparatus for operating a data storage system | |
US8832234B1 (en) | Distributed data storage controller | |
CN104965850B (zh) | Database high-availability implementation method based on open-source technology | |
JP5387757B2 (ja) | Parallel data processing system, parallel data processing method, and program | |
KR20180055952A (ko) | Data replication technique in a database management system | |
JP5548829B2 (ja) | Computer system, data management method, and data management program | |
JP5686034B2 (ja) | Cluster system, synchronization control method, server apparatus, and synchronization control program | |
CN103294167B (zh) | Low-energy-consumption cluster storage replication apparatus and method based on data behavior | |
WO2013166520A1 (en) | Repository redundancy implementation of a system which incrementally updates clients with events that occurred via cloud-enabled platform | |
US11544232B2 (en) | Efficient transaction log and database processing | |
US9984139B1 (en) | Publish session framework for datastore operation records | |
CN113010496A (zh) | Data migration method, apparatus, device, and storage medium | |
JP2003296171A (ja) | Electronic form management method and program | |
WO2013172405A1 (ja) | Storage system and data access method | |
CN108776690B (zh) | Method for an HDFS distributed and centralized hybrid data storage system based on hierarchical governance | |
CN106873902B (zh) | File storage system, data scheduling method, and data node | |
JP2006323663A (ja) | Information processing system, replication method, difference information holding apparatus, and program | |
US11210212B2 (en) | Conflict resolution and garbage collection in distributed databases | |
JP2012008934A (ja) | Distributed file system and redundancy method in distributed file system | |
WO2012046585A1 (ja) | Distributed storage system, control method thereof, and program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12819936 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2013526936 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14236666 Country of ref document: US |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 12819936 Country of ref document: EP Kind code of ref document: A1 |