CN103034739A

CN103034739A - Distributed storage system and updating and querying method thereof

Info

Publication number: CN103034739A
Application number: CN2012105941055A
Authority: CN
Inventors: 任景彪; 孟祥斌; 施宁; 崔维力; 武新; 赵伟
Original assignee: TIANJIN NANDA GENERAL DATA TECHNOLOGY Co Ltd
Current assignee: TIANJIN NANDA GENERAL DATA TECHNOLOGY Co Ltd
Priority date: 2012-12-29
Filing date: 2012-12-29
Publication date: 2013-04-10

Abstract

The invention provides a distributed storage system comprising a safe group (SafeGroup) consisting of at least one node, where each node stores data and can also store copies of the data of other nodes in the same safe group. The invention also provides an updating method and a querying method for the distributed storage system. The system offers high availability, high query parallelism, and the capacity to manage and scale large data volumes, and it can effectively reduce management and maintenance costs.

Description

A distributed storage system and updating and querying methods thereof

Technical field

The invention belongs to the field of data storage, and in particular relates to a distributed storage system with high availability.

Background technology

In the face of explosive growth in data volume, current database technology still has many technical problems that urgently need solving. For a database, recording correct results is only the most basic requirement; beyond this, a database must also improve processing speed, data availability, data security, and the scalability of the data set. Big data places higher demands on all four of these aspects.

According to physical laws, increasing redundancy is the only way to improve data availability. At present this is mainly done through hardware-level redundancy, communication-link-level redundancy, software redundancy, and data redundancy. To achieve high availability, data redundancy is the approach usually used.

Traditional data redundancy schemes include the following, each with its own shortcomings:

1. Data replication. All current data replication technologies (synchronous or asynchronous), for example disk mirroring (EMC TimeFinder series), database file replication (such as DoubleTake, Veritas and Legato) and the dbbackup utilities shipped by database vendors, can only produce passively replicated data sets. Typically, implementing replication consumes 5% (asynchronous) to 30% (synchronous) of the primary server's processing power. Passively updated data is generally used only for disaster recovery. Passively updated data sets also have two fatal problems. First, once a failure of the primary processor corrupts the data, the passively updated data set is corrupted as well. Second, this method tends to lengthen the time the system spends in the dangerous single-node state, and it lowers system utilization.

2. Dual-machine hot standby with whole-machine backup. In this mode only one server is running at any time; although the system's response speed is not reduced, system utilization drops by 50%.

3. Asynchronous active replication of data sets. This technique first hands transactions to the primary server to complete, and then passes those transactions serially to the backup server, which performs the same operations to guarantee data consistency. The data set produced this way lags behind the primary data set, so it is suitable only for disaster recovery, data mining, report statistics, and limited online applications. All commercial databases support asynchronous active replication. The difficulty lies in managing the replication queue, which exists to mask the speed difference between the primary server and the backup server. The primary server can exploit all available software and hardware parallelism to process as many concurrent transactions as possible, while the backup server can only replicate serially, so the replication queue may often overflow under heavy transaction load. Because there is no way to control the rate of transaction requests, under heavy transaction load the replication queue can only be rebuilt periodically.

4. Synchronous active replication of data sets. This technique requires all concurrent transactions to complete simultaneously on all database servers. A direct benefit is that there is no queue to manage, and load balancing can also deliver higher performance and higher availability. There are two entirely different implementations: full serialization and dynamic serialization. Fully serialized transaction processing originates from the transaction engine of the primary database; RAC, UDB, MSCS (SQL Server 2005) and ASE implement it with full serialization combined with the two-phase commit protocol. The goal of this design is to obtain a data set that can be used for fast disaster recovery. Such a system has two key problems. First, two-phase commit is an "all or nothing" protocol. Close study of the protocol shows that, in order to obtain this backup data set, the availability of transaction processing is halved. Second, full serialization reintroduces the speed mismatch between the primary and backup database servers. Forced synchronization reduces the speed of the whole system to the fully serialized level.

Summary of the invention

The problem to be solved by the present invention is to provide a distributed storage system that is especially suitable for high-availability storage of large volumes of data.

To solve the above technical problem, the technical solution adopted by the present invention is a distributed storage system comprising a safe group (SafeGroup) composed of at least one node, the node being used to store data.

Further, described node can be preserved the copy of other node in the same secure group (SafeGroup).

Further, described copy is mirror back-up.

Further, all nodes are all preserved identical sheet data in the described same secure group (SafeGroup).

Further, select the sheet data of the burst of numbering equally by own node serial number be the master data of self node to described node.

The affairs numbering that system's overall situation separately further, is arranged on the data on described each node.
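The shard layout inside one safe group can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes nodes and shards are numbered from 0 and that the group's data is cut into as many shards as the group has nodes. The class and method names (`SafeGroupLayout`, `primary_shard`, `mirror_shards`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SafeGroupLayout:
    """Hypothetical model of one SafeGroup: n nodes holding n shards."""

    node_count: int

    def shards_on(self, node_id: int) -> list[int]:
        # Every node keeps every shard of the group (its own plus mirrors).
        self._check(node_id)
        return list(range(self.node_count))

    def primary_shard(self, node_id: int) -> int:
        # A node's primary data is the shard whose number equals its own.
        self._check(node_id)
        return node_id

    def mirror_shards(self, node_id: int) -> list[int]:
        # All other shards stored on the node are mirror copies.
        primary = self.primary_shard(node_id)
        return [s for s in self.shards_on(node_id) if s != primary]

    def _check(self, node_id: int) -> None:
        if not 0 <= node_id < self.node_count:
            raise ValueError(f"node {node_id} is not in this group")


layout = SafeGroupLayout(node_count=3)
```

With three nodes, node 1 treats shard 1 as its primary data and holds shards 0 and 2 as mirrors, so any single node can stand in for the others.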

According to another aspect of the present invention, a querying method for the distributed storage system is also provided, comprising:

A query request is sent to the system;

The initiating node generates a query plan according to the query request and the status of the available nodes in the system;

The initiating node sends the query plan to each available node in the system;

Each available node independently processes the primary data held on that node;

Each available node returns its result to the initiating node, which aggregates the results.
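The query steps above amount to a scatter-gather pattern, sketched below under simplifying assumptions: nodes are simulated in one process, each node's primary data is a small list, and a "query plan" is just a filter predicate. `PRIMARY_DATA`, `make_plan`, `run_on_node` and `query` are hypothetical names, and the thread pool only stands in for nodes working in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in: node id -> rows of that node's primary data.
PRIMARY_DATA = {
    0: [3, 8, 1],
    1: [7, 2],
    2: [5, 9, 4],
}


def make_plan(predicate, available_nodes):
    # The initiating node builds one plan from the request and the available nodes.
    return {"predicate": predicate, "targets": sorted(available_nodes)}


def run_on_node(node_id, plan):
    # Each node evaluates the plan against its own primary data only.
    return [row for row in PRIMARY_DATA[node_id] if plan["predicate"](row)]


def query(predicate, available_nodes):
    plan = make_plan(predicate, available_nodes)
    with ThreadPoolExecutor() as pool:
        # The same plan goes to every target node; the nodes work concurrently.
        partials = pool.map(lambda n: run_on_node(n, plan), plan["targets"])
        # The initiating node gathers and merges the partial results.
        return sorted(row for part in partials for row in part)


result = query(lambda row: row > 4, {0, 1, 2})
```

Because every node scans only its own primary data, no row is processed twice even though each node also holds mirrors of its peers' shards.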

According to another aspect of the present invention, an updating method for the distributed storage system is also provided, comprising:

A distributed lock is acquired for the system;

A system-wide global transaction number is requested, and the update request is then sent to the nodes holding the primary shard and all of its mirror shards;

Each node updates its own shards and stamps the newly added data with the global transaction number;

Finally, the distributed lock is released.
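The four update steps can be sketched in a single process as follows. This is an illustration under stated assumptions, not the patented implementation: a `threading.Lock` stands in for the distributed lock, a local counter stands in for the global transaction number service, and the `Cluster` class and its record format are hypothetical.

```python
import threading
from itertools import count


class Cluster:
    """Hypothetical single-process stand-in for the whole system."""

    def __init__(self, node_ids):
        self._lock = threading.Lock()   # stands in for the distributed lock
        self._txn_ids = count(1)        # source of global transaction numbers
        # Each node keeps a list of (txn_id, shard_id, row) records.
        self.nodes = {n: [] for n in node_ids}

    def update(self, shard_id, rows, replica_nodes):
        with self._lock:                    # step 1: acquire the lock
            txn_id = next(self._txn_ids)    # step 2: request a global transaction number
            for node_id in replica_nodes:   # step 2: send to primary and mirror holders
                # step 3: each node applies the rows, stamped with txn_id
                self.nodes[node_id].extend((txn_id, shard_id, r) for r in rows)
        # step 4: the lock is released on leaving the with-block
        return txn_id


cluster = Cluster([0, 1, 2])
t1 = cluster.update(0, ["a"], [0, 1, 2])
t2 = cluster.update(1, ["b", "c"], [0, 1, 2])
```

Stamping every record with the transaction number is what later lets a lagging node work out which updates it has missed.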

With the above technical solution, each node operates on its own data independently, so the processing power of every node can be fully used. The system therefore has high availability, high query parallelism, and the ability to manage and scale large data capacity. Because the granularity of data copies matches the natural granularity of the hardware distribution, management and maintenance costs can also be effectively reduced.

Description of drawings

Fig. 1 is a schematic diagram of an embodiment of the distributed storage system of the present invention.

Fig. 2 is a schematic diagram of the data stored on each node of a safe group (SafeGroup) in one embodiment of the present invention.

Embodiment

Fig. 1 is a schematic diagram of one embodiment of the present invention. As Fig. 1 shows, this embodiment comprises three safe groups, each containing three nodes. Each node has a certain transaction processing capability; the nodes in the system are connected to each other by a network or other means and can access one another. The safe groups (SafeGroup) are likewise connected to each other and can access one another.

Fig. 2 is a schematic diagram of the content stored on the three nodes of one safe group (SafeGroup) in the embodiment. As Fig. 2 shows, each node stores, in addition to its own data, mirror data of the other nodes in the group, and the stored content is identical across the nodes.

The metadata of this embodiment comprises: a list of all available nodes in the system, a mapping table between all nodes and the safe groups they belong to, and the current maximum global transaction number. Compared with Hadoop HDFS (a distributed file system), data copies in this embodiment use the node as the granularity, whereas data copies in HDFS use the block (typically 64 MB per data block) as the granularity. Because of this, HDFS metadata must track the correspondence between every data block, its copies, and the nodes that hold them, and when HDFS manages a very large amount of data its metadata also becomes very large. HDFS therefore pays a high cost for metadata management: when the system is expanded, shrunk, or has its data redistributed, a large amount of metadata must be migrated and modified. In this embodiment, because the data copy granularity is the same as the granularity of the system's physical deployment (node level), no correspondence between data and nodes needs to be stored, so the metadata is very simple. When the system is expanded, shrunk, or has its data redistributed, only the node list and the node-to-safe-group mapping table need to be modified, which greatly reduces the metadata maintenance cost.
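To make the size of this metadata concrete, here is a minimal sketch of the three items listed above. The `Metadata` class, its field names and the string identifiers are hypothetical; the point is only that expansion and contraction touch the node list and the mapping table and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical container for the three metadata items of the embodiment."""

    available_nodes: set = field(default_factory=set)
    node_to_group: dict = field(default_factory=dict)
    max_global_txn: int = 0

    def add_node(self, node_id, group_id):
        # Expansion touches only the node list and the node-to-group map.
        self.node_to_group[node_id] = group_id
        self.available_nodes.add(node_id)

    def remove_node(self, node_id):
        # Contraction is the mirror image; there are no per-block records to fix up.
        self.node_to_group.pop(node_id, None)
        self.available_nodes.discard(node_id)

    def next_txn(self):
        # Hand out the next global transaction number.
        self.max_global_txn += 1
        return self.max_global_txn

    def nodes_in_group(self, group_id):
        return sorted(n for n, g in self.node_to_group.items() if g == group_id)


meta = Metadata()
for node, group in [("n1", "g1"), ("n2", "g1"), ("n3", "g2")]:
    meta.add_node(node, group)
```

The structure grows with the number of nodes, not with the amount of data stored, which is the contrast with block-level bookkeeping drawn in the paragraph above.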

At the same time, because the metadata is lightweight, the metadata management strategy of this embodiment can be deployed either in-band or out-of-band; for known schemes (such as HDFS), the overly heavyweight metadata all but restricts deployment to an out-of-band metadata management scheme.

This embodiment implements the distributed lock and the synchronization of metadata through an open-source implementation of the Paxos protocol (TOTEM: a token-ring-based communication and distributed consensus protocol).

This embodiment can effectively reduce project management and maintenance costs. When building a distributed query plan, there is no need, as in Hadoop, to obtain data locations from a metadata server; the same query plan is simply sent to all nodes in the safe group, and each node processes its own part of the data, so the metadata server does not become a bottleneck.

When updating data, this embodiment first acquires the distributed lock and obtains a global transaction number, and then sends the update request to the nodes that hold the data shard to be updated and its mirror shards. The nodes execute the update in parallel and stamp the updated data with the global transaction number. If the update fails on a node, that node is set to the "unavailable" state. A node in the "unavailable" state synchronizes its data in the background, and when its data has caught up to the same state as the available nodes in its group, it returns to the available state. When every node involved in the current update has either succeeded or been set to "unavailable", the update is complete and the distributed lock is released.
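The failure handling described above can be sketched as a small state machine. Several things here are assumptions made for illustration: nodes are updated one after another rather than in parallel, each transaction writes a single record, unavailable nodes are skipped by later updates, and catch-up works by comparing transaction numbers against an available peer. `Node`, `update_group`, `resync` and the `fail_next` hook are hypothetical names.

```python
class Node:
    """Hypothetical node: a log of (txn_id, row) records plus a state flag."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.fail_next = False   # hook used only to simulate a failed update
        self.log = []

    def apply(self, txn_id, row):
        if self.fail_next:
            self.fail_next = False
            raise IOError(f"{self.name}: write failed")
        self.log.append((txn_id, row))


def update_group(nodes, txn_id, row):
    # Assumed to run while the caller holds the distributed lock (not modeled).
    for node in nodes:
        if not node.available:
            continue                 # assumption: unavailable nodes are skipped
        try:
            node.apply(txn_id, row)
        except IOError:
            node.available = False   # a failed update marks the node unavailable
    # The update is complete once every node has succeeded or been marked.
    return [n.name for n in nodes if n.available]


def resync(node, peer):
    # Background catch-up: copy records newer than the node's last transaction.
    last = node.log[-1][0] if node.log else 0
    node.log.extend(rec for rec in peer.log if rec[0] > last)
    if node.log == peer.log:
        node.available = True        # same state as its peer: available again


a, b, c = Node("a"), Node("b"), Node("c")
group = [a, b, c]
update_group(group, 1, "x")
b.fail_next = True
still_ok = update_group(group, 2, "y")   # b fails and is marked unavailable
update_group(group, 3, "z")              # b is skipped while unavailable
resync(b, a)                             # b catches up from an available peer
```

The global transaction number is what makes the catch-up cheap: the lagging node only needs the records stamped with a number higher than the last one it applied.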

When querying, this embodiment first obtains the set of all available nodes from the system metadata, and then sends the query request to those available nodes. For a node in the "unavailable" state, the query on its primary shard data is redirected to another available node in the same group.
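The redirection rule can be sketched as a function that decides which primary shards each available node should scan. It assumes, as in the summary, that a node's primary shard has the same number as the node; the round-robin choice of the substitute node is an assumption, since the text only requires that it be an available node of the same group. `assign_primary_shards` is a hypothetical name.

```python
def assign_primary_shards(group_nodes, available):
    """Map each serving node to the primary shards it should scan.

    group_nodes: node ids of one SafeGroup (shard i is the primary of node i).
    available:   set of node ids currently marked available.
    """
    serving = [n for n in group_nodes if n in available]
    if not serving:
        raise RuntimeError("no available node in this group")
    # Each available node scans its own primary shard.
    assignment = {n: [n] for n in serving}
    orphaned = [n for n in group_nodes if n not in available]
    for i, shard in enumerate(orphaned):
        # Every peer holds a mirror of the shard; round-robin is an assumed policy.
        assignment[serving[i % len(serving)]].append(shard)
    return assignment


normal = assign_primary_shards([0, 1, 2], {0, 1, 2})
degraded = assign_primary_shards([0, 1, 2], {0, 2})
```

In the degraded case node 0 scans shard 1 from its local mirror in addition to its own shard 0, so the query still covers all of the group's data.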

In this example, data shards are split horizontally in advance when stored: a horizontal split is made for every 2 GB of data produced. When expanding or shrinking the system, only these 2 GB shards need to be moved so that the final data distribution is approximately even; data can thus be redistributed while the metadata needs almost no modification. This example can serve the relational model (because horizontal splitting does not break the field constraints of the relational model), but it is not limited to the relational model. Any data model with similar requirements on data distribution is suitable.
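Redistribution by moving whole 2 GB shards can be sketched as below. The text does not say how shards are chosen or whether the unit of placement is a node or a safe group, so the greedy policy, the generic "unit" placement and the function name `rebalance` are all assumptions made for illustration.

```python
def rebalance(placement, units):
    """Move whole shards until shard counts differ by at most one.

    placement: dict unit -> list of shard ids (each shard is one 2 GB slice);
               updated in place.
    units:     the target set of storage units after expansion or contraction.
    Returns the list of (shard, source, destination) moves.
    """
    if not units:
        raise ValueError("at least one target unit is required")
    layout = {u: list(placement.get(u, [])) for u in units}
    # Shards on units that are being removed must be moved off first.
    homeless = [(s, u) for u, shards in placement.items()
                if u not in layout for s in shards]
    moves = []
    for shard, src in homeless:
        dst = min(layout, key=lambda u: len(layout[u]))
        layout[dst].append(shard)
        moves.append((shard, src, dst))
    while True:
        big = max(layout, key=lambda u: len(layout[u]))
        small = min(layout, key=lambda u: len(layout[u]))
        if len(layout[big]) - len(layout[small]) <= 1:
            break
        shard = layout[big].pop()        # move one whole shard, never part of one
        layout[small].append(shard)
        moves.append((shard, big, small))
    placement.clear()
    placement.update(layout)
    return moves


grown = {"g1": [0, 1, 2, 3, 4, 5]}
grow_moves = rebalance(grown, ["g1", "g2", "g3"])

shrunk = {"a": [0, 1], "b": [2, 3], "c": [4, 5]}
shrink_moves = rebalance(shrunk, ["a", "b"])
```

Since the shards are pre-cut at fixed size, balancing reduces to counting shards per unit, and each move is a plain file copy with no change to the shard's contents.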

As can be seen from the above, the present invention brings significant improvements in the following four aspects:

1. Processing speed: increasing the number of nodes in the system increases the number of data copies; the more copies the system has in total, the smaller the data volume of a single copy and the faster the concurrent processing.

2. Data availability: during a data update, the updated data is available as soon as one node in the safe group succeeds. During a query, under normal conditions maximum performance is reached through concurrent computation on multiple nodes; when a node is unavailable, the computation on that node's primary shard can be redirected to other nodes.

3. Data security: the same as the data security of a single-machine system.

4. Scalability of the data set: when expanding, data is moved by directly copying data files, with no need to maintain large amounts of metadata, which greatly reduces the operating cost of system expansion.

Above one embodiment of the present of invention are had been described in detail, but described content only is preferred embodiment of the present invention, can not be considered to be used to limiting practical range of the present invention.All equalizations of doing according to the present patent application scope change and improve etc., all should still belong within the patent covering scope of the present invention.

Claims

1. A distributed storage system, characterized in that: the storage system comprises a safe group (SafeGroup) composed of at least one node, the node being used to store data.

2. The distributed storage system according to claim 1, characterized in that: the node can store copies of the other nodes in the same safe group (SafeGroup).

3. The distributed storage system according to claim 2, characterized in that: the copy is a mirror backup.

4. The distributed storage system according to claim 2, characterized in that: the data on the node is divided into a number of shards equal to the number of nodes in the same safe group (SafeGroup).

5. The distributed storage system according to claim 4, characterized in that: all nodes in the same safe group (SafeGroup) store identical shard data.

6. The distributed storage system according to claim 5, characterized in that: each node selects, according to its own node number, the shard data of the shard with the same number as the primary data of that node.

7. The distributed storage system according to claim 6, characterized in that: the data on each node carries its own system-wide global transaction number.

8. An updating method for the distributed storage system according to claim 7, comprising:

A distributed lock is acquired for the system;

Finally, the distributed lock is released.

9. A querying method for the distributed storage system according to claim 6, comprising:

A query request is sent to the system;

The query request is issued to a certain node in the system (the initiating node);

The initiating node generates a query plan according to the query request and the status of the available nodes in the system;

The initiating node sends the query plan to each available node in the system;

Each available node returns its result to the initiating node, which aggregates the results.