CN105045917A - Example-based distributed data recovery method and device

Info

Publication number: CN105045917A
Application number: CN201510515919.9A
Authority: CN
Inventors: 赖春波; 薛英飞; 王仆; 赵博
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2015-08-20
Filing date: 2015-08-20
Publication date: 2015-11-11
Anticipated expiration: 2035-08-20
Also published as: US10783163B2; WO2017028394A1; US20180150536A1; CN105045917B

Abstract

The invention discloses an example-based (instance-based) distributed data recovery method and device. A specific implementation of the method comprises the following steps: detecting a down non-master node; distributing the plurality of secondary storage units that belong to the down node to at least one online node; performing hash classification on the instances recorded in the log and assigning them to a plurality of threads; and recovering the data of a plurality of primary storage units in parallel inside the online node. The disclosed method and device allow the data of a down node in a distributed database to be recovered in parallel inside each recovering node.

Description

Instance-based (example-based) distributed data recovery method and device

Technical field

The present invention relates to the field of databases, and in particular to an instance-based distributed data recovery method and device.

Background

With the development of the Internet, distributed databases are used ever more widely, and the requirements on their reliability keep rising. To reduce service interruption time, the method used to recover the data of a database cluster node after it goes down is critical. The distributed data recovery methods currently used in the industry distribute the data of the down node to multiple online nodes for recovery, and inside each node either use a single thread or achieve multithreaded recovery only after reprocessing the log records, for example by sorting them. Recovering data with these methods has clear drawbacks: the down node's data is recovered inefficiently, and node utilization is low.

Summary of the invention

Embodiments of the present invention provide an instance-based distributed data recovery method that, when a node of a distributed database system goes down, can recover data in parallel, improving data recovery efficiency and node utilization and thereby the availability of the database system.

One aspect of the application provides an instance-based distributed data recovery method, comprising:

detecting a down non-master node; distributing the multiple secondary storage units belonging to the down node to at least one online node; performing hash classification on the instances recorded in the log and assigning them to multiple threads; and recovering the data of multiple primary storage units in parallel inside the online node.

In an exemplary embodiment of the first aspect of the application, a tertiary storage unit holds the index of the secondary storage units, each of the multiple secondary storage units stores the index of multiple primary storage units, each of the multiple primary storage units stores one instance, and the data stored in the primary storage units is ordered by instance. Non-master nodes and the master node together form the nodes of the cluster: each non-master node manages the primary storage units indexed by secondary storage units, and the master node manages the tertiary storage unit and the secondary storage units.

In addition, during data recovery, hash classification maps the log records of the same instance to the same thread, so that the log is assigned to multiple threads according to instance. The at least one online node recovers data by logically replaying the contents of the log within its own process. After the at least one online node completes data recovery, the management node of the secondary storage units is changed to the online node that performed the recovery.

A second aspect of the application provides a device comprising a master node device and a non-master node device; the master node device is used to manage the master node, and the non-master node device is used to manage the non-master nodes.

In an exemplary embodiment of the second aspect of the application, the master node device comprises a detection module for detecting a down non-master node, and a distribution module for distributing the multiple secondary storage units corresponding to the down node to at least one online node.

In addition, the non-master node device comprises: a receiving module for receiving the information, dispatched to the non-master node, about the multiple secondary storage units corresponding to the down node; a scanning module for scanning the log of the down node; and a processing module for performing hash classification so that the log records of the same instance are mapped to the same one of multiple threads.

The distribution module is further used to change the management node of the secondary storage units to the online node that performed the recovery, after the at least one online node completes data recovery. The receiving module is further used to receive the network address and port name of the down node.

The beneficial effect of the application is that, after a node goes down, hash classification of the instances recorded in the log assigns them to multiple threads, so that an online node recovers data in parallel inside the node. This improves data recovery efficiency and node utilization.

Brief description of the drawings

Fig. 1 is an overall architecture diagram of a distributed data system provided by an embodiment of the invention;

Fig. 2 is a block diagram of the instance-based data storage structure provided by an embodiment of the invention;

Fig. 3 is a flowchart of data recovery in the instance-based distributed data recovery method provided by an embodiment of the invention;

Fig. 4 is a schematic diagram of the hash classification process during data recovery in the instance-based distributed data recovery method provided by an embodiment of the invention;

Fig. 5 is a block diagram of a master node device provided by an embodiment of the invention; and

Fig. 6 is a block diagram of a non-master node device provided by an embodiment of the invention.

Detailed description

The invention provides an instance-based distributed data recovery method. Preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and are not intended to limit it. Where no conflict arises, the embodiments of the application and the features of the embodiments may be combined with one another.

Fig. 1 is an overall architecture diagram of a distributed data system provided by an embodiment of the invention. It should be understood that embodiments of the invention are not limited to the architecture shown in Fig. 1.

In this embodiment, the database cluster has two kinds of nodes: a master node 100 and non-master nodes 102. A cluster is usually configured with one master node 100. In another embodiment, multiple standby master nodes may also be configured, but only one master node is in the working state at a time. As Fig. 1 shows, the cluster also includes multiple non-master nodes 102. While the database system is working normally, all of the nodes run online and are called online nodes, indicated by numeral 104 in Fig. 1. A node of a distributed database may go down at any time; in that case the non-master nodes 102 are divided into N1 (1≤N1<N2) down nodes 106 and N2 (N1<N2<N, where N denotes the total number of nodes) online nodes 104. The master node 100 and the non-master nodes 102 jointly manage the data in the database. In the distributed database, data exists as files in a distributed file system 108, and the file system 108 resides persistently in storage. Nodes can read and write the data in the file system 108. A log 110 in the file system 108 records every change a node makes to the data (including insertions, deletions, and so on); the distributed file system is therefore shared by the nodes.

Fig. 2 is a block diagram of the instance-based data storage structure provided by an embodiment of the invention. In this example, the technique is built on storage by instance. Specifically, an instance may be a storage object such as a machine name (for example a server name) or a program name. The database has a three-level storage structure, and instances are stored in primary storage units (such as SSTABLE) 202. The other two levels are secondary storage units (such as LeafTablet) 204 and tertiary storage units (such as RootTablet) 206.

Optionally, the database storage structure comprises multiple primary storage units 202. A primary storage unit 202 may be the smallest storage unit in the database, and the data in each primary storage unit 202 is ordered by primary key. The instance name is part of the primary key, so the stored data is ordered by instance. In addition, each primary storage unit 202 stores the data of only one instance, and the sequence number of each primary storage unit 202 is unique. The database storage structure may also comprise multiple secondary storage units 204; a secondary storage unit 204 may be the smallest unit of metadata storage for the cluster master node 100. Each secondary storage unit 204 holds an index, ordered by primary key, of primary storage units 202. The database storage structure may further comprise one or more tertiary storage units 206; a tertiary storage unit 206 indexes the secondary storage units 204 and holds an index, ordered by primary key, that points to the secondary storage units 204.

Further, in the cluster provided by embodiments of the invention, the non-master nodes 102 manage the primary storage units 202: each non-master node 102 manages the one or more primary storage units 202 indexed by a secondary storage unit 204. A secondary storage unit 204 cannot be managed across multiple nodes; that is, the primary storage units 202 indexed by one secondary storage unit 204 can be managed by only one non-master node 102. Specifically, as shown in Fig. 2, primary storage units 208 and 210 cannot be managed by two different non-master nodes. In addition, a primary storage unit cannot be indexed by two different secondary storage units at the same time. Specifically, as shown in Fig. 2, primary storage unit 212 cannot be indexed by both secondary storage units 214 and 216. The master node 100 manages the secondary storage units 204 and the tertiary storage unit 206.
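The three-level structure and its indexing constraint can be illustrated with a minimal Python sketch. The class names, field names, and the helper shared_primary_units are hypothetical and chosen only for illustration; the description does not prescribe any data layout beyond the relationships stated above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PrimaryUnit:
    """Primary storage unit (e.g. an SSTABLE): holds the data of one instance."""
    seq_no: int      # unique sequence number of the unit
    instance: str    # the single instance whose data is stored here


@dataclass
class SecondaryUnit:
    """Secondary storage unit (e.g. a LeafTablet): indexes primary units."""
    name: str
    manager: str     # the one non-master node managing the indexed primary units
    primary_index: List[PrimaryUnit] = field(default_factory=list)


@dataclass
class TertiaryUnit:
    """Tertiary storage unit (e.g. a RootTablet): indexes secondary units."""
    secondary_index: List[SecondaryUnit] = field(default_factory=list)


def shared_primary_units(root: TertiaryUnit) -> Set[int]:
    """Return sequence numbers of primary units indexed by more than one
    secondary unit, which the structure described above does not allow."""
    seen: Set[int] = set()
    shared: Set[int] = set()
    for secondary in root.secondary_index:
        for primary in secondary.primary_index:
            if primary.seq_no in seen:
                shared.add(primary.seq_no)
            else:
                seen.add(primary.seq_no)
    return shared
```

Because each secondary unit in this sketch carries a single manager field, the rule that one secondary storage unit is managed by only one non-master node holds by construction; only the rule against shared indexing needs an explicit check.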

Fig. 3 is a flowchart of data recovery in the instance-based distributed data recovery method provided by an embodiment of the invention. In an embodiment provided by the invention, a node in the cluster goes down at some moment. According to the method provided by the invention, data recovery may comprise the following steps.

In step 302, the master node 100 detects the down node.

According to an embodiment provided by the invention, namely the exemplary database storage structure shown in Fig. 2, the master node 100 manages the secondary storage units 204 that index the multiple primary storage units 202 managed by the down node 106. In this embodiment, the master node 100 distributes the secondary storage units corresponding to the multiple primary storage units managed by the down node 106 to online nodes 104, as in step 304.

As described above, one non-master node 102 may correspond to multiple secondary storage units 204. In one embodiment, to ensure data recovery efficiency, when performing step 304 the master node 100 distributes the multiple secondary storage units 204 corresponding to the down node 106 evenly among multiple online nodes 104. In another embodiment, the log of each node may be stored in a directory named with the node's network address and port name; when the secondary storage units 204 are distributed to an online node 104, the online node is also told the network address and port of the down node 106 to be recovered, so that the online node 104 can find the log region of the down node 106 in the log.
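One possible way to realize the even distribution of step 304 is a round-robin assignment, sketched below in Python. Round-robin is an assumption made for illustration, since the description only requires that the distribution be even. The function names assign_units and log_directory, and the "address_port" directory naming, are likewise hypothetical.

```python
from typing import Dict, List


def assign_units(units: List[str], online_nodes: List[str]) -> Dict[str, List[str]]:
    """Spread the down node's secondary storage units over the online nodes
    in round-robin order, so that node loads differ by at most one unit."""
    if not online_nodes:
        raise ValueError("at least one online node is required")
    plan: Dict[str, List[str]] = {node: [] for node in online_nodes}
    for i, unit in enumerate(units):
        plan[online_nodes[i % len(online_nodes)]].append(unit)
    return plan


def log_directory(log_root: str, address: str, port: str) -> str:
    """Build the log directory of a node from its network address and port.
    The 'address_port' naming is an assumed convention, not taken from the text."""
    return f"{log_root}/{address}_{port}"
```

With five units and two online nodes, for example, this sketch gives the first node three units and the second node two. The master node would send each online node its list together with the address and port of the down node, from which the online node derives the log location.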

In step 306, hash classification is performed on the log, and the classified log records are assigned to multiple threads.

In step 308, after hash classification and thread assignment are complete, the online node 104 recovers data in parallel using multiple threads internally.

Further, in some embodiments, the multiple online nodes 104 each use the assigned threads to logically replay, inside the node, the operations of the down node 106 according to the contents stored in the log.
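Steps 306 and 308 can be sketched in Python as follows, under several assumptions that the description does not fix: a log record is modelled as a tuple (instance, key, value), a value of None stands for a deletion, each recovery thread replays into its own private dictionary, and instances are mapped to threads by summing character codes modulo the thread count. All names are hypothetical.

```python
import queue
import threading
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[str, str, Optional[int]]  # (instance, key, value); None = delete


def thread_id(instance: str, num_threads: int) -> int:
    """Assumed instance-to-thread mapping: character-code sum modulo thread count."""
    return (sum(ord(ch) for ch in instance) & 0xFFFFFFFF) % num_threads


def replay_in_parallel(records: Iterable[Record], num_threads: int) -> Dict[Tuple[str, str], int]:
    """Replay log records in log order per instance, using one queue per thread."""
    queues = [queue.Queue() for _ in range(num_threads)]
    states = [dict() for _ in range(num_threads)]  # private store per thread

    def worker(q: queue.Queue, state: dict) -> None:
        while True:
            item = q.get()
            if item is None:          # sentinel: no more records for this thread
                break
            instance, key, value = item
            if value is None:
                state.pop((instance, key), None)   # replay a deletion
            else:
                state[(instance, key)] = value     # replay an insertion or update

    workers = [threading.Thread(target=worker, args=(q, s))
               for q, s in zip(queues, states)]
    for w in workers:
        w.start()
    for record in records:
        # All records of one instance go to the same queue, preserving their order.
        queues[thread_id(record[0], num_threads)].put(record)
    for q in queues:
        q.put(None)
    for w in workers:
        w.join()

    recovered: Dict[Tuple[str, str], int] = {}
    for state in states:
        recovered.update(state)       # keys are disjoint across threads
    return recovered
```

Since every record of an instance lands in the same queue, each thread sees that instance's operations in log order and the threads need no locking among themselves, which is the property the hash classification is meant to provide.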

In one embodiment, after the online nodes 104 complete data recovery, the master node reassigns, in the tertiary storage unit 206, the mapping of the secondary storage units formerly corresponding to the down node 106, so that these secondary storage units correspond to the online nodes that recovered them, as in step 310.
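Step 310 amounts to an update of the unit-to-node mapping. A minimal Python sketch follows, assuming the mapping is held as a dictionary from secondary storage unit name to managing node; the function name reassign_managers and this representation are hypothetical.

```python
from typing import Dict


def reassign_managers(manager_of: Dict[str, str],
                      down_node: str,
                      recovered_by: Dict[str, str]) -> Dict[str, str]:
    """Return a new mapping in which every secondary storage unit formerly
    managed by the down node is managed by the online node that recovered it."""
    updated = dict(manager_of)        # leave the caller's mapping untouched
    for unit, node in recovered_by.items():
        if updated.get(unit) == down_node:
            updated[unit] = node
    return updated
```

Units that were never managed by the down node keep their existing manager, even if they appear in recovered_by by mistake.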

Fig. 4 is a schematic diagram of the hash classification process during data recovery in the instance-based distributed data recovery method provided by an embodiment of the invention. In the embodiment provided by the invention, during data recovery the online node 104 finds the storage location 402 of the down node's log from the network address and port of the down node 106, and then scans the log file record by record. Because every log record contains information about its secondary storage unit 204, the online node can identify, while scanning, the log records that it needs to recover itself; each time it finds a matching log record, it performs hash classification on that record.
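The scan can be sketched in Python as a filter over the log, assuming each record exposes the name of its secondary storage unit. The LogRecord fields and the function name records_for_node are hypothetical; the description states only that every record carries secondary storage unit information.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class LogRecord(NamedTuple):
    secondary_unit: str   # secondary storage unit the record belongs to
    instance: str         # instance name, later used for hash classification
    operation: str        # the logged change, e.g. an insertion or deletion


def records_for_node(log: Iterable[LogRecord],
                     assigned_units: Set[str]) -> Iterator[LogRecord]:
    """Scan the down node's log record by record and yield, in log order,
    only the records whose secondary storage unit was assigned to this node."""
    for record in log:
        if record.secondary_unit in assigned_units:
            yield record
```

Each yielded record would then be passed to the hash classification step, while records that belong to units assigned to other online nodes are skipped.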

Specifically, in one embodiment, the hash classification method of stage 404 may proceed as follows. The instance name recorded in the log record is converted according to the storage format of the instance. In this embodiment an instance may be a machine name, a program name, and so on, and is therefore equivalent to a character string. The characters of the string are converted to ASCII codes. The converted ASCII codes are then summed, and the sum is taken as a 32-bit integer. This number is then taken modulo the number of recovery threads, which yields the ID of the thread that recovers this instance. Because the instance name is unique, it always maps to the same single thread ID; that is, after this conversion each instance corresponds to exactly one thread. The log of the down node 106 can therefore be mapped, by instance, onto multiple parallel data recovery threads.
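The hash classification of stage 404 can be written as a short Python sketch. The function name recovery_thread_id is hypothetical, and the 32-bit truncation is modelled here as an unsigned mask, which is one reading of "taken as a 32-bit integer".

```python
def recovery_thread_id(instance_name: str, num_threads: int) -> int:
    """Map an instance name to the ID of the thread that recovers it."""
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # Convert each character to its code (the ASCII code for ASCII names) and sum.
    total = sum(ord(ch) for ch in instance_name)
    # Keep the sum as a 32-bit unsigned integer (assumed interpretation).
    total &= 0xFFFFFFFF
    # Take the sum modulo the number of recovery threads.
    return total % num_threads


# "ab" -> 97 + 98 = 195, and 195 % 4 = 3
print(recovery_thread_id("ab", 4))  # prints 3
```

The mapping is deterministic, so every log record of a given instance reaches the same thread. Different instances may share a thread: names made of the same characters in a different order, such as "ab" and "ba", always produce the same sum. This affects only how evenly work is spread, not the correctness of the replay.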

A second aspect of the application provides a device for instance-based data recovery in a distributed database. The device comprises a master node device and a non-master node device.

Fig. 5 shows a block diagram of a master node device provided by an embodiment of the invention. Optionally, the master node device 500 comprises a detection module 502 and a distribution module 504. In one embodiment, the detection module 502 is used to detect a down non-master node 102. The distribution module 504 is used to distribute the multiple secondary storage units 204 corresponding to the down node 106 to at least one online node 104. In another embodiment, the distribution module 504 may also be used to change the management node of the secondary storage units 204 to the online node 104 that performed the recovery, after the online node 104 completes data recovery.

Fig. 6 shows a block diagram of a non-master node device provided by an embodiment of the invention. Optionally, the non-master node device 600 comprises a receiving module 602, a scanning module 604 and a processing module 606.

In one embodiment, the receiving module 602 is used to receive the information, dispatched to the non-master node, about the multiple secondary storage units 204 corresponding to the down node 106. The scanning module 604 is used to scan the log of the down node. The processing module 606 is used to perform hash classification so that the records of the log 110 belonging to the same instance are mapped to the same one of multiple threads. In another embodiment, the receiving module 602 is also used to receive the network address and port name of the down node 106, so that the online node 104 performing the recovery can use them to find the region of the file system 108 where the log 110 of the down node 106 is located.

Those skilled in the art will understand that all or part of the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the method embodiments above. The description of the invention is intended to teach those skilled in the art the best mode of carrying out the invention and does not thereby limit the scope of rights of the invention; equivalent variations made according to the claims of the invention therefore still fall within the scope covered by the invention.

Claims

1. An instance-based distributed data recovery method, comprising:

detecting a down non-master node;

distributing multiple secondary storage units corresponding to the down non-master node to at least one online node;

performing hash classification on the instances recorded in a log and assigning them to multiple threads inside the online node; and

recovering the data of multiple primary storage units in parallel in the multiple threads.

2. The method according to claim 1, wherein the index of the secondary storage units is stored in a tertiary storage unit, each of the multiple secondary storage units stores the index of the multiple primary storage units, each of the multiple primary storage units stores one instance, and the data stored in the multiple primary storage units is ordered by instance.

3. method according to claim 1 and 2, wherein, host node and non-master form the node in cluster jointly, the described one-level storage unit of each management described secondary storage index in described non-master, described host node manages described tertiary storage unit and described secondary storage.

4. The method according to claim 1, wherein the multiple secondary storage units corresponding to the down node are distributed evenly to the at least one online node.

5. The method according to claim 1, wherein hash classification is used to map the log records of the same instance to the same one of the multiple threads, so that the log is assigned to the multiple threads according to instance.

6. The method according to claim 5, wherein the hash classification step comprises: converting the instance name recorded in a log record by converting each character of the character string to its ASCII code and summing the codes, and taking the sum as a 32-bit integer; and taking this number modulo the number of recovery threads to obtain the ID of the thread that recovers the instance.

7. The method according to claim 1, wherein the at least one online node recovers data by logically replaying the contents of the log within its own process.

8. The method according to claim 1, further comprising:

after the at least one online node completes data recovery, changing the management node of the secondary storage units to the online node that performed the recovery.

9. A distributed data recovery device for the method according to claim 1, comprising:

a master node device for managing secondary storage units and a tertiary storage unit; and

a non-master node device for managing primary storage units.

10. The device according to claim 9, wherein the master node device comprises:

a detection module for detecting a down non-master node; and

a distribution module for distributing the multiple secondary storage units corresponding to the down node to at least one online node.

11. The device according to claim 9, wherein the non-master node device comprises:

a receiving module for receiving the information, dispatched to the non-master node, about the multiple secondary storage units corresponding to the down node;

a scanning module for scanning the log of the down node; and

a processing module for performing hash classification so that the log records of the same instance are mapped to the same one of multiple threads.

12. The instance-based distributed data recovery device according to claim 10, wherein the distribution module is further used to change the management node of the secondary storage units to the online node that performed the recovery, after the at least one online node completes data recovery.

13. The device according to claim 11, wherein the receiving module is further used to receive the network address and port name of the down node.