WO2017028394A1 - Instance-based distributed data recovery method and apparatus - Google Patents

Instance-based distributed data recovery method and apparatus

Info

Publication number
WO2017028394A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
primary
storage units
online
secondary storage
Prior art date
Application number
PCT/CN2015/095766
Other languages
English (en)
French (fr)
Inventor
赖春波
薛英飞
王仆
赵博
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to US15/533,955 (published as US10783163B2)
Publication of WO2017028394A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • The present invention relates to the field of databases, and in particular to an instance-based distributed data recovery method and apparatus.
  • The method used to recover data after a database cluster node goes down is therefore critical.
  • The distributed data recovery method currently used in the industry assigns the data of the down node to multiple online nodes for recovery, and inside each node either uses a single thread or achieves multi-threaded recovery only after heavy reprocessing such as sorting the log records.
  • Recovering data with these methods clearly suffers from low recovery efficiency and poor node utilization.
  • Embodiments of the invention provide an instance-based distributed data recovery method that can recover data in parallel when a distributed database system node goes down, improving recovery efficiency and node utilization and thereby the availability of the database system.
  • An aspect of the present application provides an instance-based distributed data recovery method, including:
  • detecting a non-primary node that has gone down; assigning the plurality of secondary storage units belonging to the down node to at least one online node; hash-categorizing the instances stored in the log and assigning them to multiple threads; and recovering the data of multiple primary storage units in parallel inside the online nodes.
  • In an exemplary implementation of the first aspect of the present application, the tertiary storage unit stores an index of the secondary storage units, each of the plurality of secondary storage units stores an index of a plurality of primary storage units, each of the plurality of primary storage units stores one instance, and the data stored in the plurality of primary storage units is ordered by instance.
  • The non-primary nodes and the primary node together constitute the nodes of the cluster; each non-primary node manages the primary storage units indexed by a secondary storage unit, and the primary node manages the tertiary storage unit and the secondary storage units.
  • During data recovery, hash categorization is used so that the logs of the same instance map to the same thread, distributing the logs across multiple threads according to their instances; the at least one online node recovers data by logically replaying the log contents within its own process. After the at least one online node completes data recovery, the managing node of the secondary storage units is changed to the online node that performed the recovery operation.
  • A second aspect of the present application provides an apparatus including a primary-node device for managing a primary node and a non-primary-node device for managing a non-primary node.
  • The primary-node device includes a detection module for detecting a non-primary node that has gone down, and an allocation module for assigning the plurality of secondary storage units corresponding to the down node to at least one online node.
  • The non-primary-node device includes: a receiving module configured to receive information, assigned to the non-primary node, about the plurality of secondary storage units corresponding to the down node; a scanning module configured to scan the down node's log; and a processing module configured to perform hash categorization so that the logs of the same instance map to the same one of the multiple threads.
  • The allocation module is further configured to change the managing node of the secondary storage units to the online node that performed the recovery operation after the at least one online node completes data recovery.
  • The receiving module is further configured to receive the network address and port naming of the down node.
  • The beneficial effects of the present application are: after a node goes down, the instances stored in the log are hash-categorized and assigned to multiple threads, so that the online nodes recover data in parallel within each node, improving data recovery efficiency and node utilization.
  • FIG. 1 is an overall framework diagram of a distributed data system according to an embodiment of the present invention;
  • FIG. 2 is a block diagram of an instance-based data storage structure provided by an embodiment of the present invention;
  • FIG. 3 is a data recovery flowchart of an instance-based distributed data recovery method according to an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of the hash categorization process during data recovery in an instance-based distributed data recovery method according to an embodiment of the present invention;
  • FIG. 5 is an exemplary block diagram of a primary-node device according to an embodiment of the present invention;
  • FIG. 6 is an exemplary block diagram of a non-primary-node device according to an embodiment of the present invention; and
  • FIG. 7 is a schematic structural diagram of a computer system according to an embodiment of the present invention.
  • The present invention provides an instance-based distributed data recovery method.
  • The preferred embodiments of the present invention are described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described herein are intended only to illustrate and explain the present invention, not to limit it. Moreover, where no conflict arises, the embodiments of the present application and the features within them may be combined with one another.
  • FIG. 1 is an overall framework diagram of a distributed data system according to an embodiment of the present invention, but it should be understood that embodiments of the present invention are not limited to the architecture shown in FIG. 1.
  • In this embodiment, there are two kinds of nodes in the database cluster: the primary node 100 and the non-primary nodes 102.
  • A cluster is usually configured with one primary node 100.
  • In another implementation, multiple standby primary nodes may also be configured, but only one primary node is active at a time.
  • As can be seen from FIG. 1, the cluster also includes a plurality of non-primary nodes 102.
  • When the database system is operating normally, multiple nodes run online; these are called online nodes, indicated by numeral 104 in FIG. 1.
  • In a distributed database, a node may go down at any time.
  • In that case, the non-primary nodes 102 are further divided into N1 (1 ≤ N1 < N2) down nodes 106 and N2 (N1 < N2 < N, where N is the total number of non-primary nodes) online nodes 104.
  • The primary node 100 and the non-primary nodes 102 work together to manage the data in the database.
  • Data exists in the distributed file system 108 in the form of files, and the file system 108 resides persistently in storage.
  • The nodes can read and write the data in the file system 108.
  • A log 110 in the file system 108 records all of the nodes' changes to the data (including insertions, deletions, and so on), and the distributed file system is accordingly shared among the nodes.
  • The technical implementation is based on the storage of instances.
  • Specifically, an instance may be a stored object, such as a machine name (e.g., a server name) or a program name.
  • The database has a three-level storage structure, and instances are stored in primary storage units (e.g., SSTABLEs) 202.
  • The other two levels are secondary storage units (e.g., Leaf Tablets) 204 and tertiary storage units (e.g., Root Tablets) 206.
  • The database storage structure includes a plurality of primary storage units 202.
  • A primary storage unit 202 may be the smallest storage unit in the database, and the data in each primary storage unit 202 is ordered by primary key. The instance name is included as part of the primary key, so the stored data is ordered by instance. In addition, each primary storage unit 202 stores the data of only one instance, and the serial number of each primary storage unit 202 is unique.
  • The database storage structure may also include a plurality of secondary storage units 204, and a secondary storage unit 204 may be the smallest unit of metadata storage for the cluster's primary node 100. Each secondary storage unit 204 stores an index, ordered by primary key, of primary storage units 202.
  • The database storage structure may further include one or more tertiary storage units 206 for indexing the secondary storage units 204; a tertiary storage unit stores an index, ordered by primary key, that points to the secondary storage units 204.
  • The non-primary nodes 102 manage the primary storage units 202, and each non-primary node 102 manages one or more primary storage units 202 indexed by a secondary storage unit 204.
  • A secondary storage unit 204 cannot be managed across multiple nodes; that is, the primary storage units 202 indexed by one secondary storage unit 204 can be managed by only one non-primary node 102.
  • Specifically, as shown in FIG. 2, the primary storage units 208 and 210 cannot be assigned to two different non-primary nodes.
  • Likewise, a primary storage unit cannot be indexed by two different secondary storage units at the same time: the primary storage unit 212 cannot be indexed by both secondary storage units 214 and 216.
  • The primary node 100 manages the secondary storage units 204 and the tertiary storage units 206.
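  • To make the three-level structure concrete, the following is a minimal Python sketch of the storage model described above; the type and field names (SSTable, LeafTablet, owner_node, and so on) are illustrative assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SSTable:                        # primary storage unit 202
    serial_no: int                    # unique serial number
    instance_name: str                # exactly one instance per unit
    rows: List[tuple] = field(default_factory=list)  # ordered by primary key

@dataclass
class LeafTablet:                     # secondary storage unit 204
    tablet_id: int
    sstable_index: List[int] = field(default_factory=list)  # SSTable serial numbers, primary-key ordered
    owner_node: Optional[str] = None  # managed by exactly one non-primary node

@dataclass
class RootTablet:                     # tertiary storage unit 206
    leaf_index: List[int] = field(default_factory=list)  # LeafTablet ids, primary-key ordered
```

  • Because the instance name is a prefix of the primary key, ordering rows by primary key also groups them by instance, which is what makes per-instance log routing possible later.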
  • FIG. 3 is a data recovery flowchart of an instance-based distributed data recovery method according to an embodiment of the present invention.
  • In one embodiment provided by the present invention, a node in the cluster goes down at some moment.
  • According to the method provided by the present invention, data recovery can include the following steps.
  • In step 302, the primary node 100 detects the node that has gone down.
  • According to one embodiment, i.e., the exemplary database storage structure shown in FIG. 2, the primary node 100 manages the secondary storage units 204 that index the plurality of primary storage units 202 managed by the down node 106. In this embodiment, the primary node 100 assigns the secondary storage units corresponding to the primary storage units managed by the down node 106 to online nodes 104, as in step 304.
  • To ensure recovery efficiency, when performing step 304 the primary node 100 evenly distributes the plurality of secondary storage units 204 corresponding to the down node 106 among multiple online nodes 104.
  • The log of each node is stored in a directory named with that node's network address and port.
  • In step 306, the logs are hash-categorized, and the categorized log records are distributed to multiple threads.
  • In step 308, after the hash categorization is complete and the threads have been allocated, the data is recovered in parallel by the multiple threads inside each online node 104.
  • The online nodes 104 logically replay the operations of the down node 106 within each node, across the allocated threads, according to the contents stored in the log.
  • After recovery, the primary node may reassign, in the tertiary storage unit 206, the mapping of the secondary storage units that belonged to the down node 106, mapping each of those secondary storage units to the online node that recovered it, as in step 310.
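  • As a concrete illustration of steps 302-310, the following is a hedged Python sketch of the primary node's coordination; the function names and the round-robin policy are illustrative assumptions (the patent requires only an even distribution).

```python
from itertools import cycle
from typing import Dict, List

def plan_recovery(down_leaf_ids: List[int],
                  online_nodes: List[str]) -> Dict[str, List[int]]:
    """Step 304: spread the down node's secondary storage units
    evenly over the online nodes (round-robin)."""
    plan: Dict[str, List[int]] = {n: [] for n in online_nodes}
    for leaf_id, node in zip(down_leaf_ids, cycle(online_nodes)):
        plan[node].append(leaf_id)
    return plan

def commit_ownership(root_index: Dict[int, str],
                     plan: Dict[str, List[int]]) -> None:
    """Step 310: rewrite the tertiary-level mapping so each recovered
    secondary storage unit is owned by the node that restored it."""
    for node, leaf_ids in plan.items():
        for leaf_id in leaf_ids:
            root_index[leaf_id] = node

# e.g., five leaf tablets of a down node spread over two survivors:
plan = plan_recovery([1, 2, 3, 4, 5], ["10.0.0.2:9000", "10.0.0.3:9000"])
root_index: Dict[int, str] = {}
commit_ownership(root_index, plan)
print(plan)        # {'10.0.0.2:9000': [1, 3, 5], '10.0.0.3:9000': [2, 4]}
print(root_index)  # leaf tablet id -> new owning node
```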
  • FIG. 4 is a schematic diagram of the hash categorization process during data recovery in an instance-based distributed data recovery method according to an embodiment of the present invention.
  • During data recovery, an online node 104 locates the storage location 402 of the node's log according to the network address and port of the down node 106.
  • The online node then scans the log file record by record. Because each log record carries information about its secondary storage unit 204, the online node can identify, while scanning the log, the records that it must recover itself; each time a matching record is found, that record is hash-categorized.
  • Specifically, in one implementation, the hash categorization of stage 404 may proceed as follows.
  • The instance name recorded in the log record is converted according to the storage form of the instance.
  • In this embodiment, an instance may be a machine name, a program name, or the like, which amounts to a character string.
  • The string can be converted to ASCII codes.
  • The converted ASCII codes are summed, and the resulting sum is taken as a 32-bit integer.
  • That integer is then taken modulo the number of recovery threads, yielding the ID of the thread that will recover the instance.
  • Because the instance name is unique, the corresponding thread ID is also unique; that is, after this conversion each instance corresponds to exactly one thread. The log of the down node 106 can therefore be mapped, instance by instance, onto multiple parallel data recovery threads.
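  • The following minimal Python sketch captures this mapping; the function name is illustrative, and the 32-bit masking is one reasonable reading of "taken as a 32-bit integer".

```python
def recovery_thread_id(instance_name: str, num_threads: int) -> int:
    """Sum the ASCII codes of the instance name, keep the sum as a
    32-bit integer, and take it modulo the recovery thread count."""
    total = sum(ord(c) for c in instance_name) & 0xFFFFFFFF
    return total % num_threads

# The same instance name always lands on the same thread, so all log
# records of one instance are replayed in order by a single thread:
assert recovery_thread_id("server01", 8) == recovery_thread_id("server01", 8)
```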
  • A second aspect of the present application provides an apparatus for instance-based distributed database data recovery.
  • The apparatus includes a primary-node device and a non-primary-node device.
  • FIG. 5 shows an exemplary block diagram of a primary-node device provided by an embodiment of the present invention.
  • Optionally, the primary-node device 500 includes a detection module 502 and an allocation module 504.
  • The detection module 502 is configured to detect a non-primary node 102 that has gone down.
  • The allocation module 504 is configured to assign the plurality of secondary storage units 204 corresponding to the down node 106 to at least one online node 104.
  • The provided allocation module 504 can also be used to change the managing node of the secondary storage units 204 to the online node 104 that performed the recovery operation, after the online nodes 104 complete the data recovery.
  • FIG. 6 shows an exemplary block diagram of a non-primary-node device provided by an embodiment of the present invention.
  • Optionally, the non-primary-node device 600 includes a receiving module 602, a scanning module 604, and a processing module 606.
  • The receiving module 602 is configured to receive the information, assigned to the non-primary node, about the plurality of secondary storage units 204 corresponding to the down node 106.
  • The scanning module 604 is configured to scan the down node's log.
  • The processing module 606 is configured to perform hash categorization so that the logs 110 of the same instance map to the same one of the multiple threads.
  • The provided receiving module 602 is further configured to receive the network address and port naming of the down node 106, so that an online node 104 performing data recovery can use the received network address and port naming to locate, in the file system 108, the area where the log 110 of the down node 106 resides.
  • Referring to FIG. 7, a block diagram of a computer system 700 suitable for implementing the apparatus of the embodiments of the present application is shown.
  • The computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 702 or a program loaded from the storage portion 708 into random access memory (RAM) 703.
  • In the RAM 703, various programs and data required for the operation of the system 700 are also stored.
  • The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704.
  • An input/output (I/O) interface 705 is also coupled to the bus 704.
  • The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 708 including a hard disk and the like; and a communication portion 709 including a network interface card such as a LAN card or a modem.
  • The communication portion 709 performs communication processing via a network such as the Internet.
  • A drive 710 is also connected to the I/O interface 705 as needed.
  • A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage portion 708 as needed.
  • Embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method illustrated in the flowchart.
  • The computer program can be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711.
  • Each block in the flowcharts or block diagrams can represent a module, a program segment, or a portion of code, and the module, program segment, or portion of code comprises one or more executable instructions for implementing the specified logical function.
  • The functions noted in the blocks may also occur in an order different from that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • The present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus of the foregoing embodiments, or may exist separately without being assembled into a terminal.
  • The computer-readable storage medium stores one or more programs that are used by one or more processors to perform the methods described in the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An instance-based distributed data recovery method, the method comprising: detecting a non-primary node that has gone down (302); assigning a plurality of secondary storage units belonging to the down node to at least one online node (304); hash-categorizing the instances stored in the log and assigning them to multiple threads (306); and recovering the data of a plurality of primary storage units in parallel inside the online node (308). This achieves parallel in-node recovery of a down node's data in a distributed database.

Description

Instance-based distributed data recovery method and apparatus
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201510515919.9, filed on August 20, 2015, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of databases, and in particular to an instance-based distributed data recovery method and apparatus.
Background
As the Internet has developed, distributed databases have come into increasingly wide use, and the reliability required of them has risen accordingly. To reduce service interruption time, the method used to recover data after a database cluster node goes down is critical. The distributed data recovery method currently used in the industry assigns the data of the down node to multiple online nodes for recovery, while inside each node it either uses a single thread or achieves multi-threaded recovery only after heavy operations such as sorting the log records. Recovering data with these methods clearly suffers from low recovery efficiency for the down node's data and poor node utilization.
Summary
Embodiments of the present invention provide an instance-based distributed data recovery method that can recover data in parallel when a distributed database system node goes down, improving data recovery efficiency and node utilization and thereby the availability of the database system.
An aspect of the present application provides an instance-based distributed data recovery method, including:
detecting a non-primary node that has gone down; assigning a plurality of secondary storage units belonging to the down node to at least one online node; hash-categorizing the instances stored in the log and assigning them to multiple threads; and recovering the data of a plurality of primary storage units in parallel inside the online nodes.
In an exemplary implementation of the first aspect of the present application, the tertiary storage unit stores an index of the secondary storage units; each of the plurality of secondary storage units stores an index of a plurality of primary storage units; each of the plurality of primary storage units stores one instance; and the data stored in the plurality of primary storage units is ordered by instance. The non-primary nodes and the primary node together constitute the nodes of the cluster; each non-primary node manages the primary storage units indexed by a secondary storage unit, and the primary node manages the tertiary storage unit and the secondary storage units.
In addition, during data recovery, hash categorization is used so that the logs of the same instance map to the same thread, distributing the logs across multiple threads according to their instances; the at least one online node recovers data by logically replaying the log contents within its own process. After the at least one online node completes data recovery, the managing node of the secondary storage units is changed to the online node that performed the recovery operation.
A second aspect of the present application provides an apparatus including a primary-node device for managing a primary node and a non-primary-node device for managing a non-primary node.
In an exemplary implementation of the second aspect of the present application, the primary-node device includes a detection module for detecting a non-primary node that has gone down, and an allocation module for assigning a plurality of secondary storage units corresponding to the down node to at least one online node.
In addition, the non-primary-node device includes: a receiving module configured to receive information, assigned to the non-primary node, about the plurality of secondary storage units corresponding to the down node; a scanning module configured to scan the down node's log; and a processing module configured to perform hash categorization so that the logs of the same instance map to the same one of multiple threads.
The allocation module is further configured to change the managing node of the secondary storage units to the online node that performed the recovery operation after the at least one online node completes data recovery. The receiving module is further configured to receive the network address and port naming of the down node.
The beneficial effects of the present application are: after a node goes down, the instances stored in the log are hash-categorized and assigned to multiple threads, so that the online nodes recover data in parallel inside each node, thereby improving data recovery efficiency and node utilization.
Brief Description of the Drawings
FIG. 1 is an overall framework diagram of a distributed data system provided by an embodiment of the present invention;
FIG. 2 is a block diagram of an instance-based data storage structure provided by an embodiment of the present invention;
FIG. 3 is a data recovery flowchart of an instance-based distributed data recovery method provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the hash categorization process during data recovery in an instance-based distributed data recovery method provided by an embodiment of the present invention;
FIG. 5 is an exemplary block diagram of a primary-node device provided by an embodiment of the present invention;
FIG. 6 is an exemplary block diagram of a non-primary-node device provided by an embodiment of the present invention; and
FIG. 7 is a schematic structural diagram of a computer system provided by an embodiment of the present invention.
Detailed Description
The present invention provides an instance-based distributed data recovery method. Preferred embodiments of the present invention are described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the present invention, not to limit it. Moreover, where no conflict arises, the embodiments of the present application and the features within them may be combined with one another.
FIG. 1 is an overall framework diagram of a distributed data system provided by an embodiment of the present invention, but it should be understood that embodiments of the present invention are not limited to the architecture shown in FIG. 1.
In this embodiment, there are two kinds of nodes in the database cluster: a primary node 100 and non-primary nodes 102. A cluster is usually configured with one primary node 100. In another implementation, multiple standby primary nodes may also be configured, but only one primary node is active at a time. As can be seen from FIG. 1, the cluster also includes a plurality of non-primary nodes 102. When the database system is operating normally, multiple nodes run online; these are called online nodes, indicated by numeral 104 in FIG. 1. In a distributed database, a node may go down at any time; in that case, the non-primary nodes 102 are further divided into N1 (1 ≤ N1 < N2) down nodes 106 and N2 (N1 < N2 < N, where N is the total number of non-primary nodes) online nodes 104. The primary node 100 and the non-primary nodes 102 together manage the data in the database. In a distributed database, data exists in the distributed file system 108 in the form of files, and the file system 108 resides persistently in storage. The nodes can read and write the data in the file system 108. A log (Log) 110 in the file system 108 records all of the nodes' changes to the data (including insertions, deletions, and so on); the distributed file system is accordingly shared among the nodes.
FIG. 2 is a block diagram of an instance-based data storage structure provided by an embodiment of the present invention. In this embodiment, the technical implementation is based on the storage of instances. Specifically, an instance may be a stored object, such as a machine name (e.g., a server name) or a program name. The database has a three-level storage structure, with instances stored in primary storage units (e.g., SSTABLEs) 202. The other two levels are secondary storage units (e.g., Leaf Tablets) 204 and tertiary storage units (e.g., Root Tablets) 206.
Optionally, the database storage structure includes a plurality of primary storage units 202. A primary storage unit 202 may be the smallest storage unit in the database, and the data in each primary storage unit 202 is ordered by primary key. The instance name is included as part of the primary key, so the stored data is ordered by instance. In addition, each primary storage unit 202 stores the data of only one instance, and the serial number of each primary storage unit 202 is unique. The database storage structure may also include a plurality of secondary storage units 204, and a secondary storage unit 204 may be the smallest unit of metadata storage for the cluster's primary node 100. Each secondary storage unit 204 stores an index, ordered by primary key, of primary storage units 202. The database storage structure may further include one or more tertiary storage units 206 for indexing the secondary storage units 204; a tertiary storage unit stores an index, ordered by primary key, that points to the secondary storage units 204.
Further, in the cluster provided by embodiments of the present invention, the non-primary nodes 102 manage the primary storage units 202, and each non-primary node 102 manages one or more primary storage units 202 indexed by a secondary storage unit 204. A secondary storage unit 204 cannot be managed across multiple nodes; that is, the primary storage units 202 indexed by one secondary storage unit 204 can be managed by only one non-primary node 102. Specifically, as shown in FIG. 2, the primary storage units 208 and 210 cannot be managed by two different non-primary nodes. Likewise, a primary storage unit cannot be indexed by two different secondary storage units at the same time: the primary storage unit 212 cannot be indexed by both secondary storage units 214 and 216. The primary node 100 manages the secondary storage units 204 and the tertiary storage units 206.
FIG. 3 is a data recovery flowchart of an instance-based distributed data recovery method provided by an embodiment of the present invention. In one embodiment provided by the present invention, a node in the cluster goes down at some moment. According to the method provided by the present invention, data recovery may include the following steps.
In step 302, the primary node 100 detects the node that has gone down.
According to one embodiment of the present invention, i.e., the exemplary database storage structure shown in FIG. 2, the primary node 100 manages the secondary storage units 204 that index the plurality of primary storage units 202 managed by the down node 106. In this implementation, the primary node 100 assigns the secondary storage units corresponding to the plurality of primary storage units managed by the down node 106 to online nodes 104, as in step 304.
As described above, there may be multiple secondary storage units 204 corresponding to one non-primary node 102. In one implementation, to ensure recovery efficiency, when performing step 304 the primary node 100 evenly distributes the plurality of secondary storage units 204 corresponding to the down node 106 among multiple online nodes 104. In another implementation, the log of each node is stored in a directory named with that node's network address and port; when assigning the secondary storage units 204 to the online nodes 104, the primary node also notifies the online nodes of the network address and port of the down node 106 to be recovered, so that the online nodes 104 can locate the log area of the down node 106.
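The directory lookup can be pictured with a small sketch; the POSIX-style path and the `address:port` naming below are illustrative assumptions, since the patent specifies the naming idea but not an exact layout:

```python
from pathlib import Path

def down_node_log_dir(fs_root: str, addr: str, port: int) -> Path:
    """The log directory of a node is named with its network address and port."""
    return Path(fs_root) / f"{addr}:{port}"

# e.g., an online node locating the logs of a down node:
print(down_node_log_dir("/dfs/logs", "10.0.0.7", 9000))  # /dfs/logs/10.0.0.7:9000
```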
In step 306, the logs are hash-categorized, and the categorized log records are distributed to multiple threads.
In step 308, after the hash categorization is complete and the threads have been allocated, data recovery proceeds in parallel across the multiple threads inside each online node 104.
Further, in some implementations, the online nodes 104 logically replay the operations of the down node 106 within each node, across the allocated threads, according to the contents stored in the log.
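This in-node parallel replay can be sketched as follows in Python; the record format, the `apply_op` stand-in, and the queue-per-thread layout are illustrative assumptions rather than details from the patent:

```python
import queue
import threading

NUM_THREADS = 4

def thread_id(instance_name: str) -> int:
    # Same hash categorization as in FIG. 4: ASCII sum, 32-bit, modulo threads.
    return (sum(ord(c) for c in instance_name) & 0xFFFFFFFF) % NUM_THREADS

def apply_op(instance: str, op: str) -> None:
    print(f"redo {op!r} for instance {instance}")  # stand-in for logical redo

def replay_worker(q: "queue.Queue") -> None:
    while True:
        record = q.get()
        if record is None:       # sentinel: log scan finished
            break
        instance, op = record
        apply_op(instance, op)

queues = [queue.Queue() for _ in range(NUM_THREADS)]
workers = [threading.Thread(target=replay_worker, args=(q,)) for q in queues]
for w in workers:
    w.start()

# The scanner routes each matching log record to the queue of the thread
# its instance hashes to, so one instance is always replayed in order.
for rec in [("server01", "insert k1"), ("job.exe", "delete k9")]:
    queues[thread_id(rec[0])].put(rec)
for q in queues:
    q.put(None)
for w in workers:
    w.join()
```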
In one implementation, after the online nodes 104 complete the data recovery, the primary node may reassign, in the tertiary storage unit 206, the mapping of the secondary storage units that corresponded to the original down node 106, mapping those secondary storage units to the online nodes that recovered them, as in step 310.
FIG. 4 is a schematic diagram of the hash categorization process during data recovery in an instance-based distributed data recovery method provided by an embodiment of the present invention. In the embodiment provided by the present invention, during data recovery, after an online node 104 locates the storage location 402 of the down node's log according to the network address and port of the down node 106, the online node scans the log file record by record. Because each log record carries information about its secondary storage unit 204, the online node can identify, while scanning the log, the records that it must recover itself; each time a matching record is found, that record is hash-categorized.
Specifically, in one implementation, the hash categorization of stage 404 may proceed as follows. The instance name recorded in the log record is converted according to the storage form of the instance. In this embodiment, an instance may be a machine name, a program name, or the like, which amounts to a character string. The string can be converted to ASCII codes. The converted ASCII codes are then summed, and the resulting sum is taken as a 32-bit integer. That integer is taken modulo the number of recovery threads, yielding the ID of the thread that will recover the instance. Because the instance name is unique, the corresponding thread ID is also unique; that is, after this conversion each instance corresponds to exactly one thread. The log of the down node 106 can therefore be mapped, instance by instance, onto multiple parallel data recovery threads.
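As a concrete illustration, with a hypothetical instance name "db01" and four recovery threads:

```python
name = "db01"                      # hypothetical instance name
total = sum(ord(c) for c in name)  # 100 + 98 + 48 + 49 = 295
print(total % 4)                   # 295 mod 4 = 3 -> thread 3 replays "db01"
```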
A second aspect of the present application provides an apparatus for instance-based distributed database data recovery. The apparatus includes a primary-node device and a non-primary-node device.
FIG. 5 shows an exemplary block diagram of a primary-node device provided by an embodiment of the present invention. Optionally, the primary-node device 500 includes a detection module 502 and an allocation module 504. In one implementation, the detection module 502 is configured to detect a non-primary node 102 that has gone down. The allocation module 504 is configured to assign the plurality of secondary storage units 204 corresponding to the down node 106 to at least one online node 104. In another implementation, the provided allocation module 504 can also be used to change the managing node of the secondary storage units 204 to the online node 104 that performed the recovery operation, after the online nodes 104 complete the data recovery.
FIG. 6 shows an exemplary block diagram of a non-primary-node device provided by an embodiment of the present invention. Optionally, the non-primary-node device 600 includes a receiving module 602, a scanning module 604, and a processing module 606.
In one implementation, the receiving module 602 is configured to receive the information, assigned to the non-primary node, about the plurality of secondary storage units 204 corresponding to the down node 106. The scanning module 604 is configured to scan the down node's log. The processing module 606 is configured to perform hash categorization so that the logs 110 of the same instance map to the same one of multiple threads. In another implementation, the provided receiving module 602 is further configured to receive the network address and port naming of the down node 106, so that an online node 104 performing data recovery can use the received network address and port naming to locate, in the file system 108, the area where the log 110 of the down node 106 resides.
Those skilled in the art should understand that all or part of the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium. When the program is executed, it may include the flows of the embodiments of the methods above.
Referring now to FIG. 7, a structural diagram of a computer system 700 suitable for implementing the devices of the embodiments of the present application is shown.
As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 702 or a program loaded from the storage portion 708 into random access memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 708 including a hard disk and the like; and a communication portion 709 including a network interface card such as a LAN card or a modem. The communication portion 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage portion 708 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems and methods according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code, and the module, program segment, or portion of code comprises one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus of the above embodiments, or may exist separately without being assembled into a terminal. The computer-readable storage medium stores one or more programs that are used by one or more processors to perform the methods described in the present application.
The above description is intended only to teach those skilled in the art the best mode of carrying out the present invention and does not thereby limit the scope of the rights of the present invention; equivalent variations made in accordance with the claims of the present invention therefore remain within the scope covered by the present invention.

Claims (15)

  1. An instance-based distributed data recovery method, comprising:
    detecting a non-primary node that has gone down;
    assigning a plurality of secondary storage units corresponding to the non-primary node that has gone down to at least one online node;
    hash-categorizing the instances stored in the log and assigning them to a plurality of threads inside the online node; and
    recovering the data of a plurality of primary storage units in parallel within the plurality of threads.
  2. The method according to claim 1, wherein an index of the secondary storage units is stored in a tertiary storage unit, each of the plurality of secondary storage units stores an index of the plurality of primary storage units, each of the plurality of primary storage units stores one instance, and the data stored in the plurality of primary storage units is ordered by the instances.
  3. The method according to claim 1 or 2, wherein a primary node and non-primary nodes together constitute the nodes of a cluster, each of the non-primary nodes manages the primary storage units indexed by the secondary storage units, and the primary node manages the tertiary storage unit and the secondary storage units.
  4. The method according to claim 1, wherein the plurality of secondary storage units corresponding to the down node are evenly assigned to the at least one online node.
  5. The method according to claim 1, wherein hash categorization is used so that the logs of the same instance map to the same one of the plurality of threads, whereby the logs are distributed to the plurality of threads according to their instances.
  6. The method according to claim 5, wherein the hash categorization step comprises: converting the instance name recorded in a log record by converting each character of the string into its ASCII code and accumulating the codes, taking the resulting sum as a 32-bit integer; and taking that integer modulo the number of recovery threads to obtain the ID of the thread that recovers the instance.
  7. The method according to claim 1, wherein the at least one online node recovers data by logically replaying the contents of the log within its own process.
  8. The method according to claim 1, further comprising:
    after the at least one online node completes data recovery, changing the managing node of the secondary storage units to the online node that performed the recovery operation.
  9. An apparatus for the method according to claim 1, comprising:
    a primary-node device configured to manage secondary storage units and tertiary storage units; and
    a non-primary-node device configured to manage primary storage units.
  10. The apparatus for the method according to claim 1 as claimed in claim 9, wherein the primary-node device comprises:
    a detection module configured to detect a non-primary node that has gone down; and
    an allocation module configured to assign the plurality of secondary storage units corresponding to the down node to at least one online node.
  11. The apparatus for the method according to claim 1 as claimed in claim 9, wherein the non-primary-node device comprises:
    a receiving module configured to receive information, assigned to the non-primary node, about the plurality of secondary storage units corresponding to the down node;
    a scanning module configured to scan the down node's log; and
    a processing module configured to perform hash categorization so that the logs of the same instance map to the same one of a plurality of threads.
  12. The instance-based distributed data recovery apparatus according to claim 10, wherein the allocation module is further configured to change the managing node of the secondary storage units to the online node that performed the recovery operation after the at least one online node completes data recovery.
  13. The apparatus for the method according to claim 1 as claimed in claim 11, wherein the receiving module is further configured to receive the network address and port naming of the down node.
  14. A device, comprising:
    a processor; and
    a memory,
    wherein the memory stores computer-readable instructions executable by the processor, and when the computer-readable instructions are executed, the processor:
    detects a non-primary node that has gone down;
    assigns a plurality of secondary storage units corresponding to the non-primary node that has gone down to at least one online node;
    hash-categorizes the instances stored in the log and assigns them to a plurality of threads inside the online node; and
    recovers the data of a plurality of primary storage units in parallel within the plurality of threads.
  15. A non-volatile computer storage medium storing computer-readable instructions executable by a processor, wherein when the computer-readable instructions are executed by the processor, the processor:
    detects a non-primary node that has gone down;
    assigns a plurality of secondary storage units corresponding to the non-primary node that has gone down to at least one online node;
    hash-categorizes the instances stored in the log and assigns them to a plurality of threads inside the online node; and
    recovers the data of a plurality of primary storage units in parallel within the plurality of threads.
PCT/CN2015/095766 2015-08-20 2015-11-27 Instance-based distributed data recovery method and apparatus WO2017028394A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/533,955 US10783163B2 (en) 2015-08-20 2015-11-27 Instance-based distributed data recovery method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510515919.9 2015-08-20
CN201510515919.9A CN105045917B (zh) 2015-08-20 Instance-based distributed data recovery method and apparatus

Publications (1)

Publication Number Publication Date
WO2017028394A1 (zh)

Family

ID=54452464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095766 WO2017028394A1 (zh) 2015-08-20 2015-11-27 Instance-based distributed data recovery method and apparatus

Country Status (3)

Country Link
US (1) US10783163B2 (zh)
CN (1) CN105045917B (zh)
WO (1) WO2017028394A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115437843A (zh) * 2022-08-25 2022-12-06 北京万里开源软件有限公司 Database storage partition recovery method based on multi-level distributed consensus

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9921910B2 (en) * 2015-02-19 2018-03-20 Netapp, Inc. Virtual chunk service based data recovery in a distributed data storage system
CN105045917B (zh) 2015-08-20 2019-06-18 北京百度网讯科技有限公司 Instance-based distributed data recovery method and apparatus
CN105930397B (zh) * 2016-04-15 2019-05-17 北京思特奇信息技术股份有限公司 Message processing method and system
CN106919679B (zh) * 2017-02-27 2019-12-13 北京小米移动软件有限公司 Log replay method, apparatus and terminal for a distributed file system
CN110825706B (zh) * 2018-08-07 2022-09-16 华为云计算技术有限公司 Data compression method and related device
CN111459896B (zh) * 2019-01-18 2023-05-02 阿里云计算有限公司 Data recovery system and method, electronic device, and computer-readable storage medium
NL2027048B1 (en) * 2020-12-04 2022-07-07 Ing Bank N V Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non-transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database.
CN113268470A (zh) * 2021-06-17 2021-08-17 重庆富民银行股份有限公司 Efficient database rollback scheme verification method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853186A (zh) * 2008-12-31 2010-10-06 SAP股份公司 Distributed transaction recovery system and method
CN104376082A (zh) * 2014-11-18 2015-02-25 中国建设银行股份有限公司 Method for importing data from a data source file into a database
US20150169658A1 (en) * 2012-08-06 2015-06-18 Amazon Technologies, Inc. Static sorted index replication
CN105045917A (zh) * 2015-08-20 2015-11-11 北京百度网讯科技有限公司 Instance-based distributed data recovery method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101207510B1 (ko) * 2008-12-18 2012-12-03 한국전자통신연구원 Cluster data management system and method for rebuilding data using a shared redo log in a cluster data management system
CN102364448B (zh) * 2011-09-19 2014-01-15 浪潮电子信息产业股份有限公司 Fault-tolerance method for a computer fault management system
CN103049355B (zh) * 2012-12-25 2015-06-17 华为技术有限公司 Database system recovery method and device
CN103198159B (zh) * 2013-04-27 2016-01-06 国家计算机网络与信息安全管理中心 Transaction-redo-based multi-replica consistency maintenance method for heterogeneous clusters

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853186A (zh) * 2008-12-31 2010-10-06 Sap股份公司 分布式事务恢复系统和方法
US20150169658A1 (en) * 2012-08-06 2015-06-18 Amazon Technologies, Inc. Static sorted index replication
CN104376082A (zh) * 2014-11-18 2015-02-25 中国建设银行股份有限公司 一种把数据源文件中的数据导入到数据库中的方法
CN105045917A (zh) * 2015-08-20 2015-11-11 北京百度网讯科技有限公司 一种基于实例的分布式数据恢复方法和装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115437843A (zh) * 2022-08-25 2022-12-06 北京万里开源软件有限公司 Database storage partition recovery method based on multi-level distributed consensus
CN115437843B (zh) * 2022-08-25 2023-03-28 北京万里开源软件有限公司 Database storage partition recovery method based on multi-level distributed consensus

Also Published As

Publication number Publication date
US20180150536A1 (en) 2018-05-31
US10783163B2 (en) 2020-09-22
CN105045917A (zh) 2015-11-11
CN105045917B (zh) 2019-06-18

Similar Documents

Publication Publication Date Title
WO2017028394A1 (zh) Instance-based distributed data recovery method and apparatus
WO2015106711A1 (zh) Method and apparatus for building a NoSQL database index for semi-structured data
JP6542909B2 (ja) File operation method and apparatus
JP5661104B2 (ja) Method and system for search engine indexing and searching using an index
US10579973B2 (en) System for efficient processing of transaction requests related to an account in a database
CN107180113B (zh) 一种大数据检索平台
WO2013078583A1 (zh) Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage
TW201800967A (zh) Method and apparatus for distributed stream data processing
WO2018121025A1 (zh) Method and system for comparing data of data tables
CN104615785A (zh) 一种基于TYKY cNosql数据库的数据存储方法及装置
CN111597270A (zh) Data synchronization method, apparatus, device and computer storage medium
CN109614411B (zh) Data storage method, device and storage medium
WO2017157111A1 (zh) Method, apparatus and system for preventing loss of in-memory data
US10552419B2 (en) Method and system for performing an operation using map reduce
US20240143456A1 (en) Log replay methods and apparatuses, data recovery methods and apparatuses, and electronic devices
WO2018019310A1 (zh) Data backup method, recovery method and apparatus in a big data system, and computer storage medium
US11086649B2 (en) Minimizing downtime of highly available virtual machines
CN111221814B (zh) Method, apparatus and device for constructing a secondary index
JP2006092503A (ja) Multi-instance in-memory database
US9852172B2 (en) Facilitating handling of crashes in concurrent execution environments of server systems while processing user queries for data retrieval
US11249952B1 (en) Distributed storage of data identifiers
CN105868370A (zh) HBase warehousing apparatus and method with read/write separation
WO2019214685A1 (zh) Message processing method, apparatus and system
US20140157275A1 (en) Distributed computing method and distributed computing system
CN113553329B (zh) Data integration system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15901598

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 15533955

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15901598

Country of ref document: EP

Kind code of ref document: A1