WO2018098972A1 - Log recovery method, storage device and storage node - Google Patents

Log recovery method, storage device and storage node

Info

Publication number
WO2018098972A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
storage node
recovery
target
storage
Prior art date
Application number
PCT/CN2017/081334
Other languages
French (fr)
Chinese (zh)
Inventor
李开广
王英
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2018098972A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1435 Saving, restoring, recovering or retrying at system level using file system or storage system metadata

Definitions

  • The invention relates to storage technology, and in particular to logging in the storage field.
  • More and more distributed storage systems adopt an object-based architecture.
  • The architecture is divided into an upper and a lower layer: the upper layer is a service layer such as a file system, block, or S3/Swift service, and the lower layer is a distributed key-value (KV) storage layer.
  • The distributed KV storage layer is responsible for allocating, storing, and releasing system storage space, provides the upper service layer with reliability across storage nodes, and must support self-healing after failures of storage media such as disks.
  • The distributed KV layer can implement self-healing through associated logs; an associated log used for repair is also called a recovery log.
  • In addition to providing basic services such as file system, block, and S3/Swift, the service layer also needs to provide value-added features such as quotas and remote replication. These features need to store quota consumption records, incremental change records for remote replication, and so on, which places additional demands on distributed storage. Distributed storage can provide storage and reading methods for these additional records through associated logs; an associated log that carries such service-feature records is also called a service log.
  • In the prior art, one protection scheme for a distributed storage system splits service data into data fragments and computes redundant fragments from the data fragments; some storage nodes store the data fragments, and the other storage nodes store the redundant fragments.
  • For example, there are N data storage nodes, each storing one data fragment, and M redundant storage nodes, each storing one check fragment.
  • A log recovery scheme is provided to solve the problem that service log recovery is slow in the prior art.
  • In a first aspect, the present invention provides an embodiment of a log recovery method applied to a target storage node, that is, a storage node that has failed. The target storage node is located in a distributed storage system that includes the target storage node and normal storage nodes. The method includes: after the target storage node recovers from the fault state to the normal state, sending a request for a recovery log to a normal storage node in the distributed storage system; receiving, by the target storage node, the recovery log returned by the other storage node, where the recovery log includes a first recovery log record that indicates a service log that needs to be restored; sending, by the target storage node, a request for the service log to the normal storage node according to the indication of the first recovery log; generating, by the target storage node, a target service log from the received service logs; and saving the target service log.
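The flow of the first aspect can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the class names `NormalNode` and `TargetNode`, their methods, and the representation of a first recovery log record as a plain service log ID are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NormalNode:
    """Hypothetical normal storage node holding mirrored logs."""
    recovery_log: list   # first recovery log records: IDs of service logs to restore
    service_logs: dict   # service log ID -> list of records

    def get_recovery_log(self):
        return list(self.recovery_log)

    def get_service_log(self, log_id):
        return list(self.service_logs.get(log_id, []))


@dataclass
class TargetNode:
    """Hypothetical target storage node that has just recovered from a fault."""
    saved: dict = field(default_factory=dict)

    def recover(self, normal_nodes):
        # 1. Request the recovery log from one normal storage node.
        needed = normal_nodes[0].get_recovery_log()
        for log_id in needed:
            # 2. Request the indicated service log from the normal nodes.
            received = [rec for node in normal_nodes
                        for rec in node.get_service_log(log_id)]
            # 3. Generate the target service log (order-preserving deduplication)
            #    and save it.
            self.saved[log_id] = list(dict.fromkeys(received))
        return self.saved


node1 = NormalNode(["quota"], {"quota": ["a", "b"]})
node3 = NormalNode(["quota"], {"quota": ["b", "c"]})
target = TargetNode()
result = target.recover([node1, node3])  # {'quota': ['a', 'b', 'c']}
```

After `recover` returns, a host would only need to read `target.saved` from this one node.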
  • The host can obtain the service log at any time, and only needs to obtain it from one storage node.
  • the process is
  • In a first possible implementation of the first aspect, after the host detects that the target storage node is faulty, the host sends the first recovery log to the normal storage node, and the normal storage node receives the first recovery log sent by the host and stores it.
  • This solution provides the basis for subsequently using the first recovery log to recover service logs.
  • In a second possible implementation of the first aspect, the target storage node generating the target service log according to the received service log specifically includes: the target storage node merges the received service logs and removes duplicate items to generate the target service log.
  • This scheme describes the process of generating the target service log.
  • The service logs sent by other storage nodes may not be complete, so they should be merged; there may be duplicate content after the merge, so duplicates should be removed.
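The merge-and-deduplicate step can be illustrated with a short sketch. It assumes, purely for illustration, that each service log record carries a unique sequence number; the source does not specify how duplicate records are recognised, and the function name `merge_service_logs` is hypothetical.

```python
def merge_service_logs(*logs):
    """Merge service logs from several nodes and drop duplicate records.

    Each log is a list of (sequence_number, payload) tuples. Records with
    the same sequence number are treated as duplicates (an assumption).
    """
    seen = {}
    for log in logs:
        for seq, payload in log:
            seen.setdefault(seq, payload)  # keep the first copy of each record
    return sorted(seen.items())            # order records by sequence number


# Neither node holds the complete log, and record 2 is present on both.
log_node1 = [(1, "quota +10"), (2, "quota +5")]
log_node3 = [(2, "quota +5"), (3, "quota -2")]
target_log = merge_service_logs(log_node1, log_node3)
# target_log == [(1, 'quota +10'), (2, 'quota +5'), (3, 'quota -2')]
```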
  • The recovery log further includes a second recovery log record, which records fragment location information.
  • The method further includes: the target storage node sends a fragment acquisition request carrying the fragment location information to other storage nodes in the distributed storage system; the target storage node receives the fragments sent by the other storage nodes; and the target storage node uses the received fragments to obtain its own fragment and saves it.
  • This scheme describes how the fragment of the failed storage node is recovered.
  • The method further includes: the host sends a service log acquisition request to the target storage node; the target storage node sends the target service log to the host; and the host performs an operation according to the target service log.
  • This scheme describes that the host can operate on the service log after obtaining only the target service log, instead of obtaining service logs from multiple storage nodes and then merging and deduplicating them. It is simpler and more efficient than the prior art.
  • The target storage node and the normal storage node belong to the same partition.
  • In a second aspect, the present invention further provides an embodiment of a service log storage device that communicates with a normal storage device. The service log storage device includes: a sending module, configured to send a request for a recovery log to the normal storage device after the service log storage device recovers from a fault; a receiving module, configured to receive the recovery log returned by the other storage node, where the recovery log includes a first recovery log record that indicates a service log that needs to be restored; a processing module, configured to send, through the sending module, a request for the service log to the normal storage device according to the indication of the first recovery log, and further configured to generate a target service log from the received service logs; and a storage module, configured to save the target service log.
  • A third aspect of the present invention provides an embodiment of a service log storage node located in a distributed storage system that includes the service log storage node and a normal storage node. The service log storage node includes a processor and a storage medium. The processor is configured to: after the service log storage node recovers from a fault, send a request for a recovery log to the normal storage node; receive the recovery log returned by the other storage node, where the recovery log includes a first recovery log record that indicates a service log that needs to be restored; send, according to the indication of the first recovery log, a request for the service log to the normal storage node; generate a target service log from the received service logs; and save the target service log to the storage medium.
  • FIG. 1 is a schematic diagram of a storage node failure.
  • FIG. 2 is a schematic diagram of data recovery in a faulty storage node.
  • FIG. 3 is a schematic diagram of a storage node failure according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of data recovery in a fault storage node according to an embodiment of the present invention.
  • FIG. 5 is a flow chart of an embodiment of a log recovery method of the present invention.
  • FIG. 6 is a structural diagram of an embodiment of a service log storage device of the present invention.
  • FIG. 7 is a topological diagram of an embodiment of a service log storage node of the present invention.
  • A distributed storage system can store files (or parts of files), objects, or key-value (KV) pairs.
  • The associated log may be log information generated by the distributed storage system in the process of storing data, and is saved along with the data.
  • The associated log can be used for data recovery, and can also record information for value-added features such as quotas.
  • The service log and the recovery log are both associated logs.
  • An associated log can have the following characteristics. Companionship: it accompanies operations on an object and does not exist alone. Reliability: each storage node saves the same associated log in a mirrored manner, and its reliability level can be the same as that of the object. Consistency: its consistency across storage nodes is the same as that of the object. Log nature: an operation on an associated log always appends a number of records to the associated log object.
  • The various logs mentioned in the embodiments of the present invention may be log records in a log file.
  • A distributed storage system includes storage node 1, storage node 2, and storage node 3.
  • Data (such as files or values) is split into data fragments.
  • Redundant fragment 3 is parity data formed from data fragment 1 and data fragment 2, and the three fragments are to be stored in different storage nodes. If storage node 2 fails when the data is written, data fragment 2, which should be written to storage node 2, cannot be written; data fragment 1 is successfully written to storage node 1, and redundant fragment 3 is successfully written to storage node 3.
  • The second recovery log in FIG. 1 is used to recover data fragments or redundant fragments. For an N+M redundancy ratio, as long as at least N fragments are available, the remaining fragments can be recovered.
  • The recovery log records the ID of the storage node that needs to be restored, the fragments that need to be restored, and the version number that needs to be restored.
  • Data fragment 2 can be calculated from data fragment 1 stored in storage node 1 and redundant fragment 3 stored in storage node 3, and the calculated data fragment 2 is written to storage node 2, so that data fragment 2 in storage node 2 is recovered. However, the associated log that should be stored in storage node 2 is not recovered. If the host (which communicates with all the storage nodes) needs to obtain the service log, it must read the service logs from storage node 1 and storage node 3 separately, and then merge and collate them to obtain a reliable service log.
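As a concrete instance of recovering data fragment 2 from data fragment 1 and redundant fragment 3, the sketch below uses byte-wise XOR parity for a 2+1 layout. XOR is an assumption chosen for illustration; the source only says the redundant fragment is parity data and does not name the code used. The helper `xor_bytes` is hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Write path: two data fragments and one parity fragment (2+1).
data1 = b"ABCD"                      # stored on storage node 1
data2 = b"wxyz"                      # intended for storage node 2, which has failed
parity3 = xor_bytes(data1, data2)    # stored on storage node 3

# Recovery path: node 2 is back; rebuild its fragment from the two survivors.
recovered2 = xor_bytes(data1, parity3)
assert recovered2 == data2
```

With XOR parity, any one missing fragment out of the three can be rebuilt from the other two, which matches the N+M rule above for N = 2 and M = 1.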
  • In FIG. 3, as in FIG. 1, storage node 2 has failed.
  • The difference from FIG. 2 is that there are two recovery logs in the embodiment of the present invention.
  • Data can be written as a two-phase commit (2PC) transaction, as follows.
  • The host splits the file or object to be written into data fragments, and selects a certain number of data fragments according to the number of storage nodes to calculate redundant fragments.
  • The selected data fragments and redundant fragments are sent to the respective storage nodes.
  • One or more fragments are sent to each storage node.
  • After receiving a fragment, the storage node temporarily stores it in memory. The storage node then sends the fragments to its own hard disks, one fragment per hard disk.
  • The host sends a prepare request to each storage node.
  • A storage node that receives the prepare request writes the transaction log locally and then sends a response message. Because storage node 2 has failed, only storage node 1 and storage node 3 send response messages.
  • A storage node that receives the commit request persists the fragments held in memory, that is, writes them to a non-volatile storage medium (such as a disk or a solid-state drive). Each hard disk stores one fragment.
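The write path above can be sketched as a simplified two-phase commit. Everything here is a hypothetical model: the `StorageNode` class, the folding of fragment transfer into `prepare`, and the `(failed_node_id, object_id)` shape of the recovery record are assumptions for illustration, not details confirmed by the source.

```python
class StorageNode:
    """Hypothetical storage node with volatile memory and persistent disk."""

    def __init__(self, node_id, up=True):
        self.node_id = node_id
        self.up = up
        self.memory = {}        # fragments staged before commit
        self.disk = {}          # fragments persisted at commit
        self.recovery_log = []  # (failed_node_id, object_id) records

    def prepare(self, obj_id, fragment):
        if not self.up:
            return False        # a failed node sends no response
        self.memory[obj_id] = fragment
        return True

    def commit(self, obj_id, failed_ids):
        self.disk[obj_id] = self.memory.pop(obj_id)
        # Record what the failed nodes missed so that it can be restored later.
        for failed_id in failed_ids:
            self.recovery_log.append((failed_id, obj_id))


def two_phase_write(nodes, obj_id, fragments, n_data):
    """Write one fragment per node; succeed if at least n_data nodes respond."""
    ready = [node for node, frag in zip(nodes, fragments)
             if node.prepare(obj_id, frag)]
    if len(ready) < n_data:
        for node in ready:                  # abort: discard staged fragments
            node.memory.pop(obj_id, None)
        return False
    failed = [node.node_id for node in nodes if node not in ready]
    for node in ready:
        node.commit(obj_id, failed)
    return True


nodes = [StorageNode(1), StorageNode(2, up=False), StorageNode(3)]
ok = two_phase_write(nodes, "obj-1", [b"d1", b"d2", b"p3"], n_data=2)
```

In this run only storage nodes 1 and 3 respond, so they persist their fragments and each records that storage node 2 still needs the fragment for `obj-1`.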
  • The second recovery log can also be used to recover data fragment 2.
  • The recovery process is as follows.
  • After receiving the notification message, storage node 1 and storage node 3 send the locally stored recovery log to storage node 2.
  • Storage node 2 obtains data fragment 1 and check fragment 3 according to the first recovery log, and uses them to recover data fragment 2.
  • Data fragment 2 is sent to storage node 2.
  • Storage node 2 saves data fragment 2 to a non-volatile storage medium.
  • Any storage node can calculate data fragment 2 as long as it obtains data fragment 1 and check fragment 3. Therefore, the recovery of data fragment 2 can also be performed by storage node 3 or storage node 1; details are not repeated here.
  • According to the second recovery log, storage node 1 and storage node 3 each send their local service log to storage node 2. After merging the logs and removing duplicates, storage node 2 generates its own service log and saves it.
  • The generated service log is sent to storage node 2 for storage.
  • FIG. 5 is a flowchart of an embodiment of a log recovery method of the present invention, which describes the process of FIG. 3 and FIG. 4 in more detail.
  • A storage node failure means that data cannot be written to the storage node; this includes a failure of the storage node itself and a failure of the connection between the host and the storage node.
  • The failed storage node is referred to as the target storage node.
  • The storage nodes involved in this step are the storage nodes where the partition is located.
  • The storage space of each partition is provided by multiple storage nodes.
  • Each storage node also provides storage for multiple partitions simultaneously.
  • This step checks whether the four storage nodes are faulty.
  • Storage nodes can be left unchecked.
  • The normal storage node and the target storage node mentioned in this and other embodiments are, unless otherwise specified, defined relative to a specific partition: they refer to the storage nodes across which this partition is distributed. In other words, the set of normal storage nodes and target storage nodes consists of the storage nodes where the storage spaces of the partition are located. Each storage space can be described by a range of addresses.
  • The first recovery log carries information about the service log that needs to be restored, and this information can be used to locate the log that needs to be restored.
  • The information about the service log can take various forms.
  • For example, it may be an identifier (ID) of the service log, from which a specific service log can be located;
  • or it may be a description of the service log type and the time period in which the service log was generated.
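The two forms of service log information can be modelled as one record type with a matching rule. This is only a sketch: the field names, the integer timestamps, and the `locate` helper are hypothetical and are not defined by the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceLog:
    log_id: str
    log_type: str
    created_at: int  # generation time, in arbitrary units (assumption)


@dataclass(frozen=True)
class FirstRecoveryRecord:
    """Either log_id is set, or log_type together with a time period."""
    log_id: Optional[str] = None
    log_type: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None

    def matches(self, log: ServiceLog) -> bool:
        if self.log_id is not None:
            # Form 1: locate the service log directly by its identifier.
            return log.log_id == self.log_id
        if None in (self.log_type, self.start, self.end):
            return False  # incomplete record: nothing can be located
        # Form 2: locate by service log type and generation time period.
        return (log.log_type == self.log_type
                and self.start <= log.created_at <= self.end)


def locate(record, logs):
    """Return the service logs that a first recovery log record points at."""
    return [log for log in logs if record.matches(log)]


logs = [
    ServiceLog("a", "quota", 10),
    ServiceLog("b", "quota", 50),
    ServiceLog("c", "replication", 20),
]
by_id = locate(FirstRecoveryRecord(log_id="c"), logs)
by_period = locate(FirstRecoveryRecord(log_type="quota", start=0, end=30), logs)
```

Here `by_id` contains only log `c`, and `by_period` contains only log `a`, the single quota log generated within the period.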
  • The fragments and the second recovery log are also written to the normal storage nodes. Specifically, they are stored in the storage spaces according to the number of storage spaces: different fragments are stored in different storage spaces, each fragment being one of the N data fragments of the object or one of the M check fragments, while the second recovery log stored in the different storage spaces is the same. Note that if there are too many faulty storage nodes, so that the number of normal storage nodes (or the number of fragments that the normal storage nodes can hold) is insufficient to form a check relationship, step 11 is not performed and the entire process is terminated.
  • The second recovery log records the storage node ID and the object ID. It may also record other information, such as the storage node ID and the address of the fragment on the hard disk, or take other forms, as long as the fragments in the normal storage nodes can be found when the fragment of the failed storage node is recovered.
  • In distributed storage, a partition consists of a set of storage spaces from hard disks of different storage nodes, each hard disk providing one storage space.
  • The service log and the first recovery log are stored in the normal storage node.
  • The service log and the first recovery log are mirrored in the storage space of the normal storage node.
  • After an object is stored in a partition, the partition provides data protection for the object, such as erasure coding (EC) or mirroring.
  • The fragments include check fragments and data fragments: the data fragments are obtained by splitting the object, the check fragments are generated from the data fragments, and together they form a check relationship.
  • If the number of normal storage nodes is at least N, the fragments in the remaining storage nodes can be recovered by using the erasure code algorithm, so this step can be performed. If the number of normal storage nodes is less than N, the check relationship cannot be satisfied and this step is not performed.
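The condition above reduces to a simple count. The sketch below assumes one fragment per storage node and uses a hypothetical helper name, `can_form_check_relationship`.

```python
def can_form_check_relationship(n_data, m_check, failed_nodes):
    """Return True if the surviving nodes can still hold N fragments.

    Assumes an N+M layout with one fragment per storage node, so the
    partition spans n_data + m_check nodes in total.
    """
    normal_nodes = n_data + m_check - failed_nodes
    return normal_nodes >= n_data


# 4+2 layout: two failures are tolerated, three are not.
tolerated = can_form_check_relationship(4, 2, failed_nodes=2)
too_many = can_form_check_relationship(4, 2, failed_nodes=3)
```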
  • The step may further include: writing the fragments into the normal storage nodes according to the number of storage spaces provided by the normal storage nodes, where each storage space may store one fragment.
  • Unless otherwise specified, a normal storage node is a storage node that has always been in the normal state, or a storage node that is in the normal state while the target storage node is faulty. The normal storage nodes therefore do not include the target storage node.
  • The request for the recovery log may carry the time period for which the recovery log is needed (that is, the period during which the target storage node was faulty).
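A normal storage node could use the time period carried in the request to select the relevant recovery log records. The sketch assumes each record is a dictionary with a numeric `time` field; that layout and the function name are hypothetical.

```python
def recovery_log_for_outage(recovery_log, fault_start, fault_end):
    """Select the recovery log records written while the target node was down."""
    return [rec for rec in recovery_log
            if fault_start <= rec["time"] <= fault_end]


recovery_log = [
    {"time": 5, "object_id": "obj-1"},
    {"time": 12, "object_id": "obj-2"},
    {"time": 30, "object_id": "obj-3"},
]
during_outage = recovery_log_for_outage(recovery_log, fault_start=10, fault_end=20)
```

Only the record for `obj-2` falls inside the outage window in this example.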
  • The target storage node can know which storage nodes the other storage spaces of this partition are located on.
  • In this step, the recovery log request can be sent to one normal storage node (such as the primary storage node in the partition).
  • The recovery log request can also be sent to all normal storage nodes.
  • After receiving the request for the recovery log, the normal storage node obtains the recovery log by calling a log interface program.
  • The recovery log covering the period of the target storage node's failure is sent to the target storage node.
  • The normal storage node can feed back the recovery log in one of the following ways.
  • One solution is to obtain the local recovery log and feed it back to the target storage node. This scheme applies when the local recovery log is complete.
  • Another solution is to obtain the recovery log locally and also from other normal storage nodes, merge them and remove duplicates, and feed the result back to the target storage node. This scheme applies when the local recovery log is incomplete.
  • If multiple normal storage nodes receive the request for the recovery log, each feeds back its local recovery log to the target storage node, and the target storage node merges them and removes duplicates before performing step 14.
  • The normal storage node obtains the locally stored recovery log, specifically, the log stored in the storage space that belongs to the partition in the storage node. If multiple storage spaces belong to the partition, each stores one copy of the recovery log, and any one copy can be used.
  • The partition is composed of a plurality of storage spaces. The storage spaces that make up the same partition come from multiple storage nodes. Each storage node can provide one storage space or several. Each storage space stores one fragment and also stores logs (recovery logs, service logs, etc.); the logs of different storage spaces are mirrored.
  • Different storage spaces can come from different hard disks to improve system reliability.
  • The three storage spaces come from three different hard disks of the storage node. Because logs are stored along with fragments, each storage space can store one copy of the log, so the number of log copies equals the number of fragments.
  • After receiving the recovery log, the target storage node sends a recovery request to the normal storage node according to the indication of the recovery log.
  • The recovery request can be sent to all normal storage nodes.
  • For the recovery of the service log, the target storage node sends a service log recovery request to the normal storage node according to the information described in the first recovery log.
  • For the recovery of the fragment, the target storage node sends a fragment recovery request to the normal storage node according to the information described in the second recovery log.
  • After receiving the recovery request, the normal storage node sends the data indicated by the information in the recovery request to the target storage node.
  • If the information carried in the first recovery log is an identifier of the service log, the service log corresponding to the identifier is sent directly to the target storage node. If the information describes the service log type and the time period during which the service log was generated, the service logs that match this information (the specified type within the specified time period) are sent to the target storage node.
  • If the normal storage node holds multiple copies of the service log (each storage space stores one), all of them can be sent to the target storage node, or only one of them can be sent.
  • The fragment in the normal storage node is found according to the fragment information carried in the second recovery log, and is then sent to the target storage node.
  • For example, the fragment can be found according to the storage node ID and the object ID.
  • The range of transmission is all fragments in the partition that satisfy the condition. That is, if the same normal storage node holds multiple such fragments, all of them are sent to the target storage node.
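How a normal storage node might select the fragments to send can be sketched as follows. The record and fragment layouts are assumptions: here a second recovery log record is taken to be a dictionary holding the ID of the failed storage node and an object ID, and each local fragment a dictionary tagged with its object ID and storage space.

```python
def fragments_to_send(local_fragments, second_recovery_log, target_node_id):
    """Pick every local fragment that the target node needs for recovery."""
    wanted = {rec["object_id"] for rec in second_recovery_log
              if rec["node_id"] == target_node_id}
    # All matching fragments are returned, even if this node holds several
    # fragments of the same object in different storage spaces.
    return [frag for frag in local_fragments if frag["object_id"] in wanted]


local_fragments = [
    {"object_id": "obj-1", "space": 0, "data": b"d1"},
    {"object_id": "obj-1", "space": 1, "data": b"p3"},
    {"object_id": "obj-9", "space": 0, "data": b"zz"},
]
second_recovery_log = [{"node_id": 2, "object_id": "obj-1"}]
to_send = fragments_to_send(local_fragments, second_recovery_log, target_node_id=2)
```

In this example both fragments of `obj-1` are selected and the unrelated fragment of `obj-9` is left out.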
  • The target storage node receives the service logs sent by the other storage nodes, generates a target service log, and saves it.
  • The received service logs are combined to generate the target service log. This restores the service log that should have been stored in the target storage node.
  • A deduplication operation can be performed after the merge to generate the service log that needs to be stored in the target storage node.
  • If the information carried in the first recovery log describes the service log type and the time period during which the service log was generated, the received service logs are merged and duplicates are removed to generate the target service log. This restores the service log that should have been stored in the target storage node.
  • The target storage node can also recover the fragments.
  • The method is: performing a check calculation on the received fragments to generate the fragment that needs to be stored in the target storage node.
  • In this embodiment, the target storage node obtains the fragments and performs the data recovery. It can be understood that, in other embodiments, another storage node may obtain the fragments and perform the data recovery, and then send the recovered fragment to the target storage node.
  • The host reads the service log from the target storage node and performs the next operation according to it. For example, if the service log describes the consumption of the partition's storage space quota, the host determines whether to continue writing data to the partition according to that consumption.
  • When the host needs to read the service log, it only needs to read it from one storage node rather than from multiple storage nodes, so the delay is shorter, the process is simpler, and network bandwidth between the host and the distributed storage system is saved.
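The quota example can be sketched as a small host-side check against the target service log read from a single storage node. The record layout (`type`, `bytes`) and both function names are hypothetical.

```python
def quota_used(service_log):
    """Sum the quota consumption recorded in a service log."""
    return sum(rec["bytes"] for rec in service_log if rec["type"] == "quota")


def may_write(service_log, quota_limit, write_size):
    """Decide whether the host may write write_size more bytes to the partition."""
    return quota_used(service_log) + write_size <= quota_limit


# Target service log as read from one storage node.
target_service_log = [
    {"type": "quota", "bytes": 600},
    {"type": "replication", "bytes": 0},
    {"type": "quota", "bytes": 300},
]
allowed = may_write(target_service_log, quota_limit=1000, write_size=100)
denied = may_write(target_service_log, quota_limit=1000, write_size=101)
```

With 900 bytes already consumed, a 100-byte write fits the 1000-byte quota and a 101-byte write does not.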
  • FIG. 6 is a structural diagram of an embodiment of a service log storage device that can perform the log recovery method described above. Because the method has been described in detail, only a brief introduction is given below.
  • The service log storage device 2 communicates with a normal storage device and includes the following modules.
  • The sending module 21 is configured to send a request for a recovery log to the normal storage device after the service log storage device recovers from a fault.
  • The receiving module 22 is configured to receive the recovery log returned by the other storage node, where the recovery log includes a first recovery log record that indicates a service log that needs to be restored. The processing module 23 is configured to send, through the sending module 21, a request for the service log to the normal storage device according to the indication of the first recovery log, and is further configured to generate a target service log from the received service logs. The storage module 24 is configured to save the target service log, for example to a storage medium.
  • The receiving module 22 is further configured to receive the first recovery log sent by the host after the service log storage device fails, and to store it.
  • The processing module 23 generating the target service log according to the received service log specifically includes: the processing module 23 merges the received service logs and removes duplicate items to generate the target service log.
  • The recovery log further includes a second recovery log record, which records fragment location information.
  • The sending module 21 is further configured to send a fragment acquisition request to the storage device, where the fragment acquisition request carries the fragment location information.
  • The receiving module 22 is further configured to receive the fragments sent by other storage devices.
  • The processing module 23 is further configured to use the received fragments to obtain the fragment of this storage node and save it.
  • The host communicates with the service log storage device.
  • The host is configured to send a service log acquisition request to the service log storage device.
  • The sending module 21 is further configured to send the target service log to the host; the host is further configured to perform operations according to the target service log.
  • The service log storage device 2 and the normal storage device belong to the same partition.
  • The service log storage device may be hardware, such as a storage node, in which case it may be referred to as a service log storage node. Physically, it may be a storage controller, a storage server, or a combination of a storage controller and a storage medium.
  • FIG. 7 is a topological diagram of an embodiment of a service log storage node of the present invention.
  • The service log storage node 3 is located in a distributed storage system that includes the service log storage node 3 and a normal storage node (not shown). The service log storage node 3 includes a processor 31, a memory 32, and a storage medium 33.
  • The memory 32 holds a program, and by running the program in the memory the processor 31 performs the following: after the service log storage node recovers from a fault, sending a request for a recovery log to the normal storage node; receiving the recovery log returned by the other storage node, where the recovery log includes a first recovery log record that indicates a service log that needs to be restored; sending, according to the indication of the first recovery log, a request for the service log to the normal storage node; generating a target service log from the received service logs; and saving the target service log to the storage medium 33.
  • In some cases the processor itself can hold the program, for example a field programmable gate array (FPGA), and therefore does not require the memory.
  • The present invention also provides an embodiment of a storage medium, such as an optical disc or a USB flash drive (U disk), that stores a computer program; after the program is installed in a computer, a storage server, or a storage controller, steps 11-16 can be executed by running the program.

Abstract

Disclosed is a log recovery technology, which may be applied to a distributed storage system. The log recovery method comprises: after a target storage node recovers to normal from a failure, sending a request to obtain a recovery log to a normal storage node in a distributed storage system; the target storage node receiving a first recovery log record which is returned by another storage node, a first recovery log indicating a service log which requires recovery; according to an indication of the first recovery log, the target storage node sending to the normal storage node a request to obtain a service log; the target storage node generating a target service log according to a received service log; and saving the target service log.

Description

Log recovery method, storage device and storage node
Technical field
The present invention relates to storage technology, and in particular to logging within the storage field.
Background
In the storage field, more and more distributed storage systems adopt an object-based architecture. The architecture has two layers: the upper layer is a service layer that provides file system, block, S3/Swift, and similar services, and the lower layer is a distributed key-value (KV) storage layer.
The distributed KV storage layer is responsible for allocating, retaining, and releasing system storage space. It provides the upper service layer with reliability across storage nodes, and it must support self-repair after failures of storage media such as disks. Distributed KV can implement self-repair through companion logs; a companion log used for repair is also called a recovery log.
Besides basic services such as file system, block, and S3/Swift, the service layer also provides value-added features such as quotas and remote replication. These features need to store additional records, such as quota consumption records and incremental change records for remote replication, which places extra requirements on distributed storage. Distributed storage can also provide storage and retrieval of these additional records through companion logs; a companion log attached for a service feature is also called a service log.
In the prior art, one protection scheme for a distributed storage system is to split service data into data fragments and compute redundant fragments from the data fragments. Some storage nodes store the data fragments, and other storage nodes store the redundant fragments. For example, there are N data storage nodes, each storing one data fragment, and M redundant storage nodes, each storing one parity fragment.
For service data, if the distributed storage system contains faulty storage nodes and their number does not exceed M, writing data (data fragments or parity fragments) only to the remaining storage nodes does not cause data loss, so the write operation proceeds normally. Service logs are stored along with the fragments, and every storage node holds the same service log. After a faulty storage node returns to normal, the fragments that were not written to it can be recovered from the fragments on the other storage nodes. In the prior art, however, the service log is not recovered together with the fragments, so the recovered storage node holds the fragments but not the service log.
In this situation, a storage node that was originally normal may itself fail after it has received its fragments. This means that the service log on any single storage node is unreliable. To read the service log from the distributed storage system, the host must therefore obtain service logs from M+1 storage nodes and merge and consolidate them before it has a reliable service log. Obtaining and consolidating M+1 service logs increases the host's computing resource consumption and waiting time.
Summary of the invention
A log recovery technique is provided that can solve the prior-art problem of slow service log recovery.
In a first aspect, the present invention provides an embodiment of a log recovery method applied to a target storage node that has failed. The target storage node is located in a distributed storage system that includes the target storage node and a normal storage node. The method includes: after recovering from the fault state to the normal state, the target storage node sends a request for obtaining a recovery log to the normal storage node in the distributed storage system; the target storage node receives the recovery log returned by the other storage node, where the recovery log includes a first recovery log record that indicates a service log to be recovered; according to the indication of the first recovery log, the target storage node sends a request for obtaining a service log to the normal storage node; the target storage node generates a target service log from the received service log; and the target storage node saves the target service log. With this solution, the host can obtain the service log at any time and only needs to obtain it from one storage node, so the process is simple and has low latency.
In a first possible implementation of the first aspect, after the host detects that the target storage node has failed, the host sends the first recovery log to the normal storage node, and the normal storage node receives and stores it. This provides the basis for later using the first recovery log to recover the service log.
In a second possible implementation of the first aspect, generating the target service log from the received service logs specifically includes: the target storage node merges the received service logs and removes duplicate entries to generate the target service log. The service log sent by any one storage node may be incomplete, so the logs are merged; the merged result may contain duplicate content, so duplicates are removed.
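The merge-and-deduplicate step can be illustrated with a minimal sketch. The patent does not define a record layout, so the `ServiceLogRecord` type, its `record_id` field, and the function name below are assumptions made purely for illustration; deduplication here keys on the hypothetical record identifier.

```python
from typing import Dict, Iterable, List, NamedTuple


class ServiceLogRecord(NamedTuple):
    record_id: int   # hypothetical unique identifier of a log record
    payload: str     # hypothetical record content


def build_target_service_log(
    received: Iterable[List[ServiceLogRecord]],
) -> List[ServiceLogRecord]:
    """Merge the logs received from several nodes and drop duplicate records."""
    merged: Dict[int, ServiceLogRecord] = {}
    for node_log in received:
        for record in node_log:
            # Keep the first copy of each record; later copies are duplicates.
            merged.setdefault(record.record_id, record)
    return [merged[key] for key in sorted(merged)]


# Each normal node may hold only part of the service log.
from_node_1 = [ServiceLogRecord(1, "quota +10"), ServiceLogRecord(2, "quota +5")]
from_node_3 = [ServiceLogRecord(2, "quota +5"), ServiceLogRecord(3, "replicate obj-7")]
target_log = build_target_service_log([from_node_1, from_node_3])
```

In this sketch neither input is complete on its own, yet the merged result contains every record exactly once, which is the property the second implementation relies on.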
In a third possible implementation of the first aspect, the recovery log further includes a second recovery log record that records fragment location information, and the method further includes: the target storage node sends a fragment acquisition request carrying the fragment location information to the other storage nodes in the distributed storage system; the target storage node receives the fragments sent by the other storage nodes; and the target storage node uses the received fragments to obtain its own fragment and saves it. This describes how the fragment of the faulty storage node is recovered.
In a fourth possible implementation of the first aspect, the method further includes: the host sends a service log acquisition request to the target storage node; the target storage node sends the target service log to the host; and the host performs operations according to the target service log. The host can thus act on the service log after obtaining only the target service log, rather than obtaining service logs from multiple storage nodes and then merging and deduplicating them. Compared with the prior art, this solution is simpler and more efficient.
In a fifth possible implementation of the first aspect, the target storage node and the normal storage node belong to the same partition.
In a second aspect, the present invention further provides an embodiment of a service log storage device that communicates with a normal storage device. The service log storage device includes: a sending module, configured to send a request for obtaining a recovery log to the normal storage device after the service log storage device recovers from a fault; a receiving module, configured to receive the recovery log returned by another storage node, where the recovery log includes a first recovery log record that indicates a service log to be recovered; a processing module, configured to send, through the sending module and according to the indication of the first recovery log, a request for obtaining a service log to the normal storage device, and further configured to generate a target service log from the received service log; and a storage module, configured to save the target service log. The details and technical effects of the second aspect (and its implementations) are similar to those of the first aspect (and its implementations) and are not repeated here.
In a third aspect, an embodiment of a service log storage node is further provided. The service log storage node is located in a distributed storage system that includes the service log storage node and a normal storage node. The service log storage node includes a processor and a storage medium, and the processor is configured to: after the service log storage node recovers from a fault, send a request for obtaining a recovery log to the normal storage node; receive the recovery log returned by another storage node, where the recovery log includes a first recovery log record that indicates a service log to be recovered; according to the indication of the first recovery log, send a request for obtaining a service log to the normal storage node; generate a target service log from the received service log; and save the target service log to the storage medium. The details and technical effects of the third aspect (and its implementations) are similar to those of the first aspect (and its implementations) and are not repeated here.
Brief description of the drawings
FIG. 1 is a schematic diagram of a storage node failure;
FIG. 2 is a schematic diagram after data recovery in a faulty storage node;
FIG. 3 is a schematic diagram of a storage node failure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram after data recovery in a faulty storage node according to an embodiment of the present invention;
FIG. 5 is a flowchart of an embodiment of the log recovery method of the present invention;
FIG. 6 is a structural diagram of an embodiment of the service log storage device of the present invention;
FIG. 7 is a topology diagram of an embodiment of the service log storage node of the present invention.
Detailed description
The solution of this application is applicable to distributed storage systems. The data stored by a distributed storage system may be a file (or part of a file), an object, or a key-value (KV) pair. Unless otherwise stated, the following embodiments use object storage as the example.
A companion log is log information that a distributed storage system generates alongside the data as the data is stored. Companion logs can be used for fault recovery of data and also for recording value-added features such as quotas; service logs and recovery logs are both companion logs. A companion log may have the following properties. Companionship: it accompanies operations on an object and does not exist on its own. Reliability: every storage node holds the same companion log as a mirror copy, so its reliability can match the reliability level of the object. Consistency: its consistency across storage nodes is the same as that of the object. Log behavior: an operation on a companion log always appends records to the companion log object. The various logs mentioned in the embodiments of the present invention may be log records in a log file.
Referring to FIG. 1, the distributed storage system includes storage node 1, storage node 2, and storage node 3. Data (for example a file or a value) is split into data fragments. Redundant fragment 3 is parity data computed from data fragment 1 and data fragment 2, and the three fragments are to be stored on different storage nodes. If storage node 2 is found to be faulty when the data is written, data fragment 2, which should have been written to storage node 2, cannot be written, while data fragment 1 is successfully written to storage node 1 and redundant fragment 3 is successfully written to storage node 3.
Similarly, the service log and the second recovery log are successfully written to storage node 1 and storage node 3 but cannot be written to storage node 2. Data fragments and redundant fragments are collectively called fragments. The second recovery log in FIG. 1 is used to recover data fragments or redundant fragments: with an N+M redundancy ratio, the remaining fragments can be recovered as long as at least N fragments are available. The recovery log records the ID of the storage node whose fragment needs to be recovered, the fragment to be recovered, and the version number of the fragment to be recovered.
Referring to FIG. 2, after storage node 2 resumes normal operation, data fragment 2 can be computed from data fragment 1 stored on storage node 1 and redundant fragment 3 stored on storage node 3. The computed data fragment 2 is written to storage node 2, so that data fragment 2 on storage node 2 is recovered. However, the companion log that should be stored on storage node 2 is not recovered. If the host (which communicates with all storage nodes) needs the service log, it has to read the service logs from storage node 1 and storage node 3 separately and merge them before it has a reliable service log.
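The fragment recovery of FIG. 1 and FIG. 2 can be sketched for the 2+1 case. The source does not say which code produces redundant fragment 3, so the sketch below assumes simple XOR parity as a stand-in for a general erasure code; the function name and the sample byte strings are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Write path: the object is split into two data fragments plus one parity fragment.
data_fragment_1 = b"ABCD"
data_fragment_2 = b"WXYZ"
redundant_fragment_3 = xor_bytes(data_fragment_1, data_fragment_2)

# Storage node 2 was down during the write, so data_fragment_2 was never stored there.
# After node 2 returns, its fragment is rebuilt from the two surviving fragments.
rebuilt_fragment_2 = xor_bytes(data_fragment_1, redundant_fragment_3)
```

With XOR parity any one missing fragment can be rebuilt from the other two, which matches the N=2, M=1 tolerance described for FIG. 1.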
Referring to FIG. 3, storage node 2 has failed, as in FIG. 1. Unlike the prior-art scheme of FIG. 1 and FIG. 2, this embodiment of the present invention has two recovery logs: in addition to the second recovery log there is a first recovery log, which is used to recover the service log.
While storage node 2 is faulty, data can be written following a two-phase commit (2PC) transaction flow. The steps are as follows.
(1) The host splits the file or object to be written into data fragments and, according to the number of storage nodes, selects a certain number of data fragments from which to compute redundant fragments. The selected data fragments and the redundant fragments are sent to the storage nodes, one or more fragments per storage node.
(2) Each storage node holds the received fragments temporarily in memory and then passes them to its own hard disks, one fragment per hard disk.
(3) The host sends a prepare request to each storage node.
(4) A storage node that receives the prepare request writes a transaction log locally and then sends a response message. Because storage node 2 has failed, only storage node 1 and storage node 3 send response messages.
(5) After receiving the response messages from storage node 1 and storage node 3, the host makes a decision: if it has received a response from every storage node that received the prepare request, it sends a commit request to the storage nodes that sent response messages; otherwise, it sends an abort request to the storage nodes that sent response messages.
(6) A storage node that receives the commit request persists the fragments held in memory, that is, stores them on a non-volatile storage medium (for example a magnetic disk or a solid-state drive). Each hard disk stores one fragment.
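Steps (1) to (6) above can be sketched as a toy in-process simulation. The `StorageNode` class, its method names, and the `two_phase_write` function are hypothetical and exist only for illustration; a real system would exchange these messages over the network and handle timeouts.

```python
class StorageNode:
    """Toy storage node; the class and method names are hypothetical."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.memory = None          # fragment staged in memory (step 2)
        self.disk = None            # fragment persisted on commit (step 6)
        self.transaction_log = []   # local transaction log (step 4)

    def receive_fragment(self, fragment):
        if not self.healthy:
            return False
        self.memory = fragment
        return True

    def prepare(self, txn_id):
        if not self.healthy:
            return False
        self.transaction_log.append(("prepare", txn_id))
        return True

    def commit(self, txn_id):
        self.disk, self.memory = self.memory, None
        self.transaction_log.append(("commit", txn_id))

    def abort(self, txn_id):
        self.memory = None
        self.transaction_log.append(("abort", txn_id))


def two_phase_write(nodes, fragments, txn_id):
    # Steps 1-2: send one fragment to each node; a failed node accepts nothing.
    reachable = [n for n, f in zip(nodes, fragments) if n.receive_fragment(f)]
    # Steps 3-4: send prepare to the reachable nodes and collect responses.
    responders = [n for n in reachable if n.prepare(txn_id)]
    # Step 5: commit only if every node that received prepare responded.
    if len(responders) == len(reachable):
        for node in responders:
            node.commit(txn_id)   # step 6: persist the staged fragment
        return True
    for node in responders:
        node.abort(txn_id)
    return False


nodes = [StorageNode("node-1"), StorageNode("node-2", healthy=False), StorageNode("node-3")]
committed = two_phase_write(nodes, [b"data-1", b"data-2", b"parity-3"], txn_id=1)
```

In this run the write commits on node 1 and node 3 while node 2 stores nothing, which is the state FIG. 3 starts from.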
Referring to FIG. 4, after storage node 2 recovers from the fault, the first recovery log is used to recover the service log, and the second recovery log can also be used to recover data fragment 2.
The recovery flow is as follows.
(1) After storage node 2 returns to normal, it notifies storage node 1 and storage node 3 that it has recovered.
(2) After receiving the notification, storage node 1 and storage node 3 send their locally stored recovery logs to storage node 2.
(3) According to the second recovery log, storage node 2 obtains data fragment 1 and parity fragment 3 (the redundant fragment 3 of FIG. 1) and recovers data fragment 2 from them. The recovered data fragment 2 is delivered to storage node 2, which saves it to a non-volatile storage medium.
In fact, any storage node that obtains data fragment 1 and parity fragment 3 can compute data fragment 2, so the recovery of data fragment 2 can also be performed by storage node 3 or storage node 1; this is not described further here.
(4) According to the first recovery log, storage node 2 requests the service log, and storage node 1 and storage node 3 each send their local service log to storage node 2. Storage node 2 merges the received logs and removes duplicate entries to generate its own service log, and then saves the generated service log.
Likewise, this operation can be performed by another storage node, which then sends the generated service log to storage node 2 to be saved.
FIG. 5 is a flowchart of an embodiment of the log recovery method of the present invention. It describes the process of FIG. 3 and FIG. 4 in more detail.
11. Before writing the service log, the host checks the storage nodes in the distributed system to learn the state of each storage node (normal or faulty). The host writes the service log and the first recovery log to the normal storage nodes. The service log describes attributes of the object, and the first recovery log is used to recover the service log. The service log and the first recovery log are not written to a faulty storage node. A storage node fault means that data cannot be written to the storage node; this covers both a fault of the storage node itself and a fault of the link from the host to the storage node. For ease of description, the faulty storage node is referred to below as the target storage node.
The storage nodes involved in this step are those on which the partition resides. The storage space of each partition is provided by multiple storage nodes, and each storage node in turn provides storage space for multiple partitions.
For example, if the distributed storage system has 10 storage nodes in total and the partition to which this embodiment writes data is distributed across 4 of them, this step checks whether those 4 storage nodes are faulty; the other 6 storage nodes need not be checked. Unless otherwise stated, the normal storage node and the target storage node mentioned in this and other embodiments are defined with respect to one specific partition and refer to the storage nodes across which that partition is distributed. In other words, the set formed by the normal storage nodes and the target storage node consists of the storage nodes that hold the storage spaces of the partition. Each storage space can be described by an address range.
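The 10-node, 4-node-partition example can be sketched as follows. The partition layout table, the node names, and the status strings are all assumptions for illustration; the source only states that the check is limited to the nodes that hold the partition.

```python
# Hypothetical layout: partition-A is spread across 4 of the 10 storage nodes.
partition_layout = {"partition-A": ["node-1", "node-2", "node-3", "node-4"]}

# Hypothetical health table for all 10 nodes; node-2 and node-9 are faulty.
node_status = {f"node-{i}": "normal" for i in range(1, 11)}
node_status["node-2"] = "faulty"
node_status["node-9"] = "faulty"


def classify_partition_nodes(partition, layout, status):
    """Split the partition's own nodes into normal nodes and target (faulty) nodes."""
    members = layout[partition]   # nodes outside the partition are never inspected
    normal = [n for n in members if status[n] == "normal"]
    targets = [n for n in members if status[n] != "normal"]
    return normal, targets


normal_nodes, target_nodes = classify_partition_nodes(
    "partition-A", partition_layout, node_status
)
```

Note that node-9 is faulty but does not appear in either result, because it does not belong to the partition being written.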
The first recovery log carries information about the service log to be recovered, and this information is sufficient to locate the log that needs recovery. The information can take several forms. For example: (1) it may be the identifier (ID) of the service log, from which one specific service log can be located; or (2) it may describe the type of the service log and the time period in which the service log was generated.
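The two forms of service log information can be sketched as one record type with two optional selectors. The field names, the integer timestamps, and the selection function are assumptions for illustration; the source does not specify an encoding.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ServiceLogEntry:
    log_id: str
    log_type: str
    created_at: int   # hypothetical timestamp in seconds


@dataclass(frozen=True)
class FirstRecoveryLogRecord:
    # Form (1): identify one service log by its ID.
    log_id: Optional[str] = None
    # Form (2): identify service logs by type and generation time period.
    log_type: Optional[str] = None
    time_range: Optional[Tuple[int, int]] = None   # inclusive (start, end)


def select_logs_to_recover(
    record: FirstRecoveryLogRecord, service_log: List[ServiceLogEntry]
) -> List[ServiceLogEntry]:
    """Return the service log entries that the recovery record points at."""
    if record.log_id is not None:
        return [e for e in service_log if e.log_id == record.log_id]
    if record.log_type is None or record.time_range is None:
        raise ValueError("record carries neither an ID nor a type and time range")
    start, end = record.time_range
    return [
        e for e in service_log
        if e.log_type == record.log_type and start <= e.created_at <= end
    ]


service_log = [
    ServiceLogEntry("log-1", "quota", 100),
    ServiceLogEntry("log-2", "replication", 150),
    ServiceLogEntry("log-3", "quota", 220),
]
by_id = select_logs_to_recover(FirstRecoveryLogRecord(log_id="log-2"), service_log)
by_window = select_logs_to_recover(
    FirstRecoveryLogRecord(log_type="quota", time_range=(200, 300)), service_log
)
```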
Optionally, the fragments and the second recovery log are also written to the normal storage nodes, one per storage space. Different storage spaces store different fragments: each fragment is either one of the N data fragments of the object or one of the M parity fragments. The second recovery log stored in every storage space is the same. Note that if there are so many faulty storage nodes that the normal storage nodes (or, equivalently, the fragments they can hold) are too few to form a parity relationship, step 11 is not performed and the whole procedure terminates.
The second recovery log may record the storage node ID and the object ID. It may instead record other information, such as the storage node ID and the address of the fragment on the hard disk, or take any other form, provided that the fragments on the normal storage nodes can be located when the fragment of the faulty storage node is recovered.
In distributed storage, a partition consists of a group of storage spaces that come from the hard disks of different storage nodes, each hard disk providing one storage space. Writing the service log and the first recovery log to the normal storage nodes means storing mirror copies of them in each storage space on the normal storage nodes. Once an object is stored in a partition, the partition provides data protection for the object, using a protection algorithm such as erasure coding (EC) or mirroring. Taking erasure coding as an example, one concrete approach is to store a group of fragments in one partition. The space that makes up the partition comes from multiple hard disks, each providing the storage space for one fragment, and these hard disks come from multiple storage nodes. The group includes both data fragments and parity fragments: the data fragments are obtained by splitting the object, the parity fragments are generated from the data fragments, and together they form a parity relationship. With an N+M redundancy ratio, once the fragments of at least N storage nodes in the partition are available, the erasure code algorithm can recover the fragments of the remaining storage nodes, and this step can be performed. If there are fewer than N normal storage nodes, the parity relationship is not satisfied and this step is not performed.
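The precondition for performing step 11 can be written as a one-line check. This sketch assumes one fragment per storage node, which the source allows but does not require, and the function name is hypothetical.

```python
def write_can_proceed(n_data: int, m_parity: int, faulty_nodes: int) -> bool:
    """Return True if the normal nodes can still hold at least N fragments.

    Assumes an N+M layout with exactly one fragment per storage node.
    """
    normal_nodes = n_data + m_parity - faulty_nodes
    return normal_nodes >= n_data


# 2+1 layout as in FIG. 3: one faulty node is tolerated, two are not.
one_fault_ok = write_can_proceed(n_data=2, m_parity=1, faulty_nodes=1)
two_faults_ok = write_can_proceed(n_data=2, m_parity=1, faulty_nodes=2)
```

Under this assumption the check is equivalent to requiring that the number of faulty nodes does not exceed M.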
Optionally, this step may further include: writing fragments to the normal storage nodes according to the number of storage spaces the normal storage nodes provide, where each storage space can store one fragment.
12. After the target storage node returns to normal, it sends a request for obtaining the recovery log to the normal storage node.
For clarity, unless otherwise stated, in this and the other embodiments a normal storage node is a storage node that was in the normal state all along, that is, a storage node that was in the normal state while the target storage node was faulty. The normal storage nodes therefore do not include the target storage node. The request for the recovery log may carry the time period for which the recovery log is needed (that is, the period during which the target storage node was faulty).
Note that every storage node across which the partition is distributed, including the target storage node, records information about the partition, such as which storage nodes the partition is distributed across. For a given partition, the target storage node can therefore learn which storage nodes hold the other spaces of the partition.
In this step the recovery log request may be sent to a single normal storage node (for example the primary storage node of the partition) or to all normal storage nodes.
13. After receiving the request for obtaining the recovery log, the normal storage node obtains the recovery log by calling a log interface program, and sends the recovery log covering the period in which the target storage node was faulty to the target storage node.
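How a normal storage node might answer the request in step 13 can be sketched as a filter over its local recovery log. The record layout, the `written_at` timestamp, and the function name are hypothetical; the source says only that the request may carry the fault period and that the matching recovery log is returned.

```python
from typing import List, NamedTuple, Tuple


class RecoveryLogRecord(NamedTuple):
    written_at: int   # hypothetical timestamp of the write that produced the record
    kind: str         # "first" (service log recovery) or "second" (fragment recovery)
    detail: str       # hypothetical description of what must be recovered


def handle_recovery_log_request(
    local_recovery_log: List[RecoveryLogRecord],
    fault_period: Tuple[int, int],
) -> List[RecoveryLogRecord]:
    """Return the recovery log records written while the target node was faulty."""
    start, end = fault_period
    return [r for r in local_recovery_log if start <= r.written_at <= end]


local_log = [
    RecoveryLogRecord(90, "second", "object-1"),
    RecoveryLogRecord(120, "first", "service-log-7"),
    RecoveryLogRecord(130, "second", "object-2"),
    RecoveryLogRecord(400, "first", "service-log-9"),
]
# The target node reports that it was faulty from t=100 to t=200.
reply = handle_recovery_log_request(local_log, fault_period=(100, 200))
```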
如果仅一个正常存储节点收到所述获取恢复日志的请求,则由这个正常存储节点反馈恢复日志。一种方案是获取本地恢复日志,作为反馈给目标存储节点的恢复日志,这种方案可以适用于本地恢复日志完整的情况。另外一种方案是从本地获得恢复日志,还从其他正常存储节点获取恢复日志,进行合并、去除重复项后,作为反馈给目标存储节点的恢复日志,这种方案可以适用于本地恢复日志不完整的情况。If only one normal storage node receives the request to obtain a recovery log, the normal storage node feeds back the recovery log. One solution is to obtain a local recovery log as feedback to the recovery log of the target storage node. This scheme can be applied to the complete situation of the local recovery log. Another solution is to obtain the recovery log from the local, and obtain the recovery log from other normal storage nodes. After the merge and deduplication, the feedback log is fed back to the recovery log of the target storage node. This scheme can be applied to the local recovery log. Case.
对于被获取本地恢复日志的正常存储节点,如果同一个正常存储节点中有多份恢复日志(例如有多个存储空间,每个存储空间存储一个恢复日志),可以只发送其中一个恢复日志,也可以发送其中多个恢复日志。For a normal storage node that has obtained the local recovery log, if there are multiple recovery logs in the same normal storage node (for example, there are multiple storage spaces, each storage space stores one recovery log), you can send only one recovery log. You can send multiple recovery logs.
如果多个正常存储节点收到所述获取恢复日志的请求,则各自把本地的恢复日志反馈给所述目标存储节点,由目标存储节点进行合并、去除重复项后,再执行步骤14的操作。If the plurality of normal storage nodes receive the request for obtaining the recovery log, the local recovery logs are respectively fed back to the target storage node, and after the target storage node merges and removes the duplicate items, the operation of step 14 is performed.
For this step, a normal storage node obtaining the locally stored recovery log specifically means obtaining the log stored in the storage space of this storage node that belongs to the partition. If multiple storage spaces belong to the partition and each stores one copy of the recovery log, any one of the copies may be obtained.
It should be noted that, as described above, a partition consists of multiple storage spaces. The storage spaces that make up one partition come from multiple storage nodes. Each storage node may provide one storage space or multiple storage spaces. Each storage space stores one fragment and also stores logs (recovery logs, service logs, and so on); the logs in different storage spaces are mirrors of one another.
When one storage node provides multiple storage spaces, the different storage spaces may come from different hard disks to improve system reliability. For example, when a storage node provides three storage spaces, these three storage spaces come from three hard disks of that storage node. Because logs are stored alongside fragments, every storage space can store a copy of the log. The number of logs is therefore the same as the number of fragments.
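The partition layout described above can be modelled roughly as follows. This is only a sketch: the class names `StorageSpace` and `Partition`, the field names, and the helper methods are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageSpace:
    node_id: str                 # storage node that provides this space
    disk_id: str                 # hard disk within that node
    fragment: bytes = b""        # each storage space stores one fragment
    logs: List[str] = field(default_factory=list)  # logs stored with the fragment


@dataclass
class Partition:
    spaces: List[StorageSpace]

    def spaces_on(self, node_id: str) -> List[StorageSpace]:
        """Storage spaces of this partition provided by one storage node."""
        return [s for s in self.spaces if s.node_id == node_id]

    def mirror_log(self, entry: str) -> None:
        """Write the same log entry to every storage space (mirror relation)."""
        for space in self.spaces:
            space.logs.append(entry)
```

In this model the number of log copies equals the number of storage spaces, and hence the number of fragments, which matches the statement above.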
14. After receiving the recovery log, the target storage node sends a recovery request to the normal storage nodes as indicated by the recovery log. The recovery request may be sent to all normal storage nodes.
To recover service logs, the target storage node sends a service log recovery request to the normal storage nodes according to the information described in the first recovery log.
To recover fragments, the target storage node sends a fragment acquisition request to the normal storage nodes according to the information described in the second recovery log.
15. After receiving the recovery request, the normal storage node sends data to the target storage node according to the information recorded in the recovery request.
For the first recovery log, if the information it carries is the identifier of a service log, the service log corresponding to that identifier is sent directly to the target storage node. If the information it carries describes a service log type and the time period in which the service log was generated, the service logs that match this information (of the specified type and generated within the specified time period) are sent to the target storage node.
If one storage node holds multiple copies of a matching service log (for example, it has multiple storage spaces, each storing one service log), all of the copies may be sent to the target storage node, or only one copy may be sent.
For the second recovery log, the normal storage node finds the fragments on this storage node according to the fragment information carried in the second recovery log and sends them to the target storage node. For example, the fragments on this storage node can be found by the storage node ID and the object ID. Each normal storage node that sends fragments sends all fragments located in the partition that satisfy the condition. That is, if one normal storage node holds multiple such fragments, all of them are sent to the target storage node.
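As a sketch of how a normal storage node might select service logs in step 15, the following example handles both forms of the first recovery log: an identifier, or a log type together with a time period. The types `ServiceLog` and `ServiceLogRequest`, their fields, the integer timestamps, and the inclusive time window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ServiceLog:
    log_id: str
    log_type: str
    created_at: int   # hypothetical timestamp, in seconds
    payload: str


@dataclass(frozen=True)
class ServiceLogRequest:
    log_id: Optional[str] = None               # form 1: identifier
    log_type: Optional[str] = None             # form 2: type ...
    window: Optional[Tuple[int, int]] = None   # ... and inclusive time period


def select_service_logs(
    local: List[ServiceLog], request: ServiceLogRequest
) -> List[ServiceLog]:
    """Pick the locally stored service logs that the request asks for."""
    if request.log_id is not None:
        return [log for log in local if log.log_id == request.log_id]
    if request.log_type is None or request.window is None:
        raise ValueError("request needs an identifier or a type and a time period")
    start, end = request.window
    return [
        log
        for log in local
        if log.log_type == request.log_type and start <= log.created_at <= end
    ]
```

If the node holds several mirrored copies of a matching log, it could return all of them or only one; this sketch simply returns every match found in `local`.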
16. The target storage node receives the service logs sent by the other storage nodes and generates the target service log. It then saves the generated service log.
When the information in the first recovery log is the identifier of a service log, merging the received service logs generates the target service log. This restores the service log that should originally have been stored on the target storage node. In some cases, duplicates may additionally be removed after the merge to generate the service log that should originally have been stored on the target storage node.
When the information carried in the first recovery log describes a service log type and the time period in which the service log was generated, the received service logs are merged and duplicates are then removed to generate the target service log. This restores the service log that should originally have been stored on the target storage node.
In addition, the target storage node may also recover fragments. It does so by performing a check calculation on the received fragments to generate the fragment that should originally have been stored on the target storage node. In this embodiment, the target storage node obtains the fragments and performs the data recovery. It can be understood that, in other embodiments, another storage node may obtain the fragments and perform the data recovery, and then send the recovered fragment to the target storage node.
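The merge and duplicate removal of step 16 can be illustrated as follows. The sketch assumes that each service log entry carries an identifier (`log_id`) by which mirrored copies can be recognised, and it orders the result by a hypothetical `created_at` timestamp; neither the key nor the ordering is prescribed by the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ServiceLog:
    log_id: str
    created_at: int   # hypothetical timestamp, in seconds
    payload: str


def build_target_service_log(
    responses: Iterable[Iterable[ServiceLog]],
) -> List[ServiceLog]:
    """Merge the service logs returned by several nodes and drop duplicates."""
    unique: Dict[str, ServiceLog] = {}
    for response in responses:
        for entry in response:
            # Keep the first copy seen for each identifier.
            unique.setdefault(entry.log_id, entry)
    # Ordering by time is an assumption made for readability of the result.
    return sorted(unique.values(), key=lambda e: (e.created_at, e.log_id))
```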
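The embodiment does not name the check calculation used to regenerate a fragment. Purely as an example, the sketch below assumes a single XOR parity fragment over equally sized data fragments, in which case the missing fragment is the XOR of all surviving fragments. The function names are hypothetical, and a real system might use a different erasure code.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two fragments of equal length."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def recover_missing_fragment(surviving: List[bytes]) -> bytes:
    """Rebuild the one missing fragment from the surviving fragments.

    Assumes single XOR parity: parity = d1 ^ d2 ^ ... ^ dn, so any one
    missing fragment equals the XOR of all the others.
    """
    if not surviving:
        raise ValueError("at least one surviving fragment is required")
    return reduce(xor_bytes, surviving)
```

The same routine could run on the target storage node or on another storage node that then forwards the recovered fragment.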
17. The host sends a service log acquisition request to the target storage node to read the service log, and performs the next operation according to the service log obtained. For example, if the service log describes the consumption of the storage space quota of the partition, the host determines from that consumption whether to continue writing data to the partition.
With the service log recovery scheme provided by this embodiment, when the host needs to read the service log, it only needs to read it from one storage node and does not need to read multiple copies of the service log. The latency is therefore shorter and the process is simpler, and network bandwidth between the host and the distributed storage system is also saved.
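The quota example in step 17 might look like the following on the host side. The record layout `QuotaLog` and the rule that a write is allowed only if it fits within the remaining quota are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaLog:
    quota_bytes: int   # storage space quota of the partition
    used_bytes: int    # consumption recorded in the service log


def may_continue_writing(log: QuotaLog, pending_bytes: int) -> bool:
    """Decide whether the host may write pending_bytes more to the partition."""
    return log.used_bytes + pending_bytes <= log.quota_bytes
```

Because the target storage node holds the recovered service log, the host can make this decision after reading from a single node.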
FIG. 6 is a structural diagram of an embodiment of a service log storage device that can perform the log recovery method described above. Because the method embodiment has already been described in detail, only a brief description is given below. The service log storage device 2 communicates with a normal storage device, characterized in that the service log storage device 2 includes:
a sending module 21, configured to send a request to obtain a recovery log to the normal storage device after the service log storage device recovers from a fault;
a receiving module 22, configured to receive the recovery log returned by other storage nodes, where the recovery log includes a first recovery log record and the first recovery log indicates the service logs to be recovered;
a processing module 23, configured to send, through the sending module 21 and as indicated by the first recovery log, a request to obtain service logs to the normal storage device;
the processing module 23 being further configured to generate a target service log from the received service logs; and
a storage module 24, configured to save the target service log, for example to a storage medium.
The receiving module 22 is further configured to: after the service log storage device fails, receive and store the first recovery log sent by the host.
The processing module 23 generating the target service log from the received service logs specifically includes: the processing module 23 merges the received service logs and removes duplicate entries to generate the target service log.
The recovery log further includes a second recovery log record, in which fragment location information is recorded. The sending module 21 is further configured to send a fragment acquisition request to the storage device, the fragment acquisition request carrying the fragment location information; the receiving module 22 is further configured to receive fragments sent by other storage devices; and the processing module 23 is further configured to use the received fragments to obtain the fragment of this storage node and save it.
Further, the host communicates with the service log storage device. The host is configured to send a service log acquisition request to the service log storage device. After the receiving module 22 receives the service log acquisition request, the sending module 21 is further configured to send the target service log to the host; the host is further configured to perform operations according to the target service log.
The service log storage device 2 and the normal storage device belong to the same partition.
The service log storage device may be hardware, for example a storage node, which the present invention may refer to as a service log storage node. Physically it may be a storage controller, a storage server, or a combination of a storage controller and a storage medium.
FIG. 7 is a topology diagram of an embodiment of the service log storage node of the present invention.
A service log storage node 3 is located in a distributed storage system, and the distributed storage system includes the service log storage node 3 and a normal storage node (not shown). The service log storage node 3 includes a processor 31, a memory 32, and a storage medium 33. The memory 32 holds a program, and the processor 31 is configured to run the program in the memory to perform the following: after the service log storage node recovers from a fault, send a request to obtain a recovery log to the normal storage node; receive the recovery log returned by other storage nodes, where the recovery log includes a first recovery log record and the first recovery log indicates the service logs to be recovered; as indicated by the first recovery log, send a request to obtain service logs to the normal storage node; generate a target service log from the received service logs; and save the target service log to the storage medium 33.
In other embodiments, for example when the processor is a field-programmable gate array (FPGA), the processor itself is able to store the program, so no memory is required.
The present invention also provides embodiments of storage media such as an optical disc or a USB flash drive. A computer program is stored on the optical disc or USB flash drive; after the program is installed on a computer, a storage server, or a storage controller, running the program performs the method described in steps 11-16.
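The sequence executed by the processor 31 can be simulated in memory as follows. This is a simplified sketch under several assumptions: the classes `NormalNode` and `TargetNode` and their methods are hypothetical, the first recovery log is reduced to a list of service log identifiers, and network requests are replaced by direct method calls.

```python
from typing import Dict, List


class NormalNode:
    """Stand-in for a normal storage node reachable over the network."""

    def __init__(self, recovery_log: List[str], service_logs: Dict[str, str]):
        self.recovery_log = recovery_log    # identifiers of logs to recover
        self.service_logs = service_logs    # identifier -> log content

    def get_recovery_log(self) -> List[str]:
        return list(self.recovery_log)

    def get_service_logs(self, ids: List[str]) -> Dict[str, str]:
        return {i: self.service_logs[i] for i in ids if i in self.service_logs}


class TargetNode:
    """Stand-in for the service log storage node after it recovers."""

    def __init__(self) -> None:
        self.storage_medium: Dict[str, str] = {}

    def recover(self, peers: List[NormalNode]) -> None:
        # Request the recovery log from each peer, merge, and drop duplicates.
        wanted: List[str] = []
        for peer in peers:
            for log_id in peer.get_recovery_log():
                if log_id not in wanted:
                    wanted.append(log_id)
        # Request the indicated service logs and build the target service log.
        target_log: Dict[str, str] = {}
        for peer in peers:
            for log_id, content in peer.get_service_logs(wanted).items():
                target_log.setdefault(log_id, content)
        # Save the target service log to the storage medium.
        self.storage_medium.update(target_log)
```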

Claims (13)

  1. A log recovery method, applied to a target storage node, the target storage node being located in a distributed storage system, the distributed storage system comprising the target storage node and a normal storage node, characterized in that the method comprises:
    after the target storage node recovers from a fault, sending, by the target storage node, a request to obtain a recovery log to the normal storage node in the distributed storage system;
    receiving, by the target storage node, the recovery log returned by other storage nodes, wherein the recovery log comprises a first recovery log record, and the first recovery log indicates a service log to be recovered;
    sending, by the target storage node and as indicated by the first recovery log, a request to obtain the service log to the normal storage node;
    generating, by the target storage node, a target service log from the received service log; and
    saving the target service log.
  2. The method according to claim 1, further comprising, before the method:
    after the target storage node fails, receiving and storing, by the normal storage node, the first recovery log sent by a host.
  3. The method according to claim 1, wherein the generating, by the target storage node, the target service log from the received service log specifically comprises:
    merging, by the target storage node, the received service logs and removing duplicate entries to generate the target service log.
  4. The method according to claim 1, wherein the recovery log further comprises a second recovery log record, fragment location information is recorded in the second recovery log record, and the method further comprises:
    sending, by the target storage node, a fragment acquisition request to other storage nodes in the distributed storage system, wherein the fragment acquisition request carries the fragment location information;
    receiving, by the target storage node, fragments sent by the other storage nodes; and
    using, by the target storage node, the received fragments to obtain the fragment of this storage node and saving it.
  5. The method according to claim 1, further comprising:
    sending, by a host, a service log acquisition request to the target storage node;
    sending, by the target storage node, the target service log to the host; and
    performing, by the host, an operation according to the target service log.
  6. The method according to claim 1, wherein:
    the target storage node and the normal storage node belong to the same partition.
  7. A service log storage device, the service log storage device communicating with a normal storage device, characterized in that the service log storage device comprises:
    a sending module, configured to send a request to obtain a recovery log to the normal storage device after the service log storage device recovers from a fault;
    a receiving module, configured to receive the recovery log returned by other storage nodes, wherein the recovery log comprises a first recovery log record, and the first recovery log indicates a service log to be recovered;
    a processing module, configured to send, through the sending module and as indicated by the first recovery log, a request to obtain the service log to the normal storage device;
    the processing module being further configured to generate a target service log from the received service log; and
    a storage module, configured to save the target service log.
  8. The service log storage device according to claim 7, wherein the receiving module is further configured to:
    after the service log storage device fails, receive and store the first recovery log sent by a host.
  9. The service log storage device according to claim 7, wherein the processing module generating the target service log from the received service log specifically comprises:
    the processing module merging the received service logs and removing duplicate entries to generate the target service log.
  10. The service log storage device according to claim 7, wherein the recovery log further comprises a second recovery log record, and fragment location information is recorded in the second recovery log record, wherein:
    the sending module is further configured to send a fragment acquisition request to the storage device, the fragment acquisition request carrying the fragment location information;
    the receiving module is further configured to receive fragments sent by other storage devices; and
    the processing module is further configured to use the received fragments to obtain the fragment of this storage node and save it.
  11. The service log storage device according to claim 7, wherein:
    the receiving module is further configured to receive a service log acquisition request sent by a host; and
    the sending module is further configured to send the target service log to the host, so that the host performs an operation according to the target service log.
  12. The service log storage device according to claim 7, wherein:
    the service log storage device and the normal storage device belong to the same partition.
  13. A service log storage node, the service log storage node being located in a distributed storage system, the distributed storage system comprising the service log storage node and a normal storage node, characterized in that the service log storage node comprises a processor and a storage medium, and the processor is configured to perform:
    after the service log storage node recovers from a fault, sending a request to obtain a recovery log to the normal storage node;
    receiving the recovery log returned by other storage nodes, wherein the recovery log comprises a first recovery log record, and the first recovery log indicates a service log to be recovered;
    sending, as indicated by the first recovery log, a request to obtain the service log to the normal storage node;
    generating a target service log from the received service log; and
    saving the target service log to the storage medium.
PCT/CN2017/081334 2016-11-30 2017-04-21 Log recovery method, storage device and storage node WO2018098972A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611089594.3A CN106776130B (en) 2016-11-30 2016-11-30 Log recovery method, storage device and storage node
CN201611089594.3 2016-11-30

Publications (1)

Publication Number Publication Date
WO2018098972A1 true WO2018098972A1 (en) 2018-06-07

Family

ID=58914287

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/081334 WO2018098972A1 (en) 2016-11-30 2017-04-21 Log recovery method, storage device and storage node

Country Status (2)

Country Link
CN (1) CN106776130B (en)
WO (1) WO2018098972A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176945A (en) * 2019-12-28 2020-05-19 浪潮电子信息产业股份有限公司 Node fault positioning method, device, equipment and computer readable storage medium
CN111176900A (en) * 2019-12-30 2020-05-19 浪潮电子信息产业股份有限公司 Distributed storage system and data recovery method, device and medium thereof
CN112711382A (en) * 2020-12-31 2021-04-27 百果园技术(新加坡)有限公司 Data storage method and device based on distributed system and storage node
CN112711382B (en) * 2020-12-31 2024-04-26 百果园技术(新加坡)有限公司 Data storage method and device based on distributed system and storage node

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959390B (en) * 2018-06-01 2019-10-18 新华三云计算技术有限公司 Resource-area synchronous method and device after shared-file system node failure
CN109165117B (en) * 2018-06-29 2022-05-31 华为技术有限公司 Data processing method and system
CN109491773B (en) * 2018-09-28 2021-07-27 创新先进技术有限公司 Compensation task scheduling method, device and system based on time slicing
CN111198853B (en) * 2018-11-16 2023-08-22 北京微播视界科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN111382007A (en) * 2018-12-28 2020-07-07 深圳市茁壮网络股份有限公司 Data recovery method and device and electronic equipment
CN110442560B (en) * 2019-08-14 2022-03-08 上海达梦数据库有限公司 Log replay method, device, server and storage medium
CN111142792B (en) * 2019-12-17 2022-11-22 尧云科技(西安)有限公司 Power-down protection method of storage device
CN111432280B (en) * 2020-03-19 2021-10-01 福建捷联电子有限公司 Block chain based automatic repair method for protected data of television
CN111488238B (en) * 2020-06-24 2020-09-18 南京鹏云网络科技有限公司 Block storage node data restoration method and storage medium
CN113301397A (en) * 2021-02-19 2021-08-24 阿里巴巴集团控股有限公司 CDN-based audio and video transmission, playing and delay detection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706811A (en) * 2009-11-24 2010-05-12 中国科学院软件研究所 Transaction commit method of distributed database system
CN103051681A (en) * 2012-12-06 2013-04-17 华中科技大学 Collaborative type log system facing to distribution-type file system
US20130117237A1 (en) * 2011-11-07 2013-05-09 Sap Ag Distributed Database Log Recovery
CN103761161A (en) * 2013-12-31 2014-04-30 华为技术有限公司 Method, server and system for data recovery
CN105159818A (en) * 2015-08-28 2015-12-16 东北大学 Log recovery method in memory data management and log recovery simulation system in memory data management

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102891873B (en) * 2011-07-21 2017-02-15 腾讯科技(深圳)有限公司 Method for storing log data and log data storage system
CN104601354B (en) * 2013-10-31 2019-05-17 深圳市腾讯计算机系统有限公司 A kind of business diary storage method, apparatus and system
CN105095013B (en) * 2015-06-04 2017-11-21 华为技术有限公司 Date storage method, restoration methods, relevant apparatus and system

Also Published As

Publication number Publication date
CN106776130B (en) 2020-07-28
CN106776130A (en) 2017-05-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17876464

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17876464

Country of ref document: EP

Kind code of ref document: A1