CN111488238B - Block storage node data restoration method and storage medium - Google Patents

Block storage node data restoration method and storage medium

Info

Publication number
CN111488238B
CN111488238B (application CN202010588697.4A)
Authority
CN
China
Prior art keywords
repair
data
node
page
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010588697.4A
Other languages
Chinese (zh)
Other versions
CN111488238A (en)
Inventor
邱重阳
童颖睿
陈靓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Peng Yun Network Technology Co ltd
Original Assignee
Nanjing Peng Yun Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Peng Yun Network Technology Co ltd
Priority to CN202010588697.4A
Publication of CN111488238A
Application granted
Publication of CN111488238B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging

Abstract

The invention discloses a data repair method for a block storage node. A node under fault repair sends a request to start data repair to the master node, and the master node accepts the request and returns its latest log ID to the node under fault repair. The node under fault repair then synchronizes logs with the master node and, according to the state of the synchronized logs, marks whether each page needs to be repaired. For pages marked as needing repair, the node under fault repair registers with the QOS controller and applies for the number of pages it may repair, then sends data repair requests to the master node to repair the data. The invention also provides a storage medium. With the method, client read-write service is not interrupted while data is being repaired.

Description

Block storage node data restoration method and storage medium
Technical Field
The present invention belongs to the field of distributed storage, and more particularly relates to a block storage node data repair method and a storage medium.
Background
With the rapid development of the internet and the arrival of the big-data era, enterprises depend more and more on storage, while large numbers of high-end hosts and traditional storage arrays are very expensive; low-end blade servers and cheap disks, combined with distributed storage software, have therefore become the preferred storage architecture for more and more enterprises. At the scale of large data-storage clusters, host failures and disk failures of storage nodes are not sporadic events but the norm. How to provide a highly available and highly secure storage service under such normalized hardware failures is a problem every distributed storage provider must consider.
Current repair techniques for distributed data storage include copy-based repair, erasure-code-based repair, and router-acceleration-based repair.
Copy-based data repair: each storage node stores a copy of the source file. When a copy is lost or damaged, the system must build a new copy, so it selects a storage node as the newNode; the newNode receives data from at least one storage node, and a node that provides data to the newNode is called a provider. During repair the newNode can obtain the data from any single provider, or download from multiple providers in parallel to reduce transmission time.
The disadvantages of this technique are: each storage node must store a full file copy, so the nodes hold a large amount of data, the storage redundancy is high, and a large amount of storage resources is wasted; the repair time is long, because the whole file must be transmitted, which also occupies a large amount of network bandwidth.
Erasure-code-based data repair: the source file is encoded before being stored on the storage nodes. The whole file is divided into k blocks, which are encoded into n coded blocks; any k of the n coded blocks can restore the source file, and each storage node stores one coded block. During repair, the newNode must download coded blocks from at least k providers and re-encode the received blocks to obtain a new coded block.
However, erasure codes have a drawback when repairing a damaged data node: repairing a data block of size M1 requires downloading k blocks, k × M1 of data in total, from k different nodes over the network, which makes repair expensive in bandwidth.
Router-acceleration-based data repair improves repair efficiency, but because all repair management is still handled by the management nodes, the management nodes carry a heavy load and certain demands are placed on router performance and functionality.
The method currently used to repair damaged data nodes is mainly as follows: physically isolate the storage node whose data is damaged, identify the damaged portion within it, and overwrite the originally stored data in the damaged portion; if the write succeeds, the damaged portion is considered repaired.
While the damaged portion is being repaired, the storage node containing it must be isolated from the system and repaired on its own. During that time the distributed storage system cannot answer client read requests; only after the repair succeeds can client requests be served normally again. In other words, the distributed storage system suffers a service interruption during data repair, which degrades its service quality.
The present invention improves the copy-based data repair technique in order to overcome this service interruption during data repair in the prior art.
Disclosure of Invention
1. Problems to be solved
In the copy-based data repair technique of the prior art, a node being repaired must handle both data repair and external service at the same time, and the two easily interfere with each other, which causes the following three problems:
(1) the data writes performed for repair reduce the read-write performance of the external service;
(2) data writes for repair easily conflict with data writes of the external service, which interrupts the external service;
(3) data repair and the external service may operate on the disk simultaneously, so the disk's performance cannot be used effectively and the repair process is drawn out.
To address these problems, the invention provides a block storage node data repair method and a storage medium.
2. Technical solution
In order to solve the above problems, the technical solution adopted by the invention is as follows. A block storage node data repair method comprises the following steps:
S1, the node under fault repair sends a request to start data repair to the master node, and the master node accepts the request and returns its latest log ID to the node under fault repair;
S2, the node under fault repair performs log synchronization with the master node and, according to the state of the synchronized logs, marks whether the current page needs to be repaired:
if the log ID of Page_R is less than or equal to the ID of the latest log of the master node, the log of Page_R is directly discarded and Page_R is marked as needing repair;
if the log ID of Page_R is greater than the ID of the latest log of the master node, then:
a. if the log of Page_R can cover the data of the whole page, the data of the log of Page_R is written into the page and Page_R is marked as having completed repair;
b. if the log of Page_R cannot cover the whole page of data and Page_R has not completed repair, the log data of Page_R is directly discarded and the client is informed, through the data distribution module, that the repair is completed; Page_R is then set to resume log synchronization after the ID of the latest log of the master node reaches Max Log ID, where Page_R denotes a page of the node under fault repair and Max Log ID is the log ID of Page_R;
S3, data repair is performed on the pages marked as needing repair.
When repairing data, this technical solution repairs in units of the minimum IO read unit, the page, which guarantees the accuracy of the repaired data. If the content of a log cannot cover a whole page, i.e. that page cannot be repaired for the moment, the node under fault repair discards the log content and informs the client that the repair is complete, so that the client's read-write service is not interrupted during data repair; the actual data repair is postponed until the master node's latest log is at least as new as the discarded log, which guarantees the correctness of the repaired data. A minimal sketch of this per-page decision is given below.
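The following is a minimal sketch of the step S2 decision for a single Page_R, written in Python under simplifying assumptions; catch_up_log_id is the latest log ID returned by the master node in step S1, and all field and function names (max_log_id, logs_cover_whole_page, discard_logs, and so on) are illustrative rather than part of the invention.

    def sync_page(page_r, catch_up_log_id, notify_client_repair_done):
        # Step S2 decision for one page of the node under fault repair (sketch).
        if page_r.max_log_id <= catch_up_log_id:
            # The master already holds newer data: drop the local logs and
            # copy the page from the master later.
            page_r.discard_logs()
            page_r.state = "NEED_REPAIR"
        elif page_r.logs_cover_whole_page():
            # The local logs alone rebuild the whole page: apply them directly.
            page_r.apply_logs()
            page_r.state = "REPAIR_DONE"
        elif page_r.state != "REPAIR_DONE":
            # Logs are incomplete: drop them, acknowledge the client through the
            # data distribution module, and resume synchronization once the
            # master's latest log ID has reached this page's Max Log ID.
            max_log_id = page_r.max_log_id
            page_r.discard_logs()
            notify_client_repair_done(page_r)
            page_r.state = "NEED_REPAIR"
            page_r.resume_sync_after_log_id = max_log_id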
Further, step S3 comprises the following steps:
S31, the node under fault repair applies to the QOS controller for registration; if no other data segment unit is currently being repaired, the node under fault repair is allowed to register, otherwise registration is refused;
S32, the successfully registered node under fault repair applies to the QOS controller for a repair data amount, and the QOS controller determines the repair data amount allocated to the node according to the current disk idle rate and disk throughput;
S33, the QOS controller adjusts the repair data amount according to the disk idle rate and disk throughput.
Having the QOS controller control and adjust the repair data amount yields a good repair rate. A sketch of the registration rule follows.
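A minimal sketch of the registration rule in step S31 (Python; the class layout and member names are assumptions made for illustration, not the invention's actual implementation):

    class QOSController:
        """Allows at most one Segment Unit per disk to repair data at a time."""

        def __init__(self):
            self.repairing_unit_by_disk = {}  # disk id -> segment unit id

        def register(self, disk_id, segment_unit_id):
            # Refuse registration while another unit on this disk is repairing.
            if disk_id in self.repairing_unit_by_disk:
                return False
            self.repairing_unit_by_disk[disk_id] = segment_unit_id
            return True

        def unregister(self, disk_id, segment_unit_id):
            # Called when the Segment Unit finishes its repair.
            if self.repairing_unit_by_disk.get(disk_id) == segment_unit_id:
                del self.repairing_unit_by_disk[disk_id]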
Further, step S32 comprises the following steps:
a. if there is currently no client read-write service, the QOS controller sets the repair data amount to disk throughput / P, where P is the page size;
b. if client read-write service exists, the QOS controller further judges whether the disk currently has idle resources;
c. if there are no idle resources, the repair data amount is set to a preset value;
d. if there are idle resources, it is further judged whether this is the first repair application of the node under fault repair; if so, the repair data amount of the node under fault repair is set to the preset value;
if not, the QOS controller determines the repair data amount of the current application according to the amount of the previous application and the disk throughput.
Because the QOS controller determines the repair data amount allocated to the node under fault repair from the current disk idle rate and disk throughput, a better repair rate is obtained while the utilization of the disk is preserved.
Further, if the disk currently has idle resources and this is not the first repair application of the node under fault repair, the QOS controller determines the repair data amount according to the current disk utilization.
Further, if the disk utilization is less than or equal to 75%, the applied repair data amount is 1M × (1 + (1 - disk utilization)) / P; if the disk utilization is greater than 75%, the applied repair data amount = Max(1M/P, 1M × (1 - disk utilization) / P). These rules are summarized in the sketch below.
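The allocation rules of steps S32 a-d and the thresholds above can be gathered into the following sketch (Python). The function signature, parameter names and the integer rounding are assumptions made for illustration; the 1M base amount, the 75% threshold and the formulas follow the text above.

    ONE_MB = 1 << 20

    def repair_amount_pages(page_size, disk_throughput, has_client_io,
                            has_idle_resources, first_application,
                            disk_utilization, preset_pages):
        # Number of pages granted for one repair application (sketch).
        if not has_client_io:
            # No client read-write service: grant the whole disk throughput.
            return disk_throughput // page_size
        if not has_idle_resources or first_application:
            # No idle resources, or the first application: use the preset value.
            return preset_pages
        # Later applications: scale with the current disk utilization.
        if disk_utilization <= 0.75:
            return int(ONE_MB * (1 + (1 - disk_utilization))) // page_size
        return max(ONE_MB // page_size,
                   int(ONE_MB * (1 - disk_utilization)) // page_size)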
Further, step S33 comprises:
a. if the data is being repaired with the preset repair data amount and the disk idle rate is greater than or equal to 50%, the repair data amount is increased;
b. if client requests increase during the repair, the QOS controller limits the repair data amount of subsequent applications.
During data repair, the QOS controller thus adjusts the repair data amount, improving the repair rate while preserving the utilization of the disk.
The invention also provides a storage medium on which a computer program is stored, which when executed, implements the method described above.
3. Advantageous effects
Compared with the prior art, the invention has the following beneficial effects:
(1) during data repair, the client's external read-write service is not interrupted and the client's read-write speed is not affected, providing better availability;
(2) the invention uses the QOS controller to adjust the repair data amount, so that the applied amount better matches the repair rate sustainable at the current disk throughput, improving data repair efficiency.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of parallel access of a data repair flow and a client write flow in the present invention;
FIG. 3 is a flow chart of the QOS controller operation in the present invention.
Detailed Description
The invention is further described with reference to specific examples.
As is known, in order to improve data security, distributed block storage keeps multiple backups of each piece of data; common backup strategies include two backups, three backups, five backups, and so on. Data nodes are generally divided into master nodes and backup nodes, and a master node together with its backup nodes forms a node group that provides distributed storage service externally.
The invention is a solution proposed on the premise of multiple backups: any quorum of the backups (in general, a quorum means most of them, i.e. more than half) can form complete data, even if one backup is in the Joining state of fault repair. It should be noted that in the two-backup case, the Joining backup under fault repair does not hold the full data, so until that node is fully repaired the system's fault tolerance is reduced to the point that the master node must not fail, otherwise normal service cannot be provided. For clarity, this embodiment is described using the three-backup strategy as an example.
The whole block storage system is divided into a driver module, a data distribution module (Coordinator) and a data storage module (DataNode). The driver module is the external interface provided by the block storage; a user can access the block storage system through this interface, and standard block storage access services such as iSCSI (Internet Small Computer System Interface) can be provided on top of it. The Coordinator is mainly responsible for data distribution: it receives data requests from the driver module, distributes client data to all backup nodes, i.e. to the different Segment Units under a Segment, and returns the result to the driver module once the distributed-consistency requirements are met. The DataNode is responsible for data storage, i.e. writing data to disk, and also manages the disk and the position indexes of the data. A rough outline of the three modules is sketched below.
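A rough outline of the three modules' responsibilities (Python; these interface and method names are purely illustrative assumptions drawn from the description above):

    class DriverModule:
        """External block-storage interface; iSCSI can be layered on top."""
        def write(self, volume_id, offset, data): ...
        def read(self, volume_id, offset, length): ...

    class Coordinator:
        """Data distribution: fans each client write out to every Segment Unit
        of the target Segment and answers once consistency requirements hold."""
        def distribute_write(self, segment_id, offset, data): ...

    class DataNode:
        """Data storage: appends logs, flushes pages to disk, and maintains
        the disk/position indexes of the stored data."""
        def append_log(self, segment_unit_id, log): ...
        def flush_page(self, segment_unit_id, page_index): ...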
The invention divides the stored data into data Segments for management. According to the backup strategy, each Segment gives rise to several Segment Units, which are created under different DataNode nodes; the backups held by the Segment Units of a Segment guarantee the security of the data in the whole Segment.
A Segment is a data segment obtained by dividing the stored data; external writes are stored in Segments and must be backed up multiple times according to the backup strategy, and a Segment Unit is one such backup unit of a Segment. Under the three-backup strategy, one Segment comprises three Segment Units, one of which is the master node and the other two of which are backup nodes.
Data consistency among the Segment Units is guaranteed by the Paxos protocol (a consistency protocol based on message passing); other similar protocols could of course also be used. When all Segment Units of a Segment work normally, we call the Segment stable; when only a quorum of the Segment Units work normally, we call it available; otherwise it is unavailable.
In the available state the block storage service can still be provided normally, but the Segment no longer has highly available fault tolerance: if one more node fails, the data service is interrupted. It is therefore necessary to keep a data Segment in the stable state as far as possible, repairing failed nodes so that it returns to the stable state as soon as possible.
When a DataNode node returns to service after a failure, it rejoins a data Segment and becomes one of the Segment Units that make up that Segment. Because this Segment Unit has missed the data written during the failure, it must synchronize with the other members of the Segment to reach the latest state; its state at this time is called the Joining state of fault repair.
When a brand-new DataNode node is added, the block storage system balances the load of the nodes by creating a new Segment Unit on the new node to replace a Segment Unit on an old node. The newly created Segment Unit must synchronize data from the other members to reach the latest state, and its state at this time is also called the Joining state of fault repair.
The Joining state is thus the state assigned to a Segment Unit during repair; a Segment Unit undergoing data repair is called a Joining Segment Unit. Only a backup node can be in this state, so it is also called a backup node under fault repair. It should be noted that the state occurs only on backup nodes: this does not mean the master node cannot fail, but rather that after a failed master node is repaired it becomes a backup node, while a backup node that did not fail is promoted to master; hence the node performing fault repair is always a backup node.
The invention mainly solves the problem of keeping data highly available while a data Segment contains a Joining Segment Unit under fault repair, and, under the control of a QOS (Quality of Service) controller, restores the data of the failed or newly added backup node to the latest state as quickly as possible without affecting client service. Put simply, during data repair the block storage neither interrupts external service nor affects client IO throughput.
For ease of understanding, we introduce the concept of the Segment member list, Segment Membership. For a Segment containing a Segment Unit in the repair state, the Segment Membership consists of the master node, the backup nodes and the backup node under repair, together with their respective state information. Segment Membership is the data structure that stores the members of a Segment and their states. For example, under the three-backup strategy of the present invention a Segment normally has three members, one master node and two backup nodes; if one node is in the repair state, one member of the Segment becomes a node under fault repair, and all of this information is stored in the Segment Membership. In this case, any two of the nodes can still form complete data. A small data-structure sketch follows.
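The Segment Membership can be pictured as a small data structure such as the following (Python; the field and method names are illustrative assumptions based on the three-backup example):

    from dataclasses import dataclass, field

    @dataclass
    class SegmentMembership:
        """Members of a Segment and their states (three-backup example)."""
        primary: str                                             # master node unit
        secondaries: list = field(default_factory=list)          # normal backups
        joining_secondaries: list = field(default_factory=list)  # under repair

        def is_stable(self):
            # Stable: one master and two normal backups, none still joining.
            return len(self.secondaries) == 2 and not self.joining_secondaries

        def has_quorum(self):
            # More than half of the members hold complete, up-to-date data.
            total = 1 + len(self.secondaries) + len(self.joining_secondaries)
            normal = 1 + len(self.secondaries)
            return normal > total // 2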
A Segment Unit in the Joining state must accept new data writes and, at the same time, synchronize the data lost during the failure. Non-interruption of external service, quality-of-service control and similar issues must therefore be considered.
As shown in FIG. 1 and FIG. 2, the data repair process of the present invention is as follows:
1. and the node in fault repair initiates a data repair request to the main node according to the latest Log ID, namely the Log ID, owned by the node.
2. The main node returns to receive data restoration, and notifies the node in fault restoration of a latest Log IDLog ID, which is recorded as a Catch Up Log ID, wherein the Log Log is a method for writing recorded data in a storage system, and the Log Log is written first and then stores the data according to the content of the Log Log to avoid data abnormality, each Log has a Log ID, the Log ID is a natural number, the size of the Log ID is also the sequence of data writing, and the latest Log ID is also the largest, so the main node notifies the node in fault restoration of the latest Log ID of the current main node as a necessary step for data restoration, and the data restoration is performed according to the data of the Log.
3. The node under fault repair performs data Log synchronization with the master node, specifically as follows:
3.1 If the Log ID of the node under fault repair is less than or equal to the Catch Up Log ID, the Log data of the node under fault repair is discarded directly, and the Page where that Log resides is marked as needing to copy data from the master node. In this case the latest data stored on the master node is newer than the data recorded in the Log of the node under fault repair, so the Log data can safely be dropped and the Page copied from the master node afterwards.
3.2 If the Log ID of the node under fault repair is greater than the Catch Up Log ID, that Log ID is recorded as the Max Log ID. If the data held by the Log of the node under fault repair can cover the whole Page, the data is written into the Page (for convenience this Page of the node under fault repair is denoted Page_R) and the Page is marked as having completed repair, i.e. the Log data is written to disk. If the data in the Log is not enough to cover the whole Page and Page_R is marked as needing repair but not yet repaired, the Log data is discarded directly and the client is informed that Page_R has completed its repair; Page_R is then marked to wait until the Log ID of the master node reaches the Max Log ID, after which Log synchronization is performed again.
4. The node under fault repair repairs the Pages marked as needing repair one by one: it first applies to the QOS controller for registration and for a repair data amount, and then sends a data copy request to the master node to perform the repair.
5. When all Pages marked as needing to be copied have been repaired, data repair is marked as complete, the node under fault repair becomes a normal backup node, and the data Segment returns to the stable state.
During node repair, the node under fault repair must also accept writes of new data, so that it does not remain permanently in a state where it has to copy data from other nodes. A new write and the repair of data lost during the failure may involve the same Page; if the new write had to wait for that Page's repair to complete, it might take a long time to succeed, which would eventually interrupt the client's service.
The invention avoids interrupting client service during data repair by assigning each Page a repair state: repair completed, under repair, or to be repaired. A Page whose repair is completed accepts new writes normally. For a Page that is to be repaired or under repair, if a new write can cover the whole Page, the data lost during the earlier failure is considered overwritten and no longer needs repair, so the write is accepted directly and the Page is set to repair completed. If the new write cannot cover the whole Page, the node under fault repair discards the newly written content, and the DataNode indicates to the client, through the Coordinator, that the Page has received the new write, i.e. that the repair is complete, so that the client's read-write service is not interrupted; however, to keep the data consistent and correct, the Page must be set to the to-be-repaired state and must wait for the master node's latest Log to be written, i.e. the data repair is performed once the content can cover the whole Page. This write path is sketched below.
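A minimal sketch of this write path (Python; the page states, the write object and notify_client are illustrative assumptions, with notify_client standing for the acknowledgement returned through the Coordinator):

    def handle_new_write(page, write, notify_client):
        # New client write arriving at a page of the node under fault repair.
        if page.state == "REPAIR_DONE":
            page.apply(write)              # repaired pages accept writes normally
        elif write.covers_whole_page():
            # A full-page write supersedes whatever was lost during the failure.
            page.apply(write)
            page.state = "REPAIR_DONE"
        else:
            # Partial write on an unrepaired page: drop it locally, acknowledge
            # the client via the Coordinator, and repair later from the master
            # once its latest Log reaches this page's Max Log ID.
            notify_client(write, acknowledged=True)
            page.state = "TO_BE_REPAIRED"
            page.wait_for_master_log_id = write.log_id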
It should be noted that a Page is the minimum management unit of storage space in the storage system: a Segment Unit's storage space (for example 1G in this embodiment) is divided into N Pages, where N is a positive integer determined by the Page size (set to 8K here). Every Log belongs to exactly one Page, while one Page may have several Logs, so writing a Log means writing the Page, i.e. storing the data to disk. Before being written to disk, Logs are kept on a cache disk; the data repair of the present invention writes the data corresponding to the latest Logs from the cache disk to the data disk, as those skilled in the art will understand. If any Log of the node under fault repair needs repair, the whole Page must be repaired, i.e. all Logs belonging to that Page; the Max Log ID of the Page must therefore be submitted to the master node, meaning that data copying is performed only once the master node's Log is up to date.
During data repair, the DataNode performs effective flow control on the repair by monitoring the read-write throughput of the disk. The control flow of the QOS controller is as follows:
1. The Segment Unit of the data Segment that needs data repair applies to the QOS controller for registration.
2. The QOS controller checks whether the disk holding that Segment Unit already has a Segment Unit performing data repair; if so, the applying Segment Unit must wait for a period of time and register again, otherwise registration succeeds. This ensures that at any moment only one Segment Unit per disk performs data repair.
3. The successfully registered Segment Unit builds a list of the Pages to be copied, and before each Page repair request is sent to the master node it must apply to the QOS controller for a repair data amount; this determines how much repair data, i.e. how many Pages, may be sent to the master node at a time.
4. The QOS controller collects the disk throughput and the disk idle rate to set the repair data amount currently allowed, where disk throughput means the disk I/O traffic per second, i.e. the amount of data written to and read from the disk.
a) If external reads and writes exist, the repair data amount of a Segment Unit's first application is always a preset value; repairing at this value does not affect normal external reads and writes. In a concrete implementation the preset value of the first application can be set by the user as needed and is related to the disk's performance; in this embodiment it can be set to 1 MB/s.
b) If, while repairing at the preset repair data amount, the disk still has a high idle rate, the QOS controller can give more resources to the repair, i.e. increase the repair data amount.
c) If more client read-write requests arrive during this period, the QOS controller limits the repair data amount of later applications.
d) While external reads and writes are present, the QOS controller keeps the disk utilization caused by data repair within a certain range; in this embodiment the ratio can be set to 75% so as not to affect the client IO experience. Other values can of course be chosen as needed, but in general pushing disk utilization to 75% or more tends to affect the client IO experience.
e) If there is no external read-write request, the QOS controller can devote the disk resources entirely to data repair.
5. After a Segment Unit finishes its repair, it deregisters from the QOS controller, and another Segment Unit can then register and continue working.
For example, suppose the disk throughput is 200M/s and the minimum data amount granted for repair is 1M/8K pages. On the first application for a repair data amount, if no client is writing, 200M/8K pages are granted directly; otherwise 1M/8K pages are granted. On the second application, the grant depends on the disk utilization: if no client is writing, 200M/8K pages are again granted directly; if an external client is writing, the disk utilization is examined against 75%. If the utilization is less than or equal to 75%, the applied repair data amount is set to 1M × (1 + (1 - disk utilization)) / 8K; if it exceeds 75%, the applied amount is set to Max(1M/8K, 1M × (1 - disk utilization) / 8K), i.e. the greater of 1M/8K and 1M × (1 - disk utilization) / 8K. In this embodiment each Page is 8K for the purposes of explanation; in a concrete implementation the Page size can be set according to user requirements, for example 16K or another value.
Later applications repeat the same algorithm, so that over the course of the repeated applications to the QOS controller the repair data amount gradually converges to a value consistent with the current disk throughput, maintaining a good repair rate. A worked usage example with the figures above follows.
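As a usage example of the repair_amount_pages sketch given after step S32 (and under the same assumptions), the embodiment's figures, 200M/s throughput and 8K Pages with a 1M base amount, give the following:

    page = 8 * 1024                    # 8K pages
    throughput = 200 * (1 << 20)       # 200M/s disk throughput
    preset = (1 << 20) // page         # 1M/8K = 128 pages

    # No client writes: the whole throughput is granted, 200M/8K = 25600 pages.
    print(repair_amount_pages(page, throughput, has_client_io=False,
                              has_idle_resources=True, first_application=True,
                              disk_utilization=0.0, preset_pages=preset))

    # Client writes present, later application, 60% utilization (<= 75%):
    # 1M * (1 + 0.4) / 8K -> 179 pages.
    print(repair_amount_pages(page, throughput, has_client_io=True,
                              has_idle_resources=True, first_application=False,
                              disk_utilization=0.60, preset_pages=preset))

    # Client writes present, 90% utilization (> 75%):
    # Max(1M/8K, 1M * 0.1 / 8K) = 128 pages.
    print(repair_amount_pages(page, throughput, has_client_io=True,
                              has_idle_resources=True, first_application=False,
                              disk_utilization=0.90, preset_pages=preset))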
If the functions are implemented in the form of software functional units and used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention may be embodied as a software product stored in a storage medium and comprising instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (7)

1. A block storage node data repair method, characterized by comprising the following steps:
S1, the node under fault repair sends a request to start data repair to the master node, and the master node accepts the request and returns its latest log ID to the node under fault repair;
S2, the node under fault repair performs log synchronization with the master node and, according to the state of the synchronized logs, marks whether each page of the node under fault repair needs to be repaired:
if the log ID of Page_R is less than or equal to the ID of the latest log of the master node, the log of Page_R is directly discarded and Page_R is marked as needing repair;
if the log ID of Page_R is greater than the ID of the latest log of the master node, then:
a. if the log of Page_R can cover the whole Page, the data corresponding to the log of Page_R is written into the Page and Page_R is marked as having completed repair;
b. if the log of Page_R cannot cover the whole Page and the repair is not completed, the log data of Page_R is directly discarded and the client is informed that the repair is completed; Page_R is then set to perform data repair after the ID of the latest log of the master node reaches Max Log ID, where Page_R is a Page of the node under fault repair and Max Log ID is the log ID of Page_R;
S3, data repair is performed on the Pages marked as needing repair.
2. The block storage node data repair method of claim 1, characterized in that step S3 comprises the following steps:
S31, the node under fault repair applies to the QOS controller for registration; if no other data segment unit is currently being repaired, the node under fault repair is allowed to register, otherwise registration is refused;
S32, the successfully registered node under fault repair applies to the QOS controller for a repair data amount, and the QOS controller determines the repair data amount allocated to the node according to the current disk idle rate and disk throughput;
S33, the QOS controller adjusts the repair data amount according to the disk idle rate and disk throughput.
3. The block storage node data repair method of claim 2, characterized in that step S32 comprises the following steps:
a. if there is currently no client read-write service, the QOS controller sets the repair data amount to disk throughput / P, where P is the page size;
b. if client read-write service exists, the QOS controller further judges whether the disk currently has idle resources;
c. if there are no idle resources, the repair data amount is set to a preset value;
d. if there are idle resources, it is further judged whether this is the first repair application of the node under fault repair, and if so, the repair data amount of the node under fault repair is set to the preset value;
if not, the QOS controller determines the repair data amount of the current application according to the amount of the previous application and the disk throughput.
4. The block storage node data repair method of claim 3, characterized in that if the disk currently has idle resources and this is not the first repair application of the node under fault repair, the QOS controller determines the repair data amount according to the current disk utilization.
5. The block storage node data repair method of claim 4, characterized in that if the disk utilization is less than or equal to 75%, the applied repair data amount is 1M × (1 + (1 - disk utilization)) / P; if the disk utilization is greater than 75%, the applied repair data amount = Max(1M/P, 1M × (1 - disk utilization) / P).
6. The block storage node data repair method of claim 3, characterized in that step S33 comprises:
a. if the data is being repaired with the preset repair data amount and the disk idle rate is greater than or equal to 50%, the repair data amount is increased;
b. if client requests increase during the repair, the QOS controller limits the repair data amount of subsequent applications.
7. A storage medium, characterized in that a computer program is stored thereon which, when executed, implements the method of any one of claims 1 to 6.
CN202010588697.4A 2020-06-24 2020-06-24 Block storage node data restoration method and storage medium Active CN111488238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010588697.4A CN111488238B (en) 2020-06-24 2020-06-24 Block storage node data restoration method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010588697.4A CN111488238B (en) 2020-06-24 2020-06-24 Block storage node data restoration method and storage medium

Publications (2)

Publication Number Publication Date
CN111488238A (en) 2020-08-04
CN111488238B (en) 2020-09-18

Family

ID=71813531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010588697.4A Active CN111488238B (en) 2020-06-24 2020-06-24 Block storage node data restoration method and storage medium

Country Status (1)

Country Link
CN (1) CN111488238B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112506710B (en) * 2020-12-16 2024-02-23 深信服科技股份有限公司 Distributed file system data restoration method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1196057A (en) * 1997-09-19 1999-04-09 Fujitsu Ltd Device and method for batch-changing names of tree structure
CN103546579B (en) * 2013-11-07 2017-01-04 陈靓 A kind of data logging improves the method for distributed memory system availability
CN103761161B (en) * 2013-12-31 2017-01-04 华为技术有限公司 Recover the method for data, server and system
CN106776130B (en) * 2016-11-30 2020-07-28 华为技术有限公司 Log recovery method, storage device and storage node

Also Published As

Publication number Publication date
CN111488238A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
US11360854B2 (en) Storage cluster configuration change method, storage cluster, and computer system
JP6404907B2 (en) Efficient read replica
US6823349B1 (en) Method and system for establishing, maintaining, and using a persistent fracture log
US7389312B2 (en) Mirroring network data to establish virtual storage area network
US8069218B1 (en) System, method and computer program product for process migration with planned minimized down-time
US8127174B1 (en) Method and apparatus for performing transparent in-memory checkpointing
US8495319B2 (en) Data processing system providing remote copy in a system having first, second and third storage systems and establishing first, second and third copy pairs
CN106776130B (en) Log recovery method, storage device and storage node
JP7050955B2 (en) Prioritize storage of shared blockchain data
US20050055523A1 (en) Data processing system
WO2001013235A9 (en) Remote mirroring system, device, and method
CN111131451A (en) Service processing system and service processing method
US20090292891A1 (en) Memory-mirroring control apparatus and memory-mirroring control method
WO2021139571A1 (en) Data storage method, apparatus, and system and data reading method, apparatus, and system in storage system
WO2021249335A1 (en) Input/output system applied to network security defense system
CN116680256B (en) Database node upgrading method and device and computer equipment
CN116107516B (en) Data writing method and device, solid state disk, electronic equipment and storage medium
US20090282204A1 (en) Method and apparatus for backing up storage system data
CN111488238B (en) Block storage node data restoration method and storage medium
CN110442601B (en) Openstack mirror image data parallel acceleration method and device
CN111240899B (en) State machine copying method, device, system and storage medium
CN113204424A (en) Method and device for optimizing Raft cluster and storage medium
US20020116659A1 (en) Fault tolerant storage system and method
CN113326006A (en) Distributed block storage system based on erasure codes
US20040160975A1 (en) Multicast communication protocols, systems and methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant