CN108595287B - Data truncation method and device based on erasure codes - Google Patents

Data truncation method and device based on erasure codes

Info

Publication number
CN108595287B
Authority
CN
China
Prior art keywords
data
storage node
truncation
log
storage
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810393095.6A
Other languages
Chinese (zh)
Other versions
CN108595287A (en)
Inventor
王文锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Technologies Co Ltd Chengdu Branch
Original Assignee
New H3C Technologies Co Ltd Chengdu Branch
Application filed by New H3C Technologies Co Ltd Chengdu Branch
Priority to CN201810393095.6A
Publication of CN108595287A
Application granted
Publication of CN108595287B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in a storage system, e.g. in a DASD or network based storage system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's, to protect a block of data words, e.g. CRC or checksum

Abstract

The invention relates to the technical field of distributed storage and provides a data truncation method and device based on erasure codes. The method comprises: receiving a data truncation command sent by a client, wherein the data truncation command is generated by the client in response to a file truncation request so as to truncate data to be truncated that is stored in a storage node; determining whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs; if so, waiting for the data write transaction being processed to complete and then executing the data truncation command; and executing the data truncation command when no data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs. Because no data write transaction is executing when the truncation command is executed, the invention avoids backing up and deleting part of the data to be truncated, solves the problem of write amplification during data truncation, and improves the write performance of the distributed storage system.

Description

Data truncation method and device based on erasure codes
Technical Field
The invention relates to the technical field of distributed storage, in particular to a data truncation method and device based on erasure codes.
Background
In distributed storage systems, erasure codes have been applied to object storage, block storage, and file system storage. During service processing, an erasure-coded system must apply distributed fault-tolerant processing to all write operations (including truncation operations) so that a node fault does not damage data. However, because the effect of other write transactions on a truncation operation must be taken into account, additional fault-tolerant processing has to be performed for the truncation operation, which amplifies it. This amplification includes backing up and deleting part of the data to be truncated, which degrades the performance of erasure codes and reduces the write performance of the distributed storage system.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a data truncation method and apparatus based on erasure codes, so as to solve the problem that the truncation operation is amplified due to the additional fault-tolerant processing performed on the truncation operation.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a data truncation method based on erasure codes, applied to a storage node in a distributed storage system, where the storage node is communicatively connected to a client. The method includes: receiving a data truncation command sent by the client, where the data truncation command is generated by the client in response to a file truncation request so as to truncate data to be truncated that is stored in the storage node; determining whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs; if so, waiting for the data write transaction being processed to complete and then executing the data truncation command; and executing the data truncation command when no data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs.
In a second aspect, an embodiment of the present invention further provides a data truncation device based on erasure codes, applied to a storage node in a distributed storage system, where the storage node is communicatively connected to a client. The device includes a receiving module, a judging module, and a data truncation module. The receiving module is used for receiving a data truncation command sent by the client, where the data truncation command is generated by the client in response to a file truncation request so as to truncate data to be truncated that is stored in the storage node. The judging module is used for determining whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs. The data truncation module is used for waiting for the data write transaction being processed to complete and then executing the data truncation command when such a transaction exists on the data maintenance unit to which the data to be truncated belongs, and for executing the data truncation command when no data write transaction is being processed on that data maintenance unit.
In the data truncation method and device based on erasure codes provided by the embodiments of the present invention, a storage node first receives a data truncation command sent by a client, where the data truncation command is generated by the client in response to a file truncation request so as to truncate data to be truncated that is stored in the storage node. The storage node then determines whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs and, if so, waits for that transaction to complete before executing the data truncation command. Finally, the data truncation command is executed when no data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs. Compared with the prior art, no data write transaction is executing when the data truncation command is executed, so the influence of an executing data write transaction on the data truncation command need not be considered, and no part of the data to be truncated needs to be backed up and deleted. This solves the problem of write amplification during data truncation and improves the write performance of the distributed storage system.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic view illustrating an application scenario of a data truncation method based on erasure codes according to an embodiment of the present invention.
Fig. 2 is a block diagram illustrating a storage node according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating an erasure code-based data truncation method according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating a storage node exception recovery processing method according to an embodiment of the present invention.
Fig. 5 is a flowchart illustrating the sub-steps of step S105 in fig. 4.
Fig. 6 is a block diagram illustrating an erasure code-based data truncation apparatus according to an embodiment of the present invention.
Reference numerals: 10, 30, 40 - storage nodes; 20 - client; 101 - memory; 102 - memory controller; 103 - processor; 200 - erasure code-based data truncation device; 201 - receiving module; 202 - judging module; 203 - data truncation module; 204 - determination module; 205 - control module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a schematic view illustrating an application scenario of a data truncation method based on erasure codes according to an embodiment of the present invention. The distributed storage system is communicatively connected to the client 20 and includes a plurality of storage nodes, across which it distributes the data that the client 20 sends to it for storage. The plurality of storage nodes may be the storage node 10, the storage node 30, the storage node 40, and so on. Any one of them may be the master storage node, and the remaining storage nodes are slave storage nodes; that is, the master storage node in fig. 1 may be any one of the storage node 10, the storage node 30, the storage node 40, and so on.
For convenience of description, in the embodiment of the present invention the storage node 10 is taken as the master storage node, and the other storage nodes, for example the storage node 30 and the storage node 40, are slave storage nodes. The storage node 10 is communicatively connected to the client 20 and to the slave storage nodes such as the storage node 30 and the storage node 40. A user sends a file truncation request to the client 20, and the client 20 responds to the request by generating a corresponding data truncation command. The storage node 10 receives the data truncation command sent by the client 20 and controls the slave storage nodes such as the storage node 30 and the storage node 40 to truncate the data to be truncated stored on each of them. When at least one of the storage nodes 10, 30, 40, and so on recovers from an abnormality, the storage node 10 first determines the storage node to be synchronized, that is, the node that still needs to execute the data truncation command, and then controls that node to execute the data truncation command so as to truncate the data to be truncated stored on it.
Referring to fig. 2, fig. 2 is a block diagram illustrating a storage node 10 according to an embodiment of the present invention, where the storage node 10 may be, but is not limited to, a Personal Computer (PC), a server, a storage array, and the like. The operating system of the storage node 10 may be, but is not limited to, a Windows system, a Linux system, a Unix system, etc. The storage node 10 includes an erasure code-based data truncation device 200, a memory 101, a memory controller 102, and a processor 103.
The memory 101, memory controller 102, and processor 103 are electrically connected to each other directly or indirectly to enable data transfer or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The erasure code-based data truncation device 200 includes at least one software functional module that may be stored in the memory 101 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the storage node 10. The processor 103 is used for executing executable modules stored in the memory 101, such as software functional modules and computer programs included in the erasure code-based data truncation device 200.
The memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 101 is configured to store a program, and the processor 103 executes the program after receiving an execution instruction.
The processor 103 may be an integrated circuit chip having signal processing capabilities. The processor 103 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a voice processor, a video processor, and the like; it may also be a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor 103 may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor 103 may be any conventional processor or the like.
First embodiment
The distributed storage system distributes the data that a user sends through a client for storage across a plurality of storage nodes. To improve the reliability of the distributed storage system and its ability to expand and reduce capacity, the system must be able to migrate data easily from one storage node to another. To facilitate such migration, the data that the user needs to store is organized, migrated, and maintained in units of data maintenance units, and data is likewise recovered in units of data maintenance units after a storage node recovers from an abnormality. The data to be stored is first divided among a plurality of data maintenance units, and each data maintenance unit is stored in a distributed manner on different storage nodes. The storage node is therefore an indispensable component of a distributed storage system, and the data maintenance unit is an indispensable concept. However, storage nodes and data maintenance units may go by different names in different distributed storage systems. For example, in one embodiment, the distributed storage system may be a Ceph system (an open-source distributed storage system), the storage nodes may be Object-based Storage Device (OSD) nodes, and the data maintenance units may be Placement Groups (PGs). In another embodiment, the distributed storage system may be a FusionStorage system (a Huawei distributed storage system), the storage nodes may be Object-based Storage Device (OSD) nodes, and the data maintenance units may be data partitions (Partition). To describe the flow of the erasure code-based data truncation method clearly, the first embodiment takes the specific distributed storage system Ceph as an example.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for data truncation based on erasure codes according to an embodiment of the present invention. The method for truncating data based on erasure codes comprises the following steps:
Step S101: receive a data truncation command sent by the client, wherein the data truncation command is generated by the client in response to a file truncation request so as to truncate the data to be truncated stored in the storage node.
In the embodiment of the present invention, taking the distributed storage system Ceph as an example, the storage nodes in the distributed storage system correspond to OSD nodes in Ceph, and the storage nodes 10, 30, and 40 correspond to the OSD nodes 10, 30, and 40 in Ceph, respectively. The data maintenance unit in the distributed storage system corresponds to a PG in Ceph. The processing procedure of step S101 may be as follows. First, the client 20 responds to a file truncation request sent by a user, where the file truncation request includes the name of the file to be truncated and its truncation position information. When a file is stored, the client 20 divides it into one or more objects, and the objects are stored according to a predetermined policy on a plurality of OSD nodes such as the OSD node 10, the OSD node 30, and the OSD node 40. Based on the name of the file to be truncated and its truncation position information, the client 20 locates the object ID (IDentity) of the object to which the data to be truncated belongs and the truncation position within that object, where the object ID is the unique number of the object in Ceph. Then, the client 20 obtains the PGID (Placement Group IDentity) corresponding to the object from the object ID through a preset hash algorithm, where the PGID is the unique number of the PG in Ceph. From the PGID, the client 20 determines, through the CRUSH (Controlled Replication Under Scalable Hashing) algorithm and a preset storage rule, the serial number of the OSD node 10 that stores the PG. Finally, the client 20 encapsulates the object ID and the truncation position within the object into a data truncation command and sends it to the OSD node 10, so that the OSD node 10 controls the plurality of OSD nodes storing the object, such as the OSD node 10, the OSD node 30, and the OSD node 40, to truncate the data to be truncated stored on each of them.
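The client-side procedure of step S101 can be sketched roughly as follows. This is a minimal illustration, not Ceph's actual client code: the fixed object size, the object naming scheme, the SHA-256 hash, and the static `PG_TO_OSDS` table (which stands in for the CRUSH computation) are all assumptions introduced for the example.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024  # assumed fixed object size (hypothetical)
PG_COUNT = 8                   # assumed number of placement groups (hypothetical)

# Hypothetical static table standing in for the CRUSH result:
# PGID -> ordered OSD list, the first entry being the master OSD node.
PG_TO_OSDS = {pgid: [10, 30, 40] for pgid in range(PG_COUNT)}


def locate_object(file_name, truncate_pos):
    """Map a file-level truncation position to (object ID, position in object)."""
    index, offset = divmod(truncate_pos, OBJECT_SIZE)
    return f"{file_name}.{index:08x}", offset


def object_to_pgid(object_id):
    """Derive a PGID from the object ID with a stable hash (stand-in hash)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PG_COUNT


def build_truncate_command(file_name, truncate_pos):
    """Package the object ID and in-object position, addressed to the master OSD."""
    object_id, offset = locate_object(file_name, truncate_pos)
    pgid = object_to_pgid(object_id)
    return {
        "op": "truncate",
        "object_id": object_id,
        "offset": offset,
        "pgid": pgid,
        "target_osd": PG_TO_OSDS[pgid][0],
    }
```

Calling `build_truncate_command("a.txt", OBJECT_SIZE + 100)` yields a command for the second object of the file with an in-object position of 100, addressed to OSD node 10.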
Step S102: determine whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs.
In the embodiment of the present invention, taking the distributed storage system Ceph as an example, the data maintenance unit in the distributed storage system corresponds to a PG in Ceph. The PG is the unit in which Ceph organizes, migrates, and maintains stored data, and it is also used to organize objects and map their storage locations: one PG is responsible for organizing a plurality of objects, but an object can be mapped to only one PG. In Ceph, a data write operation on an object often needs to be decomposed into a series of write sub-operations executed in a predetermined order. A data write transaction packages these write sub-operations into a single transaction for processing so as to ensure the atomicity, consistency, isolation, and durability of the data write operation. Atomicity means that the operations in the data write transaction are either all performed or not performed at all. Consistency means that the data is consistent before the data write transaction is executed and remains consistent after it is executed. Isolation requires the system to ensure that the data write transaction is not affected by other concurrently executing transactions. Durability means that the changes made to the data by a completed data write transaction are permanent and are not lost even if Ceph encounters an exception. Because the data written by a write transaction being processed and the data to be truncated belong to the same PG, the processing of that write transaction affects the execution of the data truncation command received in step S101. The processing procedure of step S102 may be: step S103 is performed when a data write transaction is being processed on the PG to which the data to be truncated belongs, and step S104 is performed when no data write transaction is being processed on that PG.
Step S103: wait for the data write transaction being processed to complete.
In the embodiment of the present invention, since the truncation operation is also a write operation, the data write transactions being processed may include both write operations and truncation operations being processed. When a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs, the data truncation command is executed only after that data write transaction has completed.
It should be noted that, if other data write transactions are queued behind the data write transaction being processed, those pending data write transactions need to be temporarily blocked so that they are not processed immediately after the data write transaction being processed completes. When the data write transaction being processed completes, step S104 is executed first, and only then are the other pending data write transactions processed.
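The waiting and blocking behaviour of steps S102 to S104 can be illustrated with a small per-PG gate. This is a sketch under assumptions, not the patented implementation or Ceph code: the class name `PlacementGroupGate` and its methods are hypothetical, and a condition variable is just one possible way to realize the ordering.

```python
import threading


class PlacementGroupGate:
    """Hypothetical per-PG gate that orders a truncation against write transactions."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0       # write transactions currently being processed
        self._truncating = False  # True while a truncation is waiting or running

    def begin_write(self):
        with self._cond:
            # Pending writes that arrive behind a truncation are temporarily blocked.
            while self._truncating:
                self._cond.wait()
            self._in_flight += 1

    def end_write(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def truncate(self, do_truncate):
        with self._cond:
            # A truncation is itself a write, so truncations are serialized too.
            while self._truncating:
                self._cond.wait()
            self._truncating = True
            # S102/S103: wait until no write transaction is being processed.
            while self._in_flight > 0:
                self._cond.wait()
        try:
            return do_truncate()  # S104: no write transaction is executing now
        finally:
            with self._cond:
                self._truncating = False
                self._cond.notify_all()  # release the blocked pending writes
```

A write path would wrap each transaction in `begin_write()` and `end_write()`, while the truncation path calls `truncate(callback)`; writes that arrive while a truncation is waiting only proceed after the callback returns.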
Step S104: execute the data truncation command.
In the embodiment of the present invention, taking the distributed storage system Ceph as an example, the storage nodes in the distributed storage system correspond to OSD nodes in Ceph, and the storage nodes 10, 30, and 40 correspond to the OSD nodes 10, 30, and 40 in Ceph, respectively. The processing flow of step S104 is as follows. When the OSD node 10 determines that no write transaction is currently being processed, it first generates a log version number corresponding to the data truncation command. The log version number is generated according to a preset rule and uniquely identifies the log. The OSD node 10 then executes the data truncation command and, at the same time, sends both the log version number and the data truncation command to the slave OSD nodes that store the data to be truncated, such as the OSD node 30 and the OSD node 40, so that these slave OSD nodes execute the data truncation command.
As one embodiment, a method of executing a data truncation command may include:
First, the OSD node 10 generates a log version number according to a preset rule. The log version number is the unique number of the operation log of the data truncation command, and each data truncation command corresponds to one log version number.
Next, according to the object ID, the OSD node 10 obtains the list of slave OSD nodes that store the data to be truncated for the object, that is, the list containing the OSD node 30, the OSD node 40, and so on. It sends the log version number, together with the object ID and the truncation position within the object from the data truncation command, to the slave OSD nodes such as the OSD node 30 and the OSD node 40, so that these slave OSD nodes execute the data truncation command. Meanwhile, the OSD node 10 executes the data truncation command to truncate the data to be truncated stored on the OSD node 10. After each OSD node successfully executes the data truncation command, a piece of log information corresponding to the data truncation command is recorded in the PGLog of that OSD node. The log information is used to recover the corresponding data when at least one of the OSD nodes 10, 30, and 40 recovers from an abnormality.
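The execution flow just described can be sketched as a small in-memory model. Everything here is hypothetical scaffolding for illustration: `OsdNode`, `master_truncate`, and the "last version plus one" rule stand in for the unspecified preset rule, and each node simply truncates its own bytes at the given position instead of handling erasure-coded shards.

```python
from dataclasses import dataclass, field


@dataclass
class OsdNode:
    """Hypothetical in-memory OSD node holding object data and a PGLog."""
    node_id: int
    objects: dict = field(default_factory=dict)  # object ID -> bytes
    pglog: list = field(default_factory=list)    # (version, op, object ID) records

    def apply_truncate(self, version, object_id, offset):
        self.objects[object_id] = self.objects[object_id][:offset]
        # The log record is written only after the truncation has succeeded.
        self.pglog.append((version, "truncate", object_id))


def master_truncate(master, slaves, object_id, offset):
    """Generate a log version number, then truncate on the master and all slaves."""
    # Assumed preset rule: next version = last version in the master's PGLog + 1.
    version = master.pglog[-1][0] + 1 if master.pglog else 1
    for node in [master, *slaves]:
        node.apply_truncate(version, object_id, offset)
    return version
```

In this model the master and every slave record the same version number for the same truncation, which is what later allows the PGLogs to be compared during recovery.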
A storage node abnormality means that the storage node can no longer provide a safe and effective data storage function for the client 20. The abnormality may be, but is not limited to, a problem in a process running on the storage node, a problem in a communication module of the storage node, and the like. When at least one of the plurality of storage nodes is abnormal, the data on the plurality of storage nodes is inconsistent until that storage node returns from the abnormal state to the normal state. To make the data on the plurality of storage nodes consistent after the recovery, the following steps may be adopted; please refer to fig. 4:
Step S105: the master storage node acquires the data maintenance unit logs of the plurality of storage nodes and, according to these logs, determines a storage node to be synchronized from the plurality of storage nodes, where the storage node to be synchronized stores data to be truncated.
In the embodiment of the present invention, the abnormal storage node may be at least one of the storage nodes 10, 30, 40, and so on. The storage nodes to be synchronized are determined when at least one of the plurality of storage nodes recovers from an abnormality: the storage node 10 acquires the data maintenance unit logs on the storage nodes such as the storage node 30 and the storage node 40, and determines the storage nodes to be synchronized according to the data maintenance unit logs on the storage nodes such as the storage node 10, the storage node 30, and the storage node 40, where the storage nodes to be synchronized store data to be truncated.
As an embodiment, the method of determining storage nodes to be synchronized may include the following sub-steps, please refer to fig. 5:
Substep S1051: find the log record with the largest number among the log records that exist in all of the data maintenance unit logs, and take it as the authoritative log.
In the embodiment of the present invention, the data maintenance unit log comprises a plurality of log records of the write transactions that write data belonging to the data maintenance unit, where the write transactions include truncation transactions. When each write transaction has been processed, a log record corresponding to that write transaction is written to the data maintenance unit log. Each log record has a number generated according to a preset rule, and the number uniquely identifies the log record. The authoritative log represents the log corresponding to the data operation command most recently executed successfully on all of the plurality of storage nodes, and the data operation commands corresponding to logs with smaller numbers than the authoritative log are those executed successfully on the plurality of storage nodes before it. Taking the distributed storage system Ceph as an example, the storage nodes in the distributed storage system correspond to OSD nodes in Ceph, and the storage nodes 10, 30, and 40 correspond to the OSD nodes 10, 30, and 40 in Ceph, respectively. The data maintenance unit and the data maintenance unit log in the distributed storage system correspond to the PG and the PGLog in Ceph, respectively, and the number of each log record in the data maintenance unit log corresponds to the log version number of each log record in the PGLog in Ceph. The processing procedure of substep S1051 may be as follows. First, the OSD node 10 acquires the PGLog on the OSD nodes such as the OSD node 30 and the OSD node 40. When the OSD nodes such as the OSD node 10, the OSD node 30, and the OSD node 40 are all normal, the logs recorded in the PGLog on each OSD node are consistent; only when at least one of the OSD nodes is abnormal do the logs recorded in the PGLog on the OSD nodes become inconsistent.
For example, Ceph includes three OSD nodes, denoted OSD node No. 1, OSD node No. 2, and OSD node No. 3. The log version number 8 corresponds to a write operation command for writing data. After the write operation command is successfully executed on OSD node No. 1, OSD node No. 2, and OSD node No. 3, a log record corresponding to the log version number 8 is recorded in the PGLog on each of these nodes, and at this point the log records in the PGLog on the three OSD nodes are identical. The log version number 9 corresponds to a truncation operation command. The truncation operation command is successfully executed on OSD node No. 1 and OSD node No. 3, and log records corresponding to the log version number 9 are recorded in the PGLog on those two nodes. OSD node No. 2, however, becomes abnormal while executing the truncation operation command, so no log record corresponding to the log version number 9 is recorded on OSD node No. 2, and the log records in its PGLog are inconsistent with those in the PGLog on OSD node No. 1 and OSD node No. 3. The log record with the largest number that exists in the PGLog on all of OSD node No. 1, OSD node No. 2, and OSD node No. 3 is the one corresponding to the log version number 8, so the log record corresponding to the log version number 8 is taken as the authoritative log.
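The three-node example above can be reproduced with a few lines of code. This is a minimal sketch in which each PGLog is reduced to a list of log version numbers; the function name `authoritative_version` is hypothetical.

```python
from typing import Dict, List, Optional


def authoritative_version(pglogs: Dict[int, List[int]]) -> Optional[int]:
    """Return the largest log version number present in every node's PGLog."""
    if not pglogs:
        return None
    common = set.intersection(*(set(versions) for versions in pglogs.values()))
    return max(common) if common else None


# PGLog contents from the example: OSD node No. 2 missed version 9.
pglogs = {1: [8, 9], 2: [8], 3: [8, 9]}
authoritative = authoritative_version(pglogs)
```

Here `authoritative` is 8, matching the example: version 9 exists only on OSD nodes No. 1 and No. 3, so it cannot be the authoritative log.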
Substep S1052: when the next log after the authoritative log is a log of a data truncation operation, determining the log of the data truncation operation as the truncation log.
In the embodiment of the present invention, the log of the data truncation operation is a log record recorded in the data maintenance unit log after the data truncation command is successfully executed. For example, Ceph includes three OSD nodes, denoted OSD node No. 1, OSD node No. 2, and OSD node No. 3. Log version number 8 corresponds to a write operation command for writing data, and after the write operation command is successfully executed on OSD node No. 1, OSD node No. 2, and OSD node No. 3, a log record corresponding to version number 8 is recorded in the PGLog on each of the three nodes. Log version number 9 corresponds to a truncation operation command. The truncation operation command is successfully executed on OSD node No. 1 and OSD node No. 3, and log records corresponding to log version number 9 are recorded in the PGLog on those two nodes. OSD node No. 2 becomes abnormal while executing the truncation operation command, so no log record corresponding to log version number 9 is recorded on OSD node No. 2. The largest-numbered log record that exists in the PGLog on all three OSD nodes is the record corresponding to log version number 8, which is therefore taken as the authoritative log. The next log after the authoritative log is the log record corresponding to log version number 9. Because that log record was recorded when the truncation operation command was executed, it is a log of a data truncation operation, and the log corresponding to log version number 9 is determined as the truncation log.
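Sub-step S1052 can be illustrated with a short sketch. The `LogRecord` type, its `op` field with the values "write" and "truncate", and the function `find_truncation_log` are hypothetical names introduced here; the sketch also assumes that version numbers are increasing integers, so that the next log after the authoritative log is the record with the smallest version number greater than the authoritative one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogRecord:
    version: int
    op: str  # assumed values: "write" or "truncate"


def find_truncation_log(
    pglogs: Dict[str, List[LogRecord]], authoritative_version: int
) -> Optional[LogRecord]:
    """Return the record right after the authoritative log if it is a truncation."""
    later: Dict[int, LogRecord] = {}
    for records in pglogs.values():
        for record in records:
            if record.version > authoritative_version:
                later[record.version] = record
    if not later:
        return None  # every node is already at the authoritative log
    next_record = later[min(later)]
    # A non-truncation record is left to the ordinary rollback path.
    return next_record if next_record.op == "truncate" else None


if __name__ == "__main__":
    write8, trunc9 = LogRecord(8, "write"), LogRecord(9, "truncate")
    logs = {"osd.1": [write8, trunc9], "osd.2": [write8], "osd.3": [write8, trunc9]}
    found = find_truncation_log(logs, authoritative_version=8)
    print(found.version if found else None)
```

Returning `None` for both "no later record" and "later record is not a truncation" keeps the sketch small; a fuller implementation would probably distinguish the two cases.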
Substep S1053: taking each storage node whose data maintenance unit log does not include the truncation log as a storage node to be synchronized.
In the embodiment of the present invention, the data maintenance unit corresponds to one data maintenance unit log on each storage node. A storage node whose data maintenance unit log does not include the truncation log is a storage node to be synchronized; that is, it is a storage node on which the truncation command was not executed because the storage node was abnormal, so the data to be truncated is still stored on it. For example, Ceph includes three OSD nodes, denoted OSD node No. 1, OSD node No. 2, and OSD node No. 3. The log version numbers recorded in the PGLog on OSD node No. 1 are 7, 8, and 9, those recorded in the PGLog on OSD node No. 2 are 7 and 8, and those recorded in the PGLog on OSD node No. 3 are 7, 8, and 9. In this case, the log with log version number 8 is the authoritative log, and the next log after the authoritative log is the log with log version number 9. The log with log version number 9 is a log of a data truncation operation and is therefore the truncation log. The storage node that does not include the truncation log is OSD node No. 2, so OSD node No. 2 is the storage node to be synchronized. That is, the data truncation command corresponding to the log with log version number 9 was successfully executed on OSD node No. 1 and OSD node No. 3, whereas OSD node No. 2 is a storage node on which the truncation command was not executed and on which the data to be truncated is still stored.
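A minimal sketch of sub-step S1053, under the same simplifying assumption that each PGLog is represented as a list of integer version numbers; the function name `nodes_to_synchronize` and the node labels are hypothetical.

```python
from typing import Dict, List


def nodes_to_synchronize(
    pglogs: Dict[str, List[int]], truncation_version: int
) -> List[str]:
    """Return the nodes whose log lacks the truncation log record."""
    return sorted(
        node
        for node, versions in pglogs.items()
        if truncation_version not in versions
    )


if __name__ == "__main__":
    logs = {"osd.1": [7, 8, 9], "osd.2": [7, 8], "osd.3": [7, 8, 9]}
    print(nodes_to_synchronize(logs, truncation_version=9))
```

For the logs of the example, only the node standing in for OSD node No. 2 lacks version 9, so it is the only node selected.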
It should be noted that, if the next log after the authoritative log is not a truncation log, processing follows the prior art: a data rollback operation is performed according to the next log after the authoritative log, that is, the data is restored to its state before the operation command corresponding to that log was executed.
Step S106: the master storage node controls the storage node to be synchronized to execute the data truncation command, so as to truncate the data to be truncated stored on the storage node to be synchronized.
In the embodiment of the present invention, the abnormal storage node may be the master storage node, that is, the storage node 10 in fig. 1, or may be a slave storage node, that is, at least one of the plurality of slave storage nodes such as the storage node 30 and the storage node 40 in fig. 1. When the storage node to be synchronized is a slave storage node, the master storage node sends the data truncation command to the storage node to be synchronized, so that the storage node to be synchronized truncates the data to be truncated stored on it. When the abnormal storage node is the master storage node, the master storage node itself executes the data truncation command to truncate the data to be truncated stored on it. In either case, the data on the plurality of storage nodes, such as the storage node 10, the storage node 30, and the storage node 40, is consistent after the data truncation command is executed.
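The two cases of step S106 can be sketched as a simple dispatch. The function `replay_truncation` and the two callbacks `truncate_locally` and `send_truncate` are hypothetical stand-ins for the local execution path and the network send; how the command is actually transported between nodes is not specified here and is left to the callbacks.

```python
from typing import Callable, Iterable, List


def replay_truncation(
    master_id: str,
    to_sync: Iterable[str],
    truncate_locally: Callable[[], None],
    send_truncate: Callable[[str], None],
) -> List[str]:
    """Re-execute the truncation only on the nodes that missed it."""
    actions: List[str] = []
    for node in to_sync:
        if node == master_id:
            # The master itself missed the truncation: execute it locally.
            truncate_locally()
            actions.append(f"local:{node}")
        else:
            # A slave missed the truncation: send it the truncation command.
            send_truncate(node)
            actions.append(f"sent:{node}")
    return actions


if __name__ == "__main__":
    sent: List[str] = []
    result = replay_truncation(
        master_id="node10",
        to_sync=["node30"],
        truncate_locally=lambda: None,
        send_truncate=sent.append,
    )
    print(result, sent)
```

Nodes that already hold the truncation log are never passed in, so no command is re-executed on them.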
In the embodiment of the present invention, it should be noted that step S105, its sub-steps S1051, S1052, and S1053, and step S106 are all executed when the abnormal storage node recovers from the exception and before it can provide external services. To make the data on the plurality of storage nodes consistent after the abnormal storage node recovers, the prior art performs a data rollback and then re-executes the data truncation command on the storage nodes. By contrast, in the embodiment of the invention, when the abnormal storage node recovers, only the storage node to be synchronized needs to be determined and the data truncation command re-executed on it; no data rollback is needed, and the data truncation command does not need to be re-executed on the storage nodes other than the storage node to be synchronized. This reduces the operations needed to re-execute the data truncation command during recovery and shortens the fault recovery time of the abnormal storage node.
In the embodiment of the present invention, first, a data truncation command sent by the client 20 is received, where the data truncation command is generated by the client 20 in response to a file truncation request so as to truncate the data to be truncated stored in the storage node. Second, it is judged whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs. Finally, when a data write transaction is being processed on that data maintenance unit, the data truncation command is executed after the in-progress data write transaction completes; when no data write transaction is being processed on that data maintenance unit, the data truncation command is executed directly. Because no data write transaction is in progress when the data truncation command is executed, the influence of in-progress data write transactions on the data truncation command does not need to be considered. Backup and deletion of part of the data to be truncated is therefore unnecessary, which solves the problem of write amplification during data truncation and improves the write performance of the distributed storage system.
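The wait-then-truncate behaviour summarized above can be sketched with a condition variable. This is a single-process illustration only: the class `DataMaintenanceUnit` and its methods are hypothetical, the data is held in an in-memory byte array rather than in erasure-coded shards, and the sketch does not block new writes from starting while a truncation is waiting, which a real storage node would also have to order.

```python
import threading


class DataMaintenanceUnit:
    """Toy stand-in for one data maintenance unit (a PG in Ceph terms)."""

    def __init__(self, data: bytes = b"") -> None:
        self._data = bytearray(data)
        self._inflight = 0  # number of write transactions being processed
        self._cond = threading.Condition()

    def begin_write(self) -> None:
        with self._cond:
            self._inflight += 1

    def finish_write(self, offset: int, payload: bytes) -> None:
        with self._cond:
            end = offset + len(payload)
            if len(self._data) < end:
                self._data.extend(b"\0" * (end - len(self._data)))
            self._data[offset:end] = payload
            self._inflight -= 1
            self._cond.notify_all()  # wake a truncation that may be waiting

    def truncate(self, size: int) -> None:
        with self._cond:
            # Wait until no write transaction is being processed, then truncate.
            self._cond.wait_for(lambda: self._inflight == 0)
            del self._data[size:]  # no effect if size >= current length

    def snapshot(self) -> bytes:
        with self._cond:
            return bytes(self._data)


if __name__ == "__main__":
    unit = DataMaintenanceUnit(b"abcdefgh")
    unit.begin_write()
    writer = threading.Timer(0.05, unit.finish_write, args=(0, b"XY"))
    writer.start()
    unit.truncate(4)  # blocks until the pending write has completed
    writer.join()
    print(unit.snapshot())
```

Because the truncation only runs once the in-flight counter reaches zero, it never has to preserve or undo the effects of a partially applied write.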
Second embodiment
Referring to fig. 6, fig. 6 is a block diagram illustrating an erasure code-based data truncation apparatus 200 according to an embodiment of the present invention. The erasure code-based data truncation apparatus 200 is applied to a storage node in a distributed storage system, and includes a receiving module 201, a judging module 202, a data truncation module 203, a determining module 204, and a control module 205.
The receiving module 201 is configured to receive a data truncation command sent by a client, where the data truncation command is generated by the client in response to a file truncation request to truncate data to be truncated stored in a storage node.
In this embodiment of the present invention, the receiving module 201 may be configured to execute step S101.
The judging module 202 is configured to judge whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs.
In this embodiment of the present invention, the judging module 202 may be configured to execute step S102.
The data truncation module 203 is configured to, when a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs, wait for the in-progress data write transaction to complete and then execute the data truncation command; and, when no data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs, execute the data truncation command.
In this embodiment of the present invention, the data truncation module 203 may be configured to perform steps S103 to S104.
The determining module 204 is configured to obtain data maintenance unit logs of the plurality of storage nodes, and determine a storage node to be synchronized from the plurality of storage nodes according to the data maintenance unit logs of the plurality of storage nodes, where the storage node to be synchronized stores data to be truncated.
In an embodiment of the present invention, the determining module 204 may be configured to perform step S105 and sub-steps S1051-S1053 thereof.
The control module 205 is configured to control the storage node to be synchronized to execute the data truncation command, so as to truncate the data to be truncated stored on the storage node to be synchronized.
In this embodiment of the present invention, the control module 205 may be configured to execute step S106.
In summary, the present invention provides a data truncation method and apparatus based on erasure codes, where the method includes: receiving a data truncation command sent by a client, wherein the data truncation command is generated by the client in response to a file truncation request so as to truncate the data to be truncated stored in a storage node; judging whether a data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs; if so, waiting for the in-progress data write transaction to complete and then executing the data truncation command; and, when no data write transaction is being processed on the data maintenance unit to which the data to be truncated belongs, executing the data truncation command. Compared with the prior art, the method and the apparatus ensure that no data write transaction is in progress when the data truncation operation is executed, thereby avoiding backup and deletion of part of the data to be truncated, solving the problem of write amplification during data truncation, and improving the write performance of the distributed storage system.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Claims (10)

1. A data truncation method based on erasure codes is applied to storage nodes in a distributed storage system, wherein the storage nodes are in communication connection with clients, and the method comprises the following steps:
receiving a data truncation command sent by the client, wherein the data truncation command is generated by the client in response to a file truncation request so as to truncate the data to be truncated stored in the storage node;
judging whether a data maintenance unit to which the data to be truncated belongs has a data write transaction which is being processed, if so, waiting for the data write transaction which is being processed to be completed and then executing the data truncation command, wherein the data write transaction which is being processed comprises a write operation which is being processed;
and when no data write transaction in progress exists on the data maintenance unit to which the data to be truncated belongs, executing the data truncation command.
2. The method of claim 1, wherein the storage node is a plurality of storage nodes, the plurality of storage nodes are communicatively coupled, each storage node maintains a data maintenance unit log, the plurality of storage nodes include a master storage node and a slave storage node, the method further comprising:
the master storage node acquires data maintenance unit logs of the plurality of storage nodes, and determines a storage node to be synchronized from the plurality of storage nodes according to the data maintenance unit logs of the plurality of storage nodes, wherein the storage node to be synchronized stores the data to be truncated;
and the master storage node controls the storage node to be synchronized to execute the data truncation command so as to truncate the data to be truncated stored on the storage node to be synchronized.
3. The method of claim 2, wherein the data maintenance unit log comprises a plurality of log records, each log record comprises a number generated according to a preset rule, and the step of determining the storage node to be synchronized from the plurality of storage nodes according to the data maintenance unit log of the plurality of storage nodes comprises:
finding out log records which exist in the logs of the data maintenance units and have the largest serial number as authoritative logs;
when the next log of the authoritative logs is a log of data truncation operation, determining the log of the data truncation operation as a truncation log;
and taking the storage node which does not comprise the truncation log in the logs of the plurality of data maintenance units as the storage node to be synchronized.
4. The method of claim 2, wherein when the storage node to be synchronized is the slave storage node, the step of the master storage node controlling the storage node to be synchronized to execute the data truncation command to truncate the data to be truncated stored on the storage node to be synchronized comprises:
and the master storage node sends the data truncation command to the storage node to be synchronized so as to truncate the data to be truncated stored on the storage node to be synchronized.
5. The method of claim 2, wherein when the storage node to be synchronized is the master storage node, the step of the master storage node controlling the storage node to be synchronized to execute the data truncation command to truncate the data to be truncated stored on the storage node to be synchronized comprises:
the master storage node executes the data truncation command to truncate the data to be truncated stored on the storage node to be synchronized.
6. An erasure code-based data truncation apparatus, applied to a storage node in a distributed storage system, wherein the storage node is in communication connection with a client, the apparatus comprising:
the receiving module is used for receiving a data truncation command sent by the client, wherein the data truncation command is generated by the client responding to a file truncation request so as to truncate the data to be truncated stored in the storage node;
the judging module is used for judging whether the data maintenance unit to which the data to be truncated belongs has the data write transaction which is being processed;
the data truncation module is used for waiting for the completion of the processing of the data write transaction and then executing the data truncation command when the data write transaction which is being processed exists on the data maintenance unit to which the data to be truncated belongs; and when no data write transaction in process exists on the data maintenance unit to which the data to be truncated belongs, executing the data truncation command, wherein the data write transaction in process comprises a write operation in process.
7. The apparatus of claim 6, wherein the storage node is a plurality of storage nodes, the plurality of storage nodes are communicatively coupled, each storage node maintains a data maintenance unit log, the plurality of storage nodes comprise a master storage node and a slave storage node, the apparatus further comprising:
the determining module is used for acquiring data maintenance unit logs of the plurality of storage nodes and determining a storage node to be synchronized from the plurality of storage nodes according to the data maintenance unit logs of the plurality of storage nodes, wherein the storage node to be synchronized stores the data to be truncated;
and the control module is used for controlling the storage node to be synchronized to execute the data truncation command so as to truncate the data to be truncated stored on the storage node to be synchronized.
8. The apparatus of claim 7, wherein the log of the data maintenance unit includes a plurality of log records, each log record includes a number generated according to a preset rule, and the determining module is further configured to:
finding out log records which exist in logs of a plurality of data maintenance units and have the largest serial number as authoritative logs;
when the next log of the authoritative logs is a log of data truncation operation, determining the log of the data truncation operation as a truncation log;
and taking the storage node which does not comprise the truncation log in the logs of the plurality of data maintenance units as the storage node to be synchronized.
9. The apparatus of claim 7, wherein when the storage node to be synchronized is the slave storage node, the master storage node sends the data truncation command to the slave storage node to truncate the data to be truncated stored on the slave storage node.
10. The apparatus of claim 7, wherein when the storage node to be synchronized is the master storage node, the master storage node executes the data truncation command to truncate the data to be truncated stored on the master storage node.
CN201810393095.6A 2018-04-27 2018-04-27 Data truncation method and device based on erasure codes Active CN108595287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810393095.6A CN108595287B (en) 2018-04-27 2018-04-27 Data truncation method and device based on erasure codes

Publications (2)

Publication Number Publication Date
CN108595287A CN108595287A (en) 2018-09-28
CN108595287B true CN108595287B (en) 2021-11-05

Family

ID=63610306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810393095.6A Active CN108595287B (en) 2018-04-27 2018-04-27 Data truncation method and device based on erasure codes

Country Status (1)

Country Link
CN (1) CN108595287B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522283B (en) * 2018-10-30 2021-09-21 深圳先进技术研究院 Method and system for deleting repeated data
CN111737220B (en) * 2020-05-29 2023-01-06 苏州浪潮智能科技有限公司 Optimization method and system for truncation operation in distributed file storage system
CN113391946B (en) * 2021-05-25 2022-06-17 杭州电子科技大学 Coding and decoding method for erasure codes in distributed storage

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101256526A (en) * 2008-03-10 2008-09-03 清华大学 Method for implementing document condition compatibility maintenance in inspection point fault-tolerant technique
CN101447851A (en) * 2007-11-26 2009-06-03 清华大学 Generation method of quasi-cyclic low-density parity check codes
CN102541668A (en) * 2011-12-05 2012-07-04 清华大学 Method for analyzing reliability of flash file system
US8417982B1 (en) * 2009-08-11 2013-04-09 Marvell Israel (M.I.S.L.) Ltd. Dual clock first-in first-out (FIFO) memory system
CN103577329A (en) * 2013-10-18 2014-02-12 华为技术有限公司 Snapshot management method and device
CN103870352A (en) * 2012-12-10 2014-06-18 财团法人工业技术研究院 Method and system for data storage and reconstruction
CN104462403A (en) * 2014-12-11 2015-03-25 华为技术有限公司 File intercepting method and device
CN106020975A (en) * 2016-05-13 2016-10-12 华为技术有限公司 Data operation method, device and system
CN106599323A (en) * 2017-01-03 2017-04-26 北京百度网讯科技有限公司 Method and apparatus for realizing distributed pipeline in distributed file system
CN106874383A (en) * 2017-01-10 2017-06-20 清华大学 A kind of decoupling location mode of metadata of distributed type file system
CN107122133A (en) * 2017-04-24 2017-09-01 珠海全志科技股份有限公司 Date storage method and device
CN107168816A (en) * 2016-03-07 2017-09-15 北京忆恒创源科技有限公司 ECC Frame Size Adjustments method and its device
CN107229694A (en) * 2017-05-22 2017-10-03 北京红马传媒文化发展有限公司 A kind of data message consistency processing method, system and device based on big data
CN107608820A (en) * 2017-09-26 2018-01-19 郑州云海信息技术有限公司 A kind of file wiring method and relevant apparatus based on correcting and eleting codes

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9619487B2 (en) * 2012-06-18 2017-04-11 International Business Machines Corporation Method and system for the normalization, filtering and securing of associated metadata information on file objects deposited into an object store
US9405783B2 (en) * 2013-10-02 2016-08-02 Netapp, Inc. Extent hashing technique for distributed storage architecture
US9021296B1 (en) * 2013-10-18 2015-04-28 Hitachi Data Systems Engineering UK Limited Independent data integrity and redundancy recovery in a storage system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
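Improving write operations in MLC phase change memory;Lei Jiang;《IEEE International Symposium on High-Performance Comp Architecture》;20120315;full text *
Storage and management of massive bearing-monitoring data based on Hadoop;马新娜;《信息技术》 (Information Technology);20171130;full text *
Design and implementation of an error-correcting code storage method based on MooseFS;刘海波;《计算机工程与应用》 (Computer Engineering and Applications);20170331;full text *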

Also Published As

Publication number Publication date
CN108595287A (en) 2018-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant