CN116601609A - Storing data in computer storage - Google Patents

Publication number
CN116601609A
CN116601609A CN202080107869.XA CN202080107869A CN116601609A CN 116601609 A CN116601609 A CN 116601609A CN 202080107869 A CN202080107869 A CN 202080107869A CN 116601609 A CN116601609 A CN 116601609A
Authority
CN
China
Prior art keywords
data
computer storage
storing
copy
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080107869.XA
Other languages
Chinese (zh)
Inventor
亚伦·莫
阿萨夫·纳塔逊
阿维夫·库温特
阿萨夫·耶格尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data

Abstract

A method for storing data in computer storage is disclosed. The method comprises: storing the data in computer storage; storing an erasure code of the data in the computer storage; storing a copy of the data in the computer storage; and, based on a result of storing the copy of the data in the computer storage, identifying the computer storage that stores the erasure code of the data as available for storing other data. Storing an erasure code of the data advantageously enables the data to be recovered if it is corrupted. Disadvantageously, however, the erasure code occupies storage space. After the copy of the data has been stored, the storage holding the erasure code is identified as available for storing other data, advantageously freeing up storage space.

Description

Storing data in computer storage
Technical Field
The present disclosure relates to storing data in computer storage.
Background
Data stored in a computer storage system may be protected from loss or corruption by erasure coding or by replication. Erasure coding creates a mathematical function that describes portions of the data as a function of other portions of the data, so that the complete data can be recovered after partial loss or corruption. Replication creates additional copies of the data, which may be stored in a different storage device (e.g., a different disk drive) from the original data. One advantage of erasure coding is that the storage footprint of an erasure code is typically smaller than that of the data it protects. One advantage of replication is that the computational complexity of creating a copy, and of recovering the data from the copy, is relatively low.
Disclosure of Invention
It is an object of the present disclosure to provide a method for storing data in a computer storage, wherein the data is protected from being lost or damaged.
The above and other objects are achieved by the features of the independent claims. Other implementations are apparent in the dependent claims, the description and the drawings.
A first aspect of the present disclosure provides a method for storing data in computer storage. The method comprises: storing the data in computer storage; storing an erasure code of the data in the computer storage; storing a copy of the data in the computer storage; and identifying the computer storage storing the erasure code of the data as available for storing other data, based on a result of the storing of the copy of the data in the computer storage.
In the above method, data stored in computer storage is protected by an erasure code (e.g., parity information) of the data. An erasure code can advantageously protect the stored data relatively quickly, reducing the risk of data loss or corruption in the short term. For example, because the erasure code occupies relatively little storage space, it may be stored locally, i.e., in storage local to the data. For example, when the data is stored in one or more storage devices by the above method, the erasure code may be stored in the same one or more storage devices even if they have relatively little spare capacity. Relatively fast protection is thus achieved by avoiding the need to transfer data to an external backup resource, and the delay associated with doing so. In addition, because the erasure code can be stored locally, recovering lost data using the erasure code can be faster than recovering the data from a remote storage device.
However, recovering data using erasure codes can be relatively computationally complex, and thus has the disadvantage of consuming computing resources of the storage system, e.g., processor time. Thus, the above method further comprises: a copy of the data, i.e., duplicate data, is stored. For example, copies of data may be stored through a periodic data backup process whereby the copy data is stored in a different storage device than the original data. The creation and storage time of the duplicate data may be relatively long compared to erasure codes, for example, because the memory footprint of the duplicate data is relatively large, requiring the duplicate data to be transferred to a suitably large storage device, which may be remote from the storage location of the data. However, duplicate data may advantageously allow for computationally simple recovery of data when a portion of the stored data is lost or corrupted.
Thus, erasure codes of stored data in combination with subsequent copies of the stored data may advantageously provide early protection of the data by erasure codes and enhanced long-term protection of the data by the copy data.
However, after a copy of the data has been stored, i.e., after the data has been replicated, the erasure code may be considered redundant, because the copy can then be used to recover the data in the event of loss or corruption. At the same time, the erasure code occupies storage space, which can be particularly problematic when it is stored in a storage device of relatively small capacity. The above method therefore comprises the further step of identifying, based on a result of storing the copy of the data in the computer storage, the computer storage storing the erasure code as available for storing other data. In other words, once the data has been replicated and the replica data stored, the method identifies that the storage space occupied by the erasure code may be overwritten by new data. In an example, the method may even further comprise erasing the erasure code from the storage after the replica data has been stored. This step can advantageously reduce the total storage footprint of the data and its protection. Because the step is performed based on the result of storing the copy of the data, the method is safe: the step depends on the replica data having been stored, which reduces the risk of the data being protected by neither an erasure code nor replica data.
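The four steps of the first aspect can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the disclosed implementation: the `Storage` class, its `reusable` set, the region names, and the use of a single XOR parity as the erasure code are all hypothetical choices made for the example.

```python
class Storage:
    """Toy in-memory model of computer storage (hypothetical, for illustration)."""

    def __init__(self):
        self.regions = {}      # region name -> stored bytes
        self.reusable = set()  # regions identified as available for other data

    def write(self, name, payload):
        self.regions[name] = bytes(payload)
        self.reusable.discard(name)
        return True            # result of the store operation


def xor_parity(segments):
    """Single XOR parity over equal-length segments (one simple erasure code)."""
    parity = bytearray(len(segments[0]))
    for seg in segments:
        for i, byte in enumerate(seg):
            parity[i] ^= byte
    return bytes(parity)


def store_with_protection(local, backup, key, segments):
    # Step 1: store the data.
    for i, seg in enumerate(segments):
        local.write(f"{key}/seg{i}", seg)
    # Step 2: store an erasure code of the data alongside it.
    local.write(f"{key}/parity", xor_parity(segments))
    # Step 3: store a copy of the data (here, in a separate Storage object).
    results = [backup.write(f"{key}/seg{i}", seg) for i, seg in enumerate(segments)]
    copied = all(results)
    # Step 4: only if the copy was stored, identify the parity region as reusable.
    if copied:
        local.reusable.add(f"{key}/parity")
    return copied
```

Note that the parity region is marked reusable rather than erased, so the erasure code remains usable until new data actually overwrites it.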
In an example, the method may include: two or more of the data, erasure codes, and replica data are stored by a common storage device, e.g., at different locations in the same disk drive. In other examples, two or more of the data, erasure codes, and duplicate data may be stored by mutually different storage devices. For example, the data and erasure codes may be stored in a first set of one or more storage devices (e.g., disk drives) and the duplicate data may be stored in one or more different storage devices (e.g., disk drives).
The method may comprise the steps of: the erasure code, e.g., parity information, is created before the erasure code is stored. Methods for creating erasure codes for data are known to those skilled in the art.
In the context of this specification, unless otherwise indicated, the terms "storage" (e.g., computer storage) and "memory" (e.g., computer memory) are both used to refer generally to storage resources for storing data. In this regard, each term is intended to include storage resources for long-term storage of data (e.g., disk drives, solid state drives, or flash memory) as well as storage resources for short-term storage of data (e.g., random access memory).
In one implementation, the storing the data in computer storage includes: a first segment of the data is stored in a first computer storage device and a second segment of the data is stored in a second computer storage device.
In other words, the above method may include distributing the data among multiple storage devices, referred to hereinafter as striping. Striping may advantageously improve read/write performance, because the segments of the data can be read from or written to their respective storage devices simultaneously. Furthermore, striping data across multiple devices reduces the proportion of the data that would be lost if a subset of the storage devices failed. It is therefore generally acceptable for the erasure code to be resilient to the loss of only a relatively small amount of data. This may advantageously reduce the complexity of the erasure code, and hence the complexity of generating the erasure code and of recovering data with it. In an example, the method may include storing the data in more than two segments, optionally in more than two storage devices. For example, the method may include storing the data in five or six segments in five or six storage devices. The method may include partitioning the data into two or more segments before storing the data.
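Partitioning and striping as described above might look like the following sketch. The function names, the zero-padding of the final segment, and the modelling of each storage device as a Python list are assumptions made for illustration only.

```python
def split_into_segments(data: bytes, n: int) -> list[bytes]:
    """Partition data into n equal-size segments, zero-padding the tail (assumed policy)."""
    seg_len = -(-len(data) // n)  # ceiling division
    padded = data.ljust(seg_len * n, b"\x00")
    return [padded[i * seg_len:(i + 1) * seg_len] for i in range(n)]


def stripe(data: bytes, devices: list[list[bytes]]) -> list[bytes]:
    """Write one segment of the data to each device; each device is modelled as a list."""
    segments = split_into_segments(data, len(devices))
    for device, segment in zip(devices, segments):
        device.append(segment)
    return segments
```

With three devices, `stripe(b"abcdefg", [[], [], []])` places `b"abc"`, `b"def"`, and the padded `b"g\x00\x00"` on separate devices, so the loss of one device affects only one third of the item.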
In one implementation, the storing a copy of the data in the computer storage includes: the copy of the data is stored in one or more computer storage devices separate from the computer storage device storing the data and/or separate from the computer storage device storing the erasure code.
In other words, in an example, the above method includes: the replica data is stored in a storage device that is physically separate from at least one of the storage device storing the data and the storage device storing the erasure code. This separation of duplicate data from data and/or erasure codes advantageously reduces the likelihood of losing duplicate data and data or erasure codes when a storage device fails. In an example, the data and erasure codes may be stored in one or more storage devices (e.g., hard disk) while the duplicate data may be stored in one or more separate storage devices.
In one implementation, the method further comprises: location data identifying the location of the copy of the data is stored in the computer storage.
In other words, the location data defines a machine readable address of the replica data in the computer storage. For example, the location data may identify the storage device number and/or offset in which the copy data is located. Thus, the location data may advantageously allow the duplicate data to be easily retrieved from computer storage, which may be useful when the data is lost. For example, the location data may be generated by the memory controller when storing the copy data.
In one implementation, the storing location data in the computer storage that identifies a location of the copy of the data includes: the location data is stored in a computer storage device that stores the copy of the data.
In other words, the location data may be stored in the same storage device as at least a portion of the replica data. An advantage of this arrangement is that the location data can be readily accessed by the memory controller for accessing the copy data. Thus, when a data loss results in the need for duplicate data, the same memory controller can read the location data and retrieve the duplicate data. This may advantageously reduce the time required to recover the data using the duplicate data. For example, the duplicate data may be stored in a single storage device, while the location data may be stored in that same storage device. In this example, the location data may identify an offset in the copy data in the storage device. In another example, the replica data may be distributed across multiple storage devices, and the location data may be stored in one of the multiple devices.
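As a sketch of location data held in the same storage device as the replica data, the example below reserves a fixed-size record at the start of a device and records the device number, offset, and length of the copy. The record layout (`HEADER`), the function names, and the modelling of a device as a `bytearray` are hypothetical.

```python
import struct

# Hypothetical location record: device number (uint16), offset (uint64), length (uint64).
HEADER = struct.Struct("<HQQ")


def store_copy(device: bytearray, device_no: int, copy: bytes) -> None:
    """Append the replica data to the device and record its location at the device start."""
    if len(device) < HEADER.size:
        device.extend(bytes(HEADER.size - len(device)))  # reserve space for the record
    offset = len(device)
    device.extend(copy)
    device[:HEADER.size] = HEADER.pack(device_no, offset, len(copy))


def load_copy(device: bytearray) -> bytes:
    """Read the location record, then retrieve the replica data it points to."""
    _device_no, offset, length = HEADER.unpack_from(device, 0)
    return bytes(device[offset:offset + length])
```

Because the record and the replica data sit on the same device, a single controller can read the record and fetch the copy without consulting any other device.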
In one implementation, the method further comprises: storing, in the computer storage, a flag marking the computer storage in which the data is stored, based on the result of the storing of the copy of the data in the computer storage.
In other words, the above method may include marking, after the replica data has been stored, the location in the storage at which the data is stored. The flag may thus identify locations of the computer storage that hold all or part of data already protected by replication (i.e., by storage of the replica data). The flag can therefore be used to confirm conveniently whether the data has been replicated, i.e., whether replica data has been stored. One advantage is that a storage manager can determine whether the data has been replicated simply by reading the flag, and so knows whether replica data can be retrieved if the data is lost.
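One compact way to hold such flags is a bitmap with one bit per storage block. The sketch below assumes block-granular flags and a hypothetical `ReplicationFlags` class; the disclosure does not prescribe a particular representation.

```python
class ReplicationFlags:
    """One flag bit per storage block; a set bit means a copy of that block has been stored."""

    def __init__(self, n_blocks: int):
        self.bits = bytearray((n_blocks + 7) // 8)

    def mark_copied(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

    def is_copied(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))
```

A storage manager would call `mark_copied` only after the store of the replica data has been confirmed, and `is_copied` when deciding whether replica data can be retrieved for a lost block.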
In one implementation, the method further comprises: storing other data in the computer storage identified as available for storing other data. In other words, the method may further include overwriting, after the data has been replicated, one or more locations in the computer storage at which the erasure code is stored.
In one implementation, the storing the erasure code of the data in the computer storage includes: the erasure code is stored in one or more computer storage devices that store the data. In other words, the erasure code may be stored in the same storage device or devices as the data. In view of the erasure code being generated from data, this arrangement may reduce the time required to generate and store erasure code because the erasure code generator may read from and write to the same storage device or devices.
In one implementation, the storing the erasure code of the data in the computer storage includes: storing a first erasure code and a second erasure code in the computer storage, wherein the first erasure code and the second erasure code are independent of each other; and the identifying, based on the result of storing the copy of the data in the computer storage, the computer storage storing the erasure code of the data as available for storing other data includes: identifying, based on the result of storing the copy of the data in the computer storage, only the computer storage storing the second erasure code of the data as available for storing other data.
In other words, the above method may include storing two mutually independent erasure codes, i.e., each erasure code can operate on portions of the data, independently of the other erasure code, to recreate the data. Creating and storing two erasure codes may advantageously improve the resilience of the data compared with a single erasure code, because one of the erasure codes may itself be lost or corrupted while the data can still be recovered using the remaining erasure code. In the described implementation, however, only one of the erasure codes (i.e., the second erasure code) is selected to be overwritten after the data has been replicated, and the first erasure code is retained in storage. Such an arrangement may advantageously balance data resilience against the storage footprint of the protection. In other examples in which multiple erasure codes are generated, all of the erasure codes may be selected to be overwritten after the data has been replicated.
In one implementation, the method further comprises: performing an error detection operation on the data stored in the computer storage; in response to detecting that the data stored in the computer storage is in error, the copy of the data stored in the computer storage is retrieved.
In other words, in an example, the above method may include testing the data for errors, e.g., partial or complete loss or corruption of the data, and, in response to detecting such loss or corruption, retrieving the replica data. "Actively" testing the data and retrieving the replica data in this way may reduce the time between the data loss and recovery of the data from the replica data, which may advantageously minimize the time for which a client is unable to access the data after the loss.
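A simple form of such active testing is a scrub pass that compares each stored item with a recorded checksum and retrieves the copy on a mismatch. The sketch below assumes CRC-32 checksums (via Python's standard `zlib.crc32`) and dictionary-based stores; the disclosure does not specify the error detection operation, so these are illustrative choices.

```python
import zlib


def scrub_and_restore(primary: dict, checksums: dict, backup: dict) -> list:
    """Check each item against its recorded CRC-32; on error, retrieve the copy from backup.

    Returns the keys of the items that were restored.
    """
    restored = []
    for key, expected in checksums.items():
        data = primary.get(key)
        if data is None or zlib.crc32(data) != expected:
            primary[key] = backup[key]  # retrieve the copy of the data
            restored.append(key)
    return restored
```

Running the scrub on a schedule, rather than waiting for a client read to fail, is what makes the detection "active" in the sense described above.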
A second aspect of the present disclosure provides a computer program. The computer program comprises instructions that, when executed by a processor, cause the processor to: store data in computer storage; store an erasure code of the data in the computer storage; store a copy of the data in the computer storage; and, based on a result of storing the copy of the data in the computer storage, identify the computer storage storing the erasure code of the data as available for storing other data.
In one implementation of the second aspect, the computer program comprises instructions which, when executed by a processor, cause the processor to perform the method provided by any implementation of the first aspect of the present disclosure.
A third aspect of the present disclosure provides a computer readable data carrier. The computer readable data carrier has stored therein a computer program provided by the second aspect of the present disclosure.
A fourth aspect of the present disclosure provides a computing system. The computing system includes computer storage and a processor coupled to the computer storage, the processor being configured to: store data in the computer storage; store an erasure code of the data in the computer storage; store a copy of the data in the computer storage; and, based on a result of storing the copy of the data in the computer storage, identify the computer storage storing the erasure code of the data as available for storing other data.
In an implementation of the fourth aspect, the processor is configured to perform the method provided by any implementation of the first aspect of the present disclosure.
These and other aspects of the disclosure will be apparent from and elucidated with reference to one or more embodiments described hereinafter.
Drawings
For a more complete understanding of the present disclosure, embodiments of the present disclosure are described below, by way of example, in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of one example of a computing system embodying an aspect of the present disclosure;
FIG. 2 is a schematic diagram of computer memory in a computing system;
FIG. 3 is a schematic diagram of a computer storage device in a computing system;
FIG. 4 is a schematic diagram of a process included in an exemplary method for storing data in the computer storage device identified with reference to FIG. 3, the method including a method of partitioning data for storage;
FIG. 5 is a schematic diagram of a process included in a method of partitioning data for storage;
FIG. 6 is a schematic diagram of a process of storing location data;
FIG. 7 is a schematic diagram of a process for storing a flag in computer storage;
FIG. 8 is a schematic diagram of a process for validating that computer storage storing erasure codes is available for storing other data;
FIG. 9 is a schematic diagram of a process for storing other data at the identified locations of the computer storage; and
FIG. 10 is a schematic diagram of a process of performing an error detection operation.
Detailed Description
Referring first to FIGS. 1 and 2 in general, a computing system 101 embodying one example of an aspect of the present disclosure includes a plurality of client devices 102 and 103, a computer storage system 104, and a backup storage system 105. Client devices 102 and 103 are communicatively coupled to computer storage system 104 through network 106. Computer storage system 104 is communicatively coupled to backup storage system 105 through network 107.
Client devices 102 and 103 use computer storage system 104 to store data. For example, client devices 102 and 103 may send requests to computer storage system 104 to store or retrieve data over network 106, and may similarly exchange data for storage or retrieval with computer storage system 104 over network 106. Client devices 102 and 103 may be, for example, desktop computers, portable computers, or smartphones. Client devices 102 and 103 may include computing functionality, such as a computer processor. In the above examples, two client devices 102 and 103 are depicted as coupled to computer storage system 104. In other examples, the number of client devices coupled to computer storage system 104 may be greater than or less than two. Client devices 102 and 103 may be remote from computer storage system 104, and in fact, relatively remote from each other. For example, computer storage system 104 may be located within a central data center. Thus, client devices 102 and 103 can store data using storage resources in computer storage system 104.
Computer storage system 104 is used to provide storage resources for a plurality of client devices (e.g., client devices 102 and 103). As described below, in the example, computer storage system 104 communicates with client devices 102 and 103 to store and retrieve data, and also communicates with backup storage system 105 to store copies of the data held in computer storage system 104, i.e., to back that data up to backup storage system 105.
Computer storage system 104 includes processor 108, memory 109, storage 110, input/output interface 111, and system bus 112. The processor 108 is used to control the operation of the computer storage system, for example, to process storage and retrieval requests from client devices 102 and 103. In an example, processor 108 is configured to control the data storage operations described herein according to a data storage computer program stored in memory 109. Memory 109 is configured as a non-volatile read/write memory for storing computer programs executed by processor 108 and operational data associated with operations performed by processor 108. In an example, the memory 109 stores a computer program 201 for controlling data storage. In the example, memory 109 is flash memory, but in other examples, flash memory may be replaced with alternative forms of memory. Memory 110 is used to store data, for example, data for client devices 102 and 103. In an example, as described in detail below with reference to later figures, the memory 110 includes a plurality of storage devices, which in an example are a plurality of disk drives. The input/output interface 111 is used to connect the client devices 102 and 103 to the computer storage system 104, and to connect the computer storage system 104 to the backup storage system 105. The components 108 through 111 in computer storage system 104 communicate via system bus 112.
Backup storage system 105 is used to provide backup storage resources to computer storage system 104, i.e., to store one or more copies of data stored in computer storage system 104. Backup storage of data provides useful redundancy to allow recovery of data in the event of loss or corruption of original data (e.g., upon failure of memory 110). The backup storage system 105 includes a processor 112, a memory 113, an input/output interface 115, and a system bus 116. The processor 112 is used to control the operation of the backup storage system 105. As described herein, in an example, the processor 112 is used to control the process of communicating with the computer storage system 104 and for controlling data backup operations in cooperation with the computer storage system 104. The memory 113 is used for non-volatile storage of data, for example, for storing copies of data received from the computer storage system 104 for backup. In an example, the memory 113 includes one or more disk drives. The input/output interface 115 is used to connect the backup storage system 105 to the computer storage system 104. The components 112 through 115 communicate via a system bus 116.
The backup storage system 105 may be remote from the computer storage system 104. In practice, to provide backup storage for computer storage system 104, it may be desirable for backup storage system 105 to include physical storage resources, such as one or more disk drives, separate from the storage resources in computer storage system 104 to avoid the risk of both storage resources failing at the same time. In particular, in some examples, it may be advantageous for the backup storage system 105 to be geographically remote from the storage system 104, thereby reducing the risk of both storage systems failing at the same time.
In an example, networks 106 and 107 may each be implemented by a wide area network (WAN) such as the Internet, a local area network (LAN), a metropolitan area network (MAN), and/or a personal area network (PAN), among others. The networks may be implemented using wired technologies (e.g., Ethernet, Data Over Cable Service Interface Specification (DOCSIS), synchronous optical networking (SONET), and/or synchronous digital hierarchy (SDH)) and/or wireless technologies (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), Bluetooth, ZigBee, near-field communication (NFC), and/or Long Term Evolution (LTE)). Each network may include at least one device for communicating data in the network. For example, networks 106 and 107 may each include computing devices, routers, switches, gateways, access points, and/or modems.
Referring next to FIG. 3, in an example, memory 110 in computer storage system 104 includes a plurality of storage devices for storing data. In the example, the memory 110 includes five separate storage devices 301 to 305, all of which are disk drives. The five storage devices 301 to 305 are coupled to system bus 116 through system bus 306 so as to serve as logical units individually addressable by processor 108. In an example, as detailed below with reference to later figures, the computer program 201 executed by the processor 108 causes data to be stored in the memory 110 in a plurality of segments, each segment being stored in a different one of the plurality of disk drives. In an example, the data segments are stored in a redundant array of independent disks (RAID) level 6 configuration, i.e., by block-level striping of the data segments across multiple storage devices, with two parity blocks distributed across the storage devices, as described below.
Thus, in the example, data items (e.g., files) A to E are all stored in memory 110. Each of data items A to E is divided into three segments or blocks 1 to 3, which may be of equal size. For each data item A to E, the constituent segments/blocks 1 to 3 are striped, i.e., distributed, over a respective group of three storage devices, such that each storage device holds at most one of the three segments/blocks of that data item. In addition, in the example, two erasure codes (e.g., parities Pp and Pq) are stored in memory 110 for each data item A to E. In an example, for each data item A to E, the data segments/blocks 1 to 3 and the respective parities Pp and Pq are distributed over storage devices 301 to 305 such that each storage device holds at most one of these segments/blocks or parities.
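One placement that satisfies the "at most one block or parity per device" constraint is a rotated layout, in which the starting device shifts by one for each data item. The sketch below is an assumption for illustration; the actual placement shown in FIG. 3 is not reproduced here, and the function and label names are hypothetical.

```python
ROLES = ("1", "2", "3", "Pp", "Pq")  # three data blocks plus two parities per item


def rotated_layout(items, n_devices=5):
    """Map each item to {device index: block label}, rotating the start device per item."""
    assert n_devices >= len(ROLES), "need one device per block or parity"
    layout = {}
    for row, item in enumerate(items):
        layout[item] = {
            (row + k) % n_devices: f"{item}-{role}" for k, role in enumerate(ROLES)
        }
    return layout
```

For items A to E on five devices, device indices 0 to 4 (standing in for storage devices 301 to 305) each hold exactly one block or parity of every item, and the parities are spread over all five devices rather than concentrated on two.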
As described in detail below, the parities Pp and Pq of each data item A to E are configured to be independent of each other, such that each of Pp and Pq can operate on a subset of the respective data segments/blocks 1 to 3, independently of the other parity, to recover the data. An advantage of this configuration is that memory 110 is relatively resilient to the loss of a subset of the segments/blocks or of the corresponding parities. Specifically, for each data item A to E, the memory is resilient to the loss of at most two of the segments/blocks 1 to 3 and parities Pp and Pq, since the data can generally be reconstructed from any three of the segments/blocks or parities.
Referring next to FIG. 4, in an example, a computer program 201 stored in memory 109 of computer storage system 104 is executed by computer storage system 104 to cause processor 108 to perform a method comprising four phases.
In stage 401, computer program 201 may cause processor 108 to store data in memory 110 of storage system 104. For example, stage 401 may be initiated in response to a storage request received by the storage system from one of client devices 102 and 103 (i.e., a request by the respective client device to store data in storage system 104). In response to this request, the processor 108 may then store the received data in the memory 110. In an example, as described above with reference to FIG. 3, and as shown in detail in FIG. 5, stage 401 may include the processor 108 dividing items (e.g., files) of the received data into segments and striping the segments across the storage devices 301 to 305 of the memory 110. Thus, for example, stage 401 may include the processor 108 partitioning each of a plurality of files (e.g., files A to E) of the received data into a plurality of blocks/segments, e.g., three blocks/segments 1 to 3, and striping the segments across three of the storage devices 301 to 305. Thus, in an example, stage 401 may include three stages 501 to 503 to stripe the data, as shown in FIG. 5.
In stage 402, computer program 201 may cause processor 108 to create an erasure code, e.g., parity, for the data stored in stage 401 and to store the erasure code in memory 110 as well. In an example, stage 402 may include the processor 108 creating two mutually independent erasure codes for the data. Suitable procedures for creating erasure codes (e.g., parity) for data are known to those skilled in the art. For example, for each data item (e.g., file) A to E, parities "p" and "q" may be calculated using two different functions, e.g., parity "p" may be calculated from the array data using an XOR function, and parity "q" may be calculated using a Reed-Solomon code. The erasure codes (e.g., parities) may thus provide resilience to the failure of at most two of the storage devices 301 to 305.
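The following sketch illustrates one common way to compute such "p" and "q" parities and to rebuild two lost data blocks from them. It is an assumption-laden example, not the disclosed procedure: it uses the Galois field GF(2^8) with reduction polynomial 0x11d and generator 2 (a conventional RAID 6 choice), and all function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) with reduction polynomial x^8+x^4+x^3+x^2+1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^254, since a^255 = 1 in GF(2^8)."""
    return gf_pow(a, 254)


def compute_pq(blocks):
    """P is the XOR of the blocks; Q is the sum of 2^i * block_i over GF(2^8)."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y (entries at x and y are ignored) from P and Q."""
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(blocks):
        if i in (x, y):
            continue
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            pxy[j] ^= byte                 # leaves Dx ^ Dy
            qxy[j] ^= gf_mul(coeff, byte)  # leaves 2^x*Dx ^ 2^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        # Dx = (Qxy ^ 2^y * Pxy) / (2^x ^ 2^y); then Dy = Pxy ^ Dx.
        dx[j] = gf_mul(denom_inv, qxy[j] ^ gf_mul(gy, pxy[j]))
        dy[j] = pxy[j] ^ dx[j]
    return bytes(dx), bytes(dy)
```

With three data blocks, losing any two of them still leaves three of the five stored pieces (one data block plus P and Q), which is enough for `recover_two` to rebuild the item.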
In stage 403, computer program 201 may cause processor 108 to perform a replication process by creating a copy of the data stored in stage 401 and storing the copy in memory 113 of backup storage system 105. In an example, there may be a delay between the execution of stage 402 and the execution of stage 403. In an example, stages 401 and 402 of the above method may be performed repeatedly, e.g., in response to repeated storage requests from client devices 102, 103, while stage 403 may be performed only after several instances of stages 401 and 402 have been performed. For example, the backup operation of stage 403 may be performed periodically, e.g., daily, such that copies of all data stored in storage system 104 during that period may be backed up in a single time step. This may be a computationally efficient method of data replication. Thus, in stage 403, processor 108 in storage system 104 may create a copy of one or more data items stored in memory 110 in one or more instances of stage 401 and may send the copy data to backup storage system 105 over network 107. The processor 108 in the storage system 104 may communicate with the processor 112 in the backup storage system 105 to cause the backup storage system 105 to save the copy data in the memory 113. Stage 403 may include: backup storage system 105 sending a report back to storage system 104 to confirm the storage of the copy data.
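The batched replication of stage 403 might be sketched as follows. The classes `PrimarySystem` and `BackupSystem`, the `pending` set and the in-process call standing in for network 107 are hypothetical; the sketch only shows items accumulating between backups and a confirmation report being returned.

```python
from typing import Dict, Set


class BackupSystem:
    """Hypothetical stand-in for backup storage system 105."""

    def __init__(self) -> None:
        self.copies: Dict[str, bytes] = {}

    def save(self, batch: Dict[str, bytes]) -> Set[str]:
        """Store the batch and return a report naming the items now stored."""
        self.copies.update(batch)
        return set(batch)


class PrimarySystem:
    """Hypothetical stand-in for storage system 104."""

    def __init__(self, backup: BackupSystem) -> None:
        self.data: Dict[str, bytes] = {}
        self.pending: Set[str] = set()  # items stored since the last backup
        self.backup = backup

    def store(self, name: str, payload: bytes) -> None:
        """Stage 401: store an item and remember that it is not yet replicated."""
        self.data[name] = payload
        self.pending.add(name)

    def replicate(self) -> Set[str]:
        """Stage 403: copy every pending item in one step, e.g., once per day."""
        batch = {name: self.data[name] for name in self.pending}
        confirmed = self.backup.save(batch)
        self.pending -= confirmed
        return confirmed
```

Calling `store` several times and `replicate` once mirrors the described case in which stage 403 runs only after several instances of stages 401 and 402.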
In an example, stage 403 may further include: the processor 108 causing location data to be stored that identifies the location of the copy data in the memory 113 of the backup storage system 105. For example, referring to FIG. 6, in an example, stage 403 may include the process of step 601 of storing the location data in memory 113 along with the copy data. Thus, when it is desired to retrieve the copy data from memory, the location of the copy data in memory 113 may be conveniently identified, for example, by processor 108 or 112.
In an example, stage 403 may further include: one or more flags are stored in memory 110 of storage system 104 that mark the locations in computer memory 110 where the data copied in stage 403 was stored. Thus, these flags may identify, for example, to the processor 108, portions of data stored in the memory 110 that have been replicated, i.e., data for which backup copies have been stored in the backup storage system 105. Thus, when data in the memory 110 is lost, the processor 108 can confirm whether a copy of the data exists in the backup storage system 105, thereby quickly retrieving the copy data.
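A minimal sketch of how the flags and the location data might be used together is given below. The class `ReplicationFlags` and its two dictionaries are assumptions for illustration; for brevity the sketch keeps both structures in one object, whereas in the example above the flags reside in memory 110 and the location data in memory 113.

```python
from typing import Dict, Optional


class ReplicationFlags:
    """Hypothetical per-item flags plus the location of each backup copy."""

    def __init__(self) -> None:
        self.flagged: Dict[str, bool] = {}
        self.backup_location: Dict[str, int] = {}

    def mark_replicated(self, name: str, location: int) -> None:
        """Record that `name` has a copy stored at `location` in the backup."""
        self.flagged[name] = True
        self.backup_location[name] = location

    def find_copy(self, name: str) -> Optional[int]:
        """Return the backup location if a copy is known to exist, else None."""
        if self.flagged.get(name, False):
            return self.backup_location[name]
        return None
```

When data is lost, a lookup such as `find_copy` lets the processor decide immediately whether to fetch the copy or to fall back on the erasure code.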
In stage 404, computer program 201 may cause processor 108 to confirm, based on the result of storing the copy of the data in the backup storage in stage 403, that the location in memory 110 used in stage 402 to store the erasure code of the corresponding data is available for storing other information. In other words, once the data has been replicated in stage 403 such that a copy of the data is stored in backup storage system 105, processor 108 in storage system 104 may determine that the erasure code of the data stored in memory 110 is no longer needed, and that the location occupied by the erasure code in memory 110 may be overwritten by other (e.g., new) data. For example, the processor 108 may maintain, in the memory 110, a register of the locations in the memory 110 that are available (i.e., permitted to be used) for storing other data. In an example, this stage may thus include: the processor 108 modifying this register to record that the location where the erasure code was stored in stage 402 is available for storing other data. Thus, at a later time step, when the processor 108 needs to identify an available location in the memory 110 for storing other data, the processor can consult the register. In other examples, stage 404 may include: the processor 108 setting a flag in the memory 110 at the location where the erasure code is stored, the flag identifying the location as one in which other data can be stored. In an example, stage 404 may include: the processor 108 causing the erasure codes stored in stage 402 to be erased from their respective locations in the memory 110.
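The register consulted in stage 404 could, under stated assumptions, look like the following Python sketch. The class name `FreeRegister`, the integer location identifiers and the method names are hypothetical; the sketch only shows a parity location becoming available once the copy has been confirmed.

```python
from typing import Dict, Set


class FreeRegister:
    """Hypothetical register of locations in memory 110 available for other data."""

    def __init__(self) -> None:
        self.available: Set[int] = set()
        self.parity_location: Dict[str, int] = {}

    def record_parity(self, name: str, location: int) -> None:
        """Stage 402: remember where the erasure code of `name` is stored."""
        self.parity_location[name] = location

    def on_copy_confirmed(self, name: str) -> None:
        """Stage 404: mark the erasure-code location of a replicated item as reusable."""
        self.available.add(self.parity_location.pop(name))

    def allocate(self) -> int:
        """Hand out one available location for storing other data."""
        return self.available.pop()
```

A location never enters `available` until `on_copy_confirmed` is called, which reflects the dependence of stage 404 on the result of stage 403.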
In an example, stage 404 may include the process of stage 801 described with reference to FIG. 8. Thus, stage 404 may include erasing only one set of erasure codes (e.g., parity "p"), while a second set of erasure codes (e.g., parity "q") remains stored in memory 110 even after the data is copied in stage 403. This may advantageously balance the competing requirements of minimizing memory footprint and maintaining the recoverability of the data stored in memory 110.
In an example, stage 404 may include the process of stage 901 described with reference to FIG. 9. Thus, stage 404 may include a further step of: storing other data in the locations of computer memory 110 that were confirmed in stage 404 as being available for storing other data.
Referring finally to FIG. 10, in an example, the computer program 201 executed by the processor 108 may cause the processor 108 to perform an error detection operation 1001, whereby the processor 108 checks the integrity of the data stored in the memory 110 to determine whether the data has been lost or corrupted. When the processor 108 detects a loss or corruption of data, it may retrieve a copy of the lost/corrupted data from the memory 113 of the backup storage system 105. In other words, in an example, the above method may include: testing the data to check for errors, e.g., partial or complete loss or corruption of the data, and, in response to detecting the loss/corruption of the data, retrieving the copy data. This process of "actively" testing the data and retrieving the copy data may reduce the time between the data loss and the recovery of the data from the copy, which may advantageously minimize the time for which a client is unable to access the data after the data loss.
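One possible form of the error detection operation 1001 is a checksum-based scrub, sketched below in Python. The use of SHA-256 digests, the function names `digest` and `scrub` and the dictionaries standing in for memories 110 and 113 are assumptions; the disclosure does not specify how integrity is checked.

```python
import hashlib
from typing import Dict, List


def digest(payload: bytes) -> str:
    """Checksum recorded when an item is first stored (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def scrub(primary: Dict[str, bytes],
          checksums: Dict[str, str],
          backup: Dict[str, bytes]) -> List[str]:
    """Check every item against its checksum and restore failures from the backup."""
    restored = []
    for name, expected in checksums.items():
        payload = primary.get(name)
        if payload is None or digest(payload) != expected:
            primary[name] = backup[name]  # retrieve the copy data
            restored.append(name)
    return restored
```

Running such a scrub proactively, rather than waiting for a client read to fail, is what shortens the interval between the loss and the recovery.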
Various aspects of the present disclosure have been described herein in the context of a computing system including a primary storage system 104 serving remote client devices 102 and 103 and a backup storage system 105 remote from the storage system 104 and serving the storage system 104. However, the various aspects of the present disclosure are applicable more widely than such a configuration. For example, in a simple example of aspects of the present disclosure, computing system 101 may comprise a monolithic device, and the data, one or more erasure codes, and copy data may all be stored in a storage of the monolithic device. In an example, the data, one or more erasure codes, and copy data may all be stored in a single storage device, e.g., in a single disk drive. Such a computing system may include a computer storage, such as one or more storage devices, e.g., a disk drive, and a processor coupled to the storage and configured to: store data in the computer storage; store erasure codes for the data in the computer storage; store copies of the data in the computer storage; and confirm, based on the result of storing the copies of the data in the computer storage, that the computer storage storing the erasure codes for the data may be used to store other data.
Although various aspects of the present disclosure and their associated advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. Although the processes of the methods provided by the present disclosure have been described herein as occurring in a particular order, in other examples of the present disclosure, the processes of the methods may be performed in an alternative order, or may even be omitted from the methods.

Claims (15)

1. A method for storing data in a computer storage, the method comprising:
storing the data in a computer storage;
storing erasure codes of the data in the computer storage;
storing a copy of the data in the computer storage; and
confirming, based on a result of storing the copy of the data in the computer storage, that the computer storage storing the erasure code of the data is available to store other data.
2. The method of claim 1, wherein the storing the data in the computer storage comprises: storing a first segment of the data in a first computer storage device and storing a second segment of the data in a second computer storage device.
3. The method of claim 1 or 2, wherein the storing the copy of the data in the computer storage comprises: storing the copy of the data in one or more computer storage devices separate from the computer storage device storing the data and/or separate from the computer storage device storing the erasure code.
4. The method according to any of the preceding claims, wherein the method further comprises: storing, in the computer storage, location data identifying the location of the copy of the data.
5. The method of claim 4, wherein the storing, in the computer storage, location data identifying the location of the copy of the data comprises: storing the location data in a computer storage device that stores the copy of the data.
6. The method according to any of the preceding claims, wherein the method further comprises: storing, based on the result of storing the copy of the data in the computer storage, a flag marking the computer storage in which the data is stored.
7. The method according to any of the preceding claims, wherein the method further comprises: storing other data in the computer storage confirmed as being available for storing other data.
8. The method of any of the preceding claims, wherein the storing the erasure code of the data in the computer storage comprises: storing the erasure code in one or more computer storage devices that store the data.
9. The method of any of the preceding claims, wherein the storing the erasure code of the data in the computer storage comprises: storing a first erasure code and a second erasure code in the computer storage, wherein the first erasure code and the second erasure code are independent of each other; and wherein the confirming, based on the result of storing the copy of the data in the computer storage, that the computer storage storing the erasure code of the data is available to store other data comprises: confirming, based on the result of storing the copy of the data in the computer storage, that only the computer storage storing the second erasure code of the data is available for storing other data.
10. The method according to any of the preceding claims, wherein the method further comprises: performing an error detection operation on the data stored in the computer storage; and, in response to detecting an error in the data stored in the computer storage, retrieving the copy of the data stored in the computer storage.
11. A computer program comprising instructions which, when executed by a processor, cause the processor to:
store data in a computer storage;
store an erasure code of the data in the computer storage;
store a copy of the data in the computer storage; and
confirm, based on a result of storing the copy of the data in the computer storage, that the computer storage storing the erasure code of the data is available to store other data.
12. The computer program according to claim 11, characterized in that the computer program comprises instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 2 to 10.
13. A computer readable data carrier, characterized in that the computer readable data carrier has stored therein a computer program according to claim 11 or 12.
14. A computing system comprising a computer storage and a processor coupled to the computer storage, the processor configured to:
store data in the computer storage;
store an erasure code of the data in the computer storage;
store a copy of the data in the computer storage; and
confirm, based on a result of storing the copy of the data in the computer storage, that the computer storage storing the erasure code of the data is available to store other data.
15. The computing system of claim 14, wherein the processor is configured to perform the method of any of claims 2-10.
CN202080107869.XA 2020-12-16 2020-12-16 Storing data in computer storage Pending CN116601609A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/086478 WO2022128080A1 (en) 2020-12-16 2020-12-16 Storing data in computer storage

Publications (1)

Publication Number Publication Date
CN116601609A true CN116601609A (en) 2023-08-15

Family

ID=74106020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080107869.XA Pending CN116601609A (en) 2020-12-16 2020-12-16 Storing data in computer storage

Country Status (2)

Country Link
CN (1) CN116601609A (en)
WO (1) WO2022128080A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8799746B2 (en) * 2012-06-13 2014-08-05 Caringo, Inc. Erasure coding and replication in storage clusters
US9286163B2 (en) * 2013-01-14 2016-03-15 International Business Machines Corporation Data recovery scheme based on data backup status
US20190163374A1 (en) * 2017-11-28 2019-05-30 Entit Software Llc Storing data objects using different redundancy schemes

Also Published As

Publication number Publication date
WO2022128080A1 (en) 2022-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination