CN111857540A - Data access method, device and computer program product - Google Patents

Data access method, device and computer program product

Info

Publication number
CN111857540A
Authority
CN
China
Prior art keywords
disk
data
storage location
information
degraded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910340221.6A
Other languages
Chinese (zh)
Inventor
董继炳 (Jibing Dong)
高健 (Jian Gao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EMC Corp
Original Assignee
EMC IP Holding Co LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by EMC IP Holding Co LLC filed Critical EMC IP Holding Co LLC
Priority to CN201910340221.6A priority Critical patent/CN111857540A/en
Priority to US16/824,032 priority patent/US11379326B2/en
Publication of CN111857540A publication Critical patent/CN111857540A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • G11B20/18Error detection or correction; Testing, e.g. of drop-outs
    • G11B20/1883Methods for assignment of alternate areas for defective areas
    • G11B20/1889Methods for assignment of alternate areas for defective areas with discs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/1084Degraded mode, e.g. caused by single or multiple storage removals or disk failures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/82Solving problems relating to consistency
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • G11B20/12Formatting, e.g. arrangement of data block or words on the record carriers
    • G11B20/1217Formatting, e.g. arrangement of data block or words on the record carriers on discs
    • G11B2020/1218Formatting, e.g. arrangement of data block or words on the record carriers on discs wherein the formatting concerns a specific area of the disc
    • G11B2020/1222ECC block, i.e. a block of error correction encoded symbols which includes all parity data needed for decoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide a method, apparatus, and computer program product for performing data access for a disk array. The disk array comprises a parity disk and a plurality of data disks. In one method, in response to a write request to a failed data disk in the disk array while the disk array is in a degraded state, data is written to the parity disk of the disk array, and corresponding degraded storage location information is set in the disk resource mapping information to indicate that the data is stored in the parity disk. With implementations of the present disclosure, a large amount of computing resources can be saved, and the I/O operations required for reads in the degraded state can be reduced.

Description

Data access method, device and computer program product
Technical Field
Some embodiments of the present disclosure relate to the field of data storage, and more particularly, to a method, apparatus, and computer program product for performing data access for a disk array.
Background
A storage system may be built based on one or more physical storage devices to provide data storage capabilities. Redundant Array of Independent Disks (RAID) is a storage technology that achieves data redundancy and increases access rates by combining multiple disks. In RAID, a large-capacity disk group is formed from a plurality of independent disks, and striping is used to distribute the data input/output (I/O) load evenly across the physical disks. In other words, a contiguous block of data is divided evenly into several smaller blocks, each stored on one of multiple disks that logically belong to the same storage device. By storing and reading data in parallel on multiple disks, the throughput and access rate of the storage system can be improved.
In addition, RAID technology provides fault tolerance by means of parity to improve system stability. Many RAID modes provide relatively complete parity/recovery mechanisms, and some of these mechanisms are simply direct mirror backups of one another.
A RAID disk group may fail in the event of a hardware fault, such as a power loss or a disk being removed. Upon such a fault, the disk group immediately transitions from the optimized state, which indicates normal operation, into a degraded state; the failed disk is powered back up and a failure recovery operation is performed afterwards. In general, to provide a good user experience, users still need to be allowed to access the disk group even while it is in the degraded state.
However, in the prior art, I/O operations in the degraded state are cumbersome and inefficient. For example, in RAID 5 mode, if an I/O write is issued to a failed disk in the degraded state, data cannot be written to the target disk because it has failed; instead, the corresponding parity information is calculated and written to the parity disk. When an I/O read is later issued for that data in the degraded state, the parity information and the data on the other storage disks are read first, the user data is reconstructed from the parity information and the data read from the other disks, and the reconstructed data is then returned to the user. In addition, when the failed disk subsequently comes back online and its data needs to be recovered, the user data must be reconstructed yet again from the parity information and the data on the other disks, and the reconstructed data must be rewritten to the disk that came back online. Therefore, compared with reads and writes in the optimized state, read and write operations in the degraded state are very cumbersome and inefficient.
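To make the cost of this prior-art degraded path concrete, the following is a minimal sketch (not taken from the patent) of how a conventional RAID 5 degraded read reconstructs the failed disk's block by XOR-ing the parity block with the surviving data blocks; the function and variable names are hypothetical.

```python
# Minimal sketch of a conventional RAID 5 degraded read (prior art),
# assuming byte-wise XOR parity; names are hypothetical.
from functools import reduce

def degraded_read_prior_art(stripe_blocks, parity_block, failed_index):
    """Reconstruct the failed disk's block from parity and the surviving
    data blocks: D_failed = P XOR D_0 XOR ... (surviving blocks)."""
    surviving = [blk for i, blk in enumerate(stripe_blocks) if i != failed_index]
    return reduce(
        lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
        surviving,
        parity_block,
    )

# Example: 4 data disks + 1 parity disk, disk 1 has failed.
d = [bytes([i] * 8) for i in range(4)]
p = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), d)
assert degraded_read_prior_art(d, p, failed_index=1) == d[1]
```

Every such degraded read thus touches all the other disks and spends CPU cycles on reconstruction, which is the inefficiency the disclosure sets out to avoid.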
Disclosure of Invention
In some embodiments of the present disclosure, a solution for performing data access with respect to a disk array is provided.
In a first aspect of the present disclosure, a method for performing data access with respect to a disk array is provided. The disk array includes a parity disk and a plurality of data disks, and the method includes: in response to a write request for a failed data disk in the disk array while the disk array is in a degraded state, writing data to the parity disk of the disk array; and setting corresponding degraded storage location information in the disk resource mapping information to indicate that the data is stored in the parity disk.
In a second aspect of the present disclosure, an apparatus for performing data access with respect to a disk array is provided. The apparatus includes: a processor; and a memory coupled with the processor, the memory having instructions stored therein that, when executed by the processor, cause the apparatus to perform acts comprising: in response to a write request for a failed data disk in the disk array while the disk array is in a degraded state, writing data to the parity disk of the disk array; and setting corresponding degraded storage location information in the disk resource mapping information to indicate that the data is stored in the parity disk.
In a third aspect of the present disclosure, there is provided a computer program product tangibly stored on a computer-readable medium and comprising machine executable instructions that, when executed, cause a machine to perform actions in a method according to the first aspect of the present disclosure.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the disclosure, nor is it intended to be used to limit the scope of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout the exemplary embodiments of the present disclosure.
FIG. 1 illustrates a schematic diagram of an example environment in which some embodiments of the present disclosure may be implemented;
FIG. 2 illustrates an example method of building different storage levels in a tiered storage system according to the prior art;
FIG. 3 illustrates a flow diagram of a method 300 for performing data access with respect to a disk array, according to some embodiments of the present disclosure;
FIG. 4 schematically illustrates an example of a disk resource mapping employed in an embodiment in accordance with the present disclosure;
FIG. 5 illustrates a schematic diagram of a data write operation in a degraded state, according to some embodiments of the present disclosure;
FIG. 6 illustrates a flow diagram of a method 600 for performing data access with respect to a disk array, according to some embodiments of the present disclosure;
FIG. 7 illustrates a schematic diagram of a data read operation in a degraded state, according to some embodiments of the present disclosure;
FIG. 8 illustrates a flow diagram of a method 800 for performing data recovery for a disk array, according to some embodiments of the present disclosure;
FIG. 9 illustrates a schematic diagram of a failure recovery operation when returning to an optimized state, according to some embodiments of the present disclosure;
FIG. 10 illustrates a flow diagram of a method 1000 for performing data access with respect to a disk array, according to some embodiments of the present disclosure;
FIG. 11 illustrates a schematic diagram of a data write operation after returning to an optimized state, according to some embodiments of the present disclosure;
FIG. 12 illustrates a flow diagram of a method 1200 for performing a data access for a disk array, according to some embodiments of the present disclosure;
FIG. 13 shows a schematic block diagram of an example device 1300 that may be used to implement embodiments of the present disclosure.
Detailed Description
The principles of the present disclosure will be described below with reference to a number of example embodiments shown in the drawings. While the preferred embodiments of the present disclosure have been illustrated in the accompanying drawings, it is to be understood that these embodiments are described merely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way.
The term "include" and variations thereof as used herein is meant to be inclusive in an open-ended manner, i.e., "including but not limited to". Unless specifically stated otherwise, the term "or" means "and/or". The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions are also possible below.
Referring first to FIG. 1, which shows a schematic diagram of an example environment in which aspects of the present disclosure may be implemented. The storage system 100 includes a set of physical storage devices 120 for providing data storage capabilities. The set of physical storage devices 120 includes a cache memory 122 and a disk device 124. The cache memory 122 is used for data caching, and the disk device 124 is used for persistent storage of data. Typically, the access speed of the cache memory 122 is greater than that of the disk device 124. The storage system 100 may utilize a variety of storage technologies to provide data storage capabilities.
In some embodiments, examples of the cache memory 122 include caches, Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), and the like, which have higher access speeds. Examples of the disk device 124 may include a Redundant Array of Independent Disks (RAID) or other disk devices.
To manage the storage of data to physical storage space, storage system 100 also includes a storage controller 110. Typically, the storage system 100 employs a hierarchical control model. As shown in FIG. 1, under a hierarchical control model, the storage controller 110 may have multiple layers of control modules, including a host control module 112, a cache control module 114, and a physical disk control module 116. These control modules implement hierarchical control functions.
To facilitate an understanding of the hierarchical control model of the storage system 100, the operating mechanism of the storage controller 110 is described using RAID technology as an example. The physical disk control module 116 controls the storage space of the disk device 124 and presents Logical Unit Number (LUN) devices to the cache control module 114. The cache control module 114 controls the cache space of the cache memory 122 and presents cache volumes to the host control module 112. The host control module 112 manages a logical storage pool and presents pool LUNs to the host 102.
In operation, an application running on the host 102 may send a user write request to the host control module 112 to request that data be written to the storage system 100. In response to the received user write request, the host control module 112 may generate multiple write requests to the cache control module 114. For example, if a user write request from the host 102 requires a large amount of data to be written to multiple discrete extents of a RAID LUN created on the disk device 124, the host control module 112 sends a write request to the cache control module 114 for each extent.
The cache control module 114 and the cache memory 122 operate in a write-back mode. This means that upon receiving a write request, the cache control module 114 first caches the data to be written into the cache memory 122, and later flushes the data of the write request(s) to the disk device 124. The cache control module 114 may send a completion indication for the write request to the host control module 112 as soon as the data is cached, thereby enabling a fast response to the user write request. The physical disk control module 116 controls the actual writes to the disk device 124.
It should be understood that although shown as distinct modules above, the host control module 112, the cache control module 114, and the physical disk control module 116 may be implemented by a single processor, controller, or microprocessor, by multiple such devices, or by a computing device that includes such processing capabilities. In some examples, the cache control module 114 may also be integrated with the cache memory 122 to obtain a device with both caching and control capabilities. Although not shown, in some embodiments the storage system 100 may also include another storage controller that mirrors the storage controller 110 to provide data consistency, security, and data recovery capabilities. In some examples, the storage system 100 may also implement data storage and management using a multi-core storage mechanism.
For illustrative purposes, the method of constructing different storage tiers in a prior-art tiered storage system will first be described with reference to FIG. 2. As shown in FIG. 2, the storage system contains N physical disks, disk 0 through disk N-1. Each of disks 0 through N-1 is divided into a plurality of disk extents, and the disk extents are collectively managed in the extent pool 110. A RAID extent is formed by selecting a number of disk extents from different disks, and multiple RAID extents together constitute the different storage tiers 121 to 123. Each tier represents a combination of disks corresponding to a particular performance level. In general, three storage tiers may be created for a storage pool, and each storage tier may be formed from a corresponding type of physical disk. For example, Solid State Disks (SSDs) may be used to form a high-performance SSD tier, namely Tier 0; Serial Attached SCSI (SAS) disks may be used to form a medium-performance SAS tier, namely Tier 1; and Serial ATA (SATA) disks may be used to form a high-capacity SATA tier, namely Tier 2. A volume may be distributed across multiple tiers, and depending on the I/O activity of the volume, the data on the volume may be moved between different tiers to implement automated tiered storage.
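As a rough illustration only (not part of the patent), the sketch below shows how disk extents from an extent pool might be grouped into a RAID extent for a tier; the 4+1 width, class names, and selection policy are assumptions.

```python
# Hypothetical grouping of disk extents into a RAID extent for a storage tier.
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class DiskExtent:
    disk_id: int
    extent_id: int

def build_raid_extent(pool: List[DiskExtent], width: int = 5) -> List[DiskExtent]:
    """Pick `width` extents, each from a different disk (e.g. 4+1 for RAID 5)."""
    chosen: List[DiskExtent] = []
    used_disks = set()
    for ext in pool:
        if ext.disk_id not in used_disks:
            chosen.append(ext)
            used_disks.add(ext.disk_id)
        if len(chosen) == width:
            break
    if len(chosen) < width:
        raise ValueError("not enough disks in the extent pool")
    return chosen

# Usage: an extent pool spanning 6 disks yields one 5-wide RAID extent for, say, Tier 1.
pool = [DiskExtent(disk_id=d, extent_id=0) for d in range(6)]
tier1_raid_extent = build_raid_extent(pool)
assert len({e.disk_id for e in tier1_raid_extent}) == 5
```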
As described above, in the prior art, if an I/O write is performed in the optimized state, the data is written to the corresponding data disk and the corresponding parity information is written to the parity disk, while on a read the data is read directly from the corresponding data disk. I/O operations in the degraded state, however, are cumbersome and inefficient. For example, in RAID 5 mode, if an I/O write is issued to a failed disk in the degraded state, data cannot be written to the target disk because it has failed; instead, the corresponding parity information is calculated and written to the parity disk. When an I/O read is performed on that data in the degraded state, the parity information and the data on the other storage disks are obtained first, the user data is reconstructed from the parity information and the data read from the other disks, and the reconstructed data is returned to the user. In addition, when the failed disk subsequently comes back online and its data needs to be recovered, the user data must be reconstructed again from the parity information and the data on the other disks, and the reconstructed data must be rewritten to the disk that came back online. Therefore, compared with I/O operations in the optimized state, read and write operations in the degraded state are very cumbersome and inefficient.
For this reason, embodiments of the present disclosure propose to write the data directly to the parity disk, without writing parity information, when a write operation targets a failed data disk in the degraded state. In this way, no parity information needs to be calculated during a write in the degraded state, which saves computation. Moreover, when a read operation is later performed on that data in the degraded state, the data only needs to be read directly from the parity disk; there is no need, as in the prior art, to read the data and parity information of the other data disks and reconstruct the data from them. In a further embodiment, the parity information may be calculated and written directly to the data disk during failure recovery, which causes no additional I/O operations or computation at failover time. In addition, in a further embodiment, when the data is written again in the optimized state, the data can be written back to the data disk and the parity information written to the parity disk, so that the degraded writing mode is automatically switched back to the normal writing mode without any additional I/O operations or consumption of computing resources.
It should be noted that the term "degraded writing mode" used herein refers to the writing mode proposed in the present application, in which data is written directly to the parity disk during a write operation to a failed data disk in the degraded state; the term "normal writing mode" refers to the writing mode of the prior art, in which data is written to the data disk.
Hereinafter, the data access scheme proposed in the present application will be described with reference to FIGS. 3 to 12 in conjunction with example embodiments. It should be noted that the following examples are given for illustrative purposes only, and the present disclosure is not limited thereto.
FIG. 3 illustrates a flow diagram of a method 300 for performing data access with respect to a disk array, according to some embodiments of the present disclosure. As shown in FIG. 3, at block 310, in response to a write request to a failed data disk in the disk array while the disk array is in a degraded state, data is written to the parity disk of the disk array. In embodiments of the present disclosure, the disk array includes a parity disk and a plurality of data disks; the disk array is, for example, a disk group formed using RAID technology. For RAID 5, the disk array includes one parity disk P and four data disks. It should be noted that RAID 5 is described below as an example, but the present disclosure is not limited thereto; it may be applied to any other RAID mode, and also to a disk array that includes a parity disk and multiple data disks and is formed based on any other technology.
At block 320, corresponding degraded storage location information is set in the disk resource mapping information to indicate that the data is stored in the parity disk. A write operation to a failed disk in the degraded state differs from a normal write operation in that the parity disk now contains data instead of parity information. This change of storage location needs to be recorded so that a subsequent read I/O operation can retrieve the data directly from the parity disk. To this end, degraded storage location information may be included in the disk resource mapping information to indicate whether the data is stored in the parity disk. Thus, when a data write in the degraded state as proposed in the present disclosure is performed, the corresponding degraded storage location information may be set to indicate that the data is stored in the parity disk.
In some embodiments of the present disclosure, the disk resource mapping information includes a plurality of degraded storage location indicators for the plurality of data disks, i.e., a respective degraded storage location indicator is set for each data disk. In this way, the degraded storage location indicator corresponding to the failed data disk may be set in the disk resource mapping information to indicate that the data associated with the failed data disk has been written to the parity disk.
In some embodiments of the present disclosure, the degraded storage location indicator may take a first value or a second value. The first value corresponds to a valid value, e.g., "1", indicating that the data is stored in the parity disk; the second value corresponds to an invalid value, e.g., "0", indicating that the data is not stored in the parity disk. In this way, when the corresponding degraded storage location information is set in the disk resource mapping information, the degraded storage location indicator may be set to the first value to indicate that the data is written to the parity disk.
FIG. 4 schematically illustrates an example of a disk resource mapping approach that may be employed in embodiments of the present disclosure. RAID is used as an example in the figure, and embodiments of the present disclosure may store the degraded storage location information based on the disk resource mapping of this example.
The resource mapper is a log-based mapping system used in the storage system to manage disk resource mappings. It exports logical storage to upper layers and maps the logical storage space to the physical storage resources of the RAID. As shown in FIG. 4, the logical space of the resource mapper is composed of 4 KB pages, each page storing disk resource mapping information, and a number of pages are packed together to form a 2 MB virtual block, which is called a virtual large block (VLB).
A RAID extent is divided into a plurality of Physical Large Blocks (PLBs), PLB 0 to PLB M, which span disk extents 1 to k from different disks. Each PLB includes storage resources from each disk extent of the RAID extent. For each PLB, a corresponding VLB is set up to store the resource mapping information of that PLB; the information about the mapping from the VLB to the PLB is kept in the data structure of the VLB. The resource mapper may, for example, use a three-level B+ tree to reference the pages in the VLBs, with the block pointers in the leaf entries of the B+ tree pointing to the pages of a VLB.
According to some embodiments of the present disclosure, the degraded storage location information may be stored in the VLB that stores the disk resource mapping information. In this way, the degraded storage location information can be read together with the other disk resource mapping information during a data read, without an additional read operation. For example, degraded storage location indicators corresponding to the respective data disks may be set in the VLB.
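A minimal sketch, under assumed names, of how such VLB metadata might be modeled, with one degraded storage location indicator per data disk (deg_pos) kept alongside the VLB-to-PLB mapping; the class layout is an illustration, not the patent's actual data structure.

```python
# Hypothetical model of VLB metadata carrying per-data-disk degraded
# storage location indicators alongside the VLB-to-PLB mapping.
from dataclasses import dataclass, field
from typing import List

INVALID = 0  # data is at its normal location on the data disk
VALID = 1    # data was written to the parity location in degraded mode

@dataclass
class VLB:
    plb_id: int                      # PLB this VLB describes
    page_addresses: List[int]        # 4 KB pages packed into this 2 MB block
    deg_pos: List[int] = field(default_factory=lambda: [INVALID] * 4)
    # one indicator per data disk of the RAID extent (4 data disks in RAID 5)

    def mark_degraded_write(self, data_disk_index: int) -> None:
        """Record that this data disk's data lives at the parity location."""
        self.deg_pos[data_disk_index] = VALID

    def clear(self, data_disk_index: int) -> None:
        """Reset after the storage locations are swapped back to normal."""
        self.deg_pos[data_disk_index] = INVALID
```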
Next, an exemplary write operation to the disk array in the degraded state will be described schematically with reference to FIG. 5, which shows the scenario in which data is written to one PLB of a RAID extent. As shown in FIG. 5, in one PLB, four disk extents from four disks are used to store the respective data portions, and the parity disk is used to store the corresponding parity information. In this embodiment, a degraded storage location indicator deg_pos is added to the VLB of the resource mapper to record for which data disk the data is recorded at the parity location.
As shown in FIG. 5, when a write is performed on the illustrated PLB in the degraded state, the write I/O operations for the healthy data disks can proceed normally, i.e., the data portions D0, D2, and D3 are written directly to the corresponding data disks. For the write I/O operation to the failed data disk, the scheme proposed in the present disclosure is adopted: the data D1 that should have been written to the failed data disk is written directly to the parity disk, and no parity information is calculated or stored, as shown in FIG. 5. After the write operation completes, a degraded storage location indicator value corresponding to the failed data disk is stored in the VLB to indicate that the data is stored in the parity disk.
In particular, during a data flush the resource mapper may allocate VLBs and PLBs for the dirty pages to be written. The resource mapper then sends the complete PLB write to the RAID. During the PLB write, if the disk extents of the PLB are in the optimized state, a normal write is performed and an INVALID ("0") degraded storage location indication is returned to the resource mapper. Conversely, if a disk extent of the PLB is in the degraded state, the degraded data is written to the corresponding location in the parity disk rather than the data disk, and a valid degraded storage location indication is returned to the mapper. After the write I/O operation returns to the resource mapper, the resource mapper stores the degraded storage location indicator in the corresponding VLB.
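The following sketch illustrates that degraded write path under the same hypothetical model: data destined for the failed data disk is redirected to the parity location and a valid deg_pos indication is returned to the mapper. The stripe layout and function names are assumptions rather than the actual implementation.

```python
# Hypothetical degraded-write path for one PLB (4 data disks + 1 parity disk).
from typing import Dict, List, Optional

INVALID, VALID = 0, 1

def xor_blocks(blocks: List[bytes]) -> bytes:
    out = bytearray(blocks[0])
    for blk in blocks[1:]:
        out = bytearray(x ^ y for x, y in zip(out, blk))
    return bytes(out)

def plb_write(stripe: Dict[str, bytes], data: List[bytes],
              failed_disk: Optional[int]) -> List[int]:
    """Write D0..D3 into the stripe; return the deg_pos list for the mapper."""
    deg_pos = [INVALID] * len(data)
    if failed_disk is None:
        # Optimized state: normal write, parity = XOR of all data blocks.
        for i, blk in enumerate(data):
            stripe[f"data{i}"] = blk
        stripe["parity"] = xor_blocks(data)
    else:
        # Degraded state: healthy disks are written normally; the failed disk's
        # data goes to the parity location and no parity is computed.
        for i, blk in enumerate(data):
            if i == failed_disk:
                stripe["parity"] = blk
                deg_pos[i] = VALID
            else:
                stripe[f"data{i}"] = blk
    return deg_pos

# Usage: disk 1 has failed, so D1 lands at the parity location.
stripe: Dict[str, bytes] = {}
deg_pos = plb_write(stripe, [bytes([i] * 8) for i in range(4)], failed_disk=1)
assert deg_pos == [0, 1, 0, 0] and stripe["parity"] == bytes([1] * 8)
```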
Therefore, with the degraded writing mode of the present disclosure, no parity information needs to be calculated or stored when writing to a failed disk in the degraded state, which saves a large number of CPU cycles. Moreover, when a read operation is performed on that data in the degraded state, the data only needs to be read directly from the parity disk; there is no need, as in the prior art, to read the data and parity information of the other data disks and reconstruct the data from them.
FIG. 6 illustrates a flow diagram of a method 600 for performing data access with respect to a disk array according to some embodiments of the present disclosure, showing an example I/O read operation for a failed disk in the degraded state. As shown in FIG. 6, at block 610, in response to a read request for the data while the disk array is in the degraded state, the corresponding degraded storage location information is obtained from the disk resource mapping information. At block 620, the data is retrieved from the parity disk in accordance with the indication of the corresponding degraded storage location information.
Therefore, in the degraded state, when data that was written in the degraded writing mode is read, the degraded storage location information is obtained first. Its indication, "deg_pos = 1", means that the corresponding data was written to the parity disk, so the data can be fetched directly from the parity disk without the data reconstruction operation of the prior art. I/O operations can thus be reduced significantly, saving a large amount of resources.
On the other hand, for data written in the normal writing mode, the degraded storage location information is also obtained first, but since it is in the invalid state, the data is still fetched from the corresponding location on the data disk. Because the degraded storage location information is stored in the VLB together with the resource mapping information, it can be read in the same operation, so no additional read is needed and the extra computing resource usage is negligible. This means that, according to embodiments of the present disclosure, the performance of normal data reads in the degraded state is not affected at all.
FIG. 7 illustrates a schematic diagram of a data read operation in the degraded state, according to some embodiments of the present disclosure. As shown in FIG. 7, on a cache miss the resource mapper first reads the VLB to obtain the mapping information and the degraded storage location information of the B+ tree leaf node entry. The resource mapper then passes the degraded storage location information deg_pos from the VLB, together with the read I/O, to the RAID indicated by the mapping information. If the degraded storage location information is invalid, i.e., the read I/O is not directed to a failed data disk, a sub-I/O is sent to the storage location on the data disk to read the data in the normal manner. If the failed disk is being read and the degraded storage location information is valid, then according to the scheme proposed in the present disclosure the sub-I/O is sent to the corresponding location on the parity disk, rather than the storage location on the data disk, to read the data.
In this way, when data written in the degraded writing mode is read in the degraded state, it can be read directly from the parity disk, without reading the data and parity information of the other data disks and reconstructing the data from them, which saves both I/O operations and a large number of CPU cycles.
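A minimal sketch of the corresponding read path under the same hypothetical model: the RAID consults the deg_pos indicator handed down from the VLB and redirects the sub-I/O to the parity location when the indicator is valid. In the optimized state the indicator is simply invalid, so the same branch also covers normal reads.

```python
# Hypothetical read path driven by the deg_pos indicator read from the VLB.
from typing import Dict, List

INVALID, VALID = 0, 1

def plb_read(stripe: Dict[str, bytes], deg_pos: List[int],
             data_disk_index: int) -> bytes:
    """Read one data block; redirect to the parity location if deg_pos is valid."""
    if deg_pos[data_disk_index] == VALID:
        # This disk's data was written to the parity location in degraded mode.
        return stripe["parity"]
    # Normal case (optimized state, or a healthy disk in the degraded state).
    return stripe[f"data{data_disk_index}"]

# Usage: disk 1 failed during the write, so its data sits at the parity location.
stripe = {"data0": b"D0", "data2": b"D2", "data3": b"D3", "parity": b"D1"}
deg_pos = [INVALID, VALID, INVALID, INVALID]
assert plb_read(stripe, deg_pos, 1) == b"D1"   # served from the parity disk
assert plb_read(stripe, deg_pos, 0) == b"D0"   # served from its data disk
```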
Further, in some embodiments according to the present disclosure, when failure recovery is performed, the data storage location and the parity storage location may be restored to the normal state and the degraded storage location information may be set to invalid, so that all storage locations in the optimized state are normal. In this way, in the optimized state, the storage locations of all data are in the normal state, and data can be fetched directly from the data disks without checking the value of the degraded storage location information.
However, in other embodiments according to the present disclosure, the parity information may instead be stored directly on the data disk at failure recovery, without changing the storage location of the data that was written in the degraded writing mode during the degradation; the new data is not written back to the data disk, nor the parity information to the parity disk, until the data is written again later. The related operations are described below with reference to FIGS. 8 to 11.
FIG. 8 illustrates a flow diagram of a method 800 for performing data recovery for a disk array, according to some embodiments of the present disclosure. As shown in FIG. 8, at block 810, in response to the disk array returning to the optimized state, the data of the other data disks and the data stored in the parity disk are read, and the corresponding parity information is calculated from them. At block 820, the parity information is stored in the failed data disk that has been recovered. Therefore, according to embodiments of the present disclosure, the calculated parity information can be stored directly in the recovered failed data disk during failure recovery, so the storage locations of the data and the parity information do not need to be swapped, which reduces I/O operations and saves bandwidth resources.
In the case where the degraded storage location information is stored in the resource mapping information, the data reconstruction during failure recovery may be performed by the resource mapper and the RAID in cooperation.
FIG. 9 illustrates a schematic diagram of the operations performed upon failure recovery back to the optimized state, according to some embodiments of the present disclosure. As shown in FIG. 9, in the degraded state, if the failed disk comes back online, the resource mapper first reads the VLB to obtain the degraded storage location information. The resource mapper then sends an I/O operation for the degraded storage location to read the entire PLB. The RAID reads D0, D2, and D3 from the storage locations of the data disks that remained intact, and reads the data D1 from the parity location. The resource mapper then initiates a repair write operation to the RAID using data D0, D1, D2, and D3. Specifically, the RAID calculates the parity data from D0, D1, D2, and D3, and then writes the calculated parity data to the recovered data disk location.
Further, in embodiments according to the present disclosure, the storage location of the data and the storage location of the parity information in the disk system are swapped back by a subsequent write I/O operation, after which the degraded storage location information is set to invalid and returned to the resource mapper. The mapper stores the corresponding degraded storage location information in the VLB metadata.
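The recovery step can be sketched as follows under the same hypothetical model: the surviving data and the block parked at the parity location are read, parity is computed over all four blocks, and the result is written to the recovered disk's location, while deg_pos stays valid until a later rewrite; names and layout are assumptions.

```python
# Hypothetical failure recovery: compute parity from D0..D3 (with D1 read from
# the parity location) and store it at the recovered data disk's location.
from typing import Dict, List

INVALID, VALID = 0, 1

def xor_blocks(blocks: List[bytes]) -> bytes:
    out = bytearray(blocks[0])
    for blk in blocks[1:]:
        out = bytearray(x ^ y for x, y in zip(out, blk))
    return bytes(out)

def recover_plb(stripe: Dict[str, bytes], deg_pos: List[int],
                recovered_disk: int) -> None:
    """Repair-write after the failed disk comes back online."""
    # Gather D0..D3: the recovered disk's data currently lives at the parity location.
    data = [stripe["parity"] if i == recovered_disk else stripe[f"data{i}"]
            for i in range(len(deg_pos))]
    # Parity goes to the recovered disk's location; data/parity locations remain
    # swapped (deg_pos stays VALID) until a later write in the optimized state.
    stripe[f"data{recovered_disk}"] = xor_blocks(data)

# Usage: disk 1 was failed; its data D1 sits at the parity location.
stripe = {"data0": bytes([0] * 4), "data2": bytes([2] * 4),
          "data3": bytes([3] * 4), "parity": bytes([1] * 4)}
deg_pos = [INVALID, VALID, INVALID, INVALID]
recover_plb(stripe, deg_pos, recovered_disk=1)
assert stripe["data1"] == bytes([0 ^ 1 ^ 2 ^ 3] * 4)
```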
FIG. 10 illustrates a flow diagram of a method 1000 for performing data access with respect to a disk array, according to some embodiments of the present disclosure. As shown in FIG. 10, at block 1010, in response to a write request for the data in the optimized state, new data is written to the failed data disk that has been recovered. At block 1020, the corresponding parity information is written to the parity disk. At block 1030, the corresponding degraded storage location information is set to the second value, indicating that the data is not stored in the parity disk. In this way, the storage locations of the data and parity information that were written in the degraded writing mode are swapped back to the normal storage state.
FIG. 11 illustrates a schematic diagram of a data write operation after returning to the optimized state, according to some embodiments of the present disclosure. As shown in FIG. 11, when a write I/O operation is performed in the optimized state on data that was written using the degraded writing mode, the write is performed in the normal manner, i.e., data D0, D1, D2, and D3 are written to the corresponding locations on the data disks, and the parity information is written to the corresponding location on the parity disk. At the same time, the degraded storage location indicator is set to invalid. In this way, the layout is switched back to normal data storage by an ordinary subsequent write, without any additional I/O operations, thus saving bandwidth resources.
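A minimal sketch of that rewrite under the same hypothetical model: an ordinary full-stripe write in the optimized state restores the usual data/parity layout and clears the indicator; names are assumptions.

```python
# Hypothetical optimized-state rewrite: a normal write restores the layout
# and clears the degraded storage location indicator.
from typing import Dict, List

INVALID, VALID = 0, 1

def xor_blocks(blocks: List[bytes]) -> bytes:
    out = bytearray(blocks[0])
    for blk in blocks[1:]:
        out = bytearray(x ^ y for x, y in zip(out, blk))
    return bytes(out)

def rewrite_optimized(stripe: Dict[str, bytes], deg_pos: List[int],
                      new_data: List[bytes]) -> None:
    """Normal write in the optimized state: data to data disks, parity to parity disk."""
    for i, blk in enumerate(new_data):
        stripe[f"data{i}"] = blk
        deg_pos[i] = INVALID          # indicator back to the second (invalid) value
    stripe["parity"] = xor_blocks(new_data)

# Usage: after this write, reads no longer need to consult the parity location.
stripe: Dict[str, bytes] = {}
deg_pos = [INVALID, VALID, INVALID, INVALID]
rewrite_optimized(stripe, deg_pos, [bytes([i] * 4) for i in range(4)])
assert deg_pos == [INVALID] * 4 and stripe["parity"] == bytes([0] * 4)
```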
FIG. 12 illustrates a flow diagram of a method 1200 for performing data access for a disk array according to some embodiments of the present disclosure, illustrating an I/O read operation in the optimized state. As shown in FIG. 12, at block 1210, in response to a read request for a data disk in the disk array while the disk array is in the optimized state, the corresponding degraded storage location information is obtained from the disk resource mapping information. At block 1220, if it is determined that the corresponding degraded storage location information indicates that the data is stored in the parity disk, the data is retrieved from the parity disk. At block 1230, if it is determined that the corresponding degraded storage location information indicates that the data is not stored in the parity disk, the data is retrieved from the data disk.
It should be noted that the data access scheme proposed in the present disclosure also has no impact on data access operations in the optimized state. In embodiments of the present disclosure, the degraded storage location information may be stored together with the resource mapping information so that they can be read in the same operation; no additional read operation is required and the extra computing resource usage is negligible. Therefore, according to embodiments of the present disclosure, the performance of normal data reads in the optimized state is not affected at all.
Additionally, FIG. 13 illustrates a schematic block diagram of an example device 1300 that may be used to implement some embodiments of the present disclosure. The device 1300 may be used to implement one or more control modules of the storage controller 110 of FIG. 1.
As shown, the device 1300 includes a Central Processing Unit (CPU) 1301 that can perform various appropriate actions and processes according to computer program instructions stored in a Read-Only Memory (ROM) 1302 or computer program instructions loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 can also store various programs and data required for the operation of the device 1300. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to one another via a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.
A number of components in the device 1300 are connected to the I/O interface 1305, including: an input unit 1306 such as a keyboard or a mouse; an output unit 1307 such as various types of displays and speakers; a storage unit 1308 such as a magnetic disk or an optical disk; and a communication unit 1309 such as a network card, a modem, or a wireless communication transceiver. The communication unit 1309 allows the device 1300 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
Processing unit 1301 performs the various methods and processes described above, such as any one or more of method 300, method 600, method 800, method 1000, and method 1200. For example, in some embodiments, any one or more of the methods 300, 600, 800, 1000, and 1200 may be implemented as a computer software program or computer program product, which is tangibly embodied in a machine-readable medium, such as the storage unit 1308. In some embodiments, some or all of the computer program may be loaded and/or installed onto device 1300 via ROM 1302 and/or communications unit 1309. When the computer program is loaded into RAM 1303 and executed by CPU 1301, one or more steps of any one or more of methods 300, 600, 800, 1000, and 1200 described above may be performed. Alternatively, in other embodiments, the CPU 1301 may be configured in any other suitable manner (e.g., by way of firmware) to perform any one or more of the methods 300, 600, 800, 1000, and 1200.
It will be appreciated by those skilled in the art that the steps of the methods of the present disclosure described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of computing devices. Alternatively, they may be implemented in program code executable by a computing device, such that the program code may be stored in a storage device and executed by the computing device, or the steps may be made into individual integrated circuit modules, or multiple of these modules or steps may be implemented as a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.
It should be understood that although several means or sub-means of the apparatus have been referred to in the detailed description above, such division is merely exemplary and not mandatory. Indeed, according to some embodiments of the present disclosure, the features and functions of two or more of the means described above may be embodied in a single means. Conversely, the features and functions of one means described above may be further divided among a plurality of means.
The above description presents only optional embodiments of the present disclosure and is not intended to limit the present disclosure, which may be modified and varied by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (15)

1. A method for performing data access for a disk array, wherein the disk array includes a parity disk and a plurality of data disks, the method comprising:
in response to a write request for a failed data disk in the disk array while the disk array is in a degraded state, writing data to the parity disk of the disk array; and
setting corresponding degraded storage location information in the disk resource mapping information to indicate that the data is stored in the parity disk.
2. The method of claim 1, wherein the disk resource mapping information includes a plurality of degraded storage location indicators for the plurality of data disks, and wherein setting the corresponding degraded storage location information in the disk resource mapping information includes setting a degraded storage location indicator corresponding to the failed data disk in the disk resource mapping information.
3. The method of claim 2, wherein setting the corresponding degraded storage location information in the disk resource mapping information comprises setting the degraded storage location indicator to a first value indicating that the data is stored in the parity disk.
4. The method of claim 1 or 2, further comprising:
in response to a read request for the data when the disk array is in the degraded state, obtaining the corresponding degraded storage location information from the disk resource mapping information; and
obtaining the data from the parity disk according to the indication of the corresponding degraded storage location information.
5. The method of claim 1 or 2, further comprising:
in response to the disk array returning to an optimized state, calculating parity information for the data; and
storing the parity information in the failed data disk that has been recovered.
6. The method of claim 5, further comprising:
in response to a write request for the data in the optimized state:
writing new data to the failed data disk that has been recovered;
writing corresponding parity information to the parity disk; and
setting the corresponding degraded storage location information to a second value indicating that the data is not stored in the parity disk.
7. The method of claim 1 or 2, further comprising:
in response to a read request for a data disk in the disk array while the disk array is in an optimized state, obtaining corresponding degraded storage location information from the disk resource mapping information;
retrieving data from the parity disk if it is determined that the corresponding degraded storage location information indicates that the data is stored in the parity disk; and
retrieving data from the data disk if it is determined that the corresponding degraded storage location information indicates that the data is not stored in the parity disk.
8. An apparatus for performing data access with respect to a disk array, wherein the disk array includes a parity disk and a plurality of data disks, the apparatus comprising:
a processor; and
a memory coupled with the processor, the memory having instructions stored therein that, when executed by the processor, cause the apparatus to perform acts comprising:
in response to a write request for a failed data disk in the disk array while the disk array is in a degraded state, writing data to the parity disk of the disk array; and
setting corresponding degraded storage location information in the disk resource mapping information to indicate that the data is stored in the parity disk.
9. The apparatus of claim 8, wherein the disk resource mapping information includes a plurality of degraded storage location indicators for the plurality of data disks, and wherein setting the corresponding degraded storage location information in the disk resource mapping information includes setting a degraded storage location indicator corresponding to the failed data disk in the disk resource mapping information.
10. The apparatus of claim 9, wherein setting the corresponding degraded storage location information in the disk resource mapping information comprises setting the degraded storage location indicator to a first value indicating that the data is stored in the parity disk.
11. The apparatus of claim 8 or 9, wherein the instructions are further configured to, when executed by the processor, cause the apparatus to:
in response to a read request for the data when the disk array is in the degraded state, obtaining the corresponding degraded storage location information from the disk resource mapping information; and
obtaining the data from the parity disk according to the indication of the corresponding degraded storage location information.
12. The apparatus of claim 8 or 9, wherein the instructions are further configured to, when executed by the processor, cause the apparatus to:
in response to the disk array returning to an optimized state, calculating parity information for the data; and
storing the parity information in the failed data disk that has been recovered.
13. The apparatus of claim 12, wherein the instructions are further configured to, when executed by the processor, cause the apparatus to:
in response to a write request for the data in the optimized state:
writing new data to the failed data disk that has been recovered;
writing corresponding parity information to the parity disk; and
setting the corresponding degraded storage location information to a second value indicating that the data is not stored in the parity disk.
14. The apparatus of claim 8 or 9, wherein the instructions are further configured to, when executed by the processor, cause the apparatus to:
in response to a read request for a data disk in the disk array while the disk array is in an optimized state, obtaining corresponding degraded storage location information from the disk resource mapping information;
retrieving data from the parity disk if it is determined that the corresponding degraded storage location information indicates that the data is stored in the parity disk; and
retrieving data from the data disk if it is determined that the corresponding degraded storage location information indicates that the data is not stored in the parity disk.
15. A computer program product tangibly stored on a computer-readable medium and comprising machine executable instructions that, when executed, cause a machine to perform the method of any of claims 1-7.
CN201910340221.6A 2019-04-25 2019-04-25 Data access method, device and computer program product Pending CN111857540A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910340221.6A CN111857540A (en) 2019-04-25 2019-04-25 Data access method, device and computer program product
US16/824,032 US11379326B2 (en) 2019-04-25 2020-03-19 Data access method, apparatus and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910340221.6A CN111857540A (en) 2019-04-25 2019-04-25 Data access method, device and computer program product

Publications (1)

Publication Number Publication Date
CN111857540A true CN111857540A (en) 2020-10-30

Family

ID=72917055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910340221.6A Pending CN111857540A (en) 2019-04-25 2019-04-25 Data access method, device and computer program product

Country Status (2)

Country Link
US (1) US11379326B2 (en)
CN (1) CN111857540A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849123A (en) * 2021-08-14 2021-12-28 苏州浪潮智能科技有限公司 Data processing method, system, equipment and medium for slow disk
CN114546272A (en) * 2022-02-18 2022-05-27 山东云海国创云计算装备产业创新中心有限公司 Method, system, apparatus and storage medium for fast universal RAID demotion to RAID5

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111857552A (en) * 2019-04-30 2020-10-30 伊姆西Ip控股有限责任公司 Storage management method, electronic device and computer program product

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1851635A (en) * 2006-06-01 2006-10-25 杭州华为三康技术有限公司 Method and system for read-write operation to cheap magnetic disk redundant array
US8839028B1 (en) * 2011-12-23 2014-09-16 Emc Corporation Managing data availability in storage systems

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7389393B1 (en) 2004-10-21 2008-06-17 Symantec Operating Corporation System and method for write forwarding in a storage environment employing distributed virtualization
US8145840B2 (en) * 2009-06-05 2012-03-27 Lsi Corporation Method and system for storing excess data in a redundant array of independent disk level 6
US8583866B2 (en) 2010-02-22 2013-11-12 International Business Machines Corporation Full-stripe-write protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
US9563509B2 (en) * 2014-07-15 2017-02-07 Nimble Storage, Inc. Methods and systems for storing data in a redundant manner on a plurality of storage units of a storage system
US9720596B1 (en) 2014-12-19 2017-08-01 EMC IP Holding Company LLC Coalescing writes for improved storage utilization
US9990263B1 (en) * 2015-03-20 2018-06-05 Tintri Inc. Efficient use of spare device(s) associated with a group of devices
US10901646B2 (en) 2018-11-30 2021-01-26 International Business Machines Corporation Update of RAID array parity

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1851635A (en) * 2006-06-01 2006-10-25 杭州华为三康技术有限公司 Method and system for read-write operation to cheap magnetic disk redundant array
US8839028B1 (en) * 2011-12-23 2014-09-16 Emc Corporation Managing data availability in storage systems

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849123A (en) * 2021-08-14 2021-12-28 苏州浪潮智能科技有限公司 Data processing method, system, equipment and medium for slow disk
CN113849123B (en) * 2021-08-14 2023-08-25 苏州浪潮智能科技有限公司 Method, system, equipment and medium for processing data of slow disk
CN114546272A (en) * 2022-02-18 2022-05-27 山东云海国创云计算装备产业创新中心有限公司 Method, system, apparatus and storage medium for fast universal RAID demotion to RAID5
CN114546272B (en) * 2022-02-18 2024-04-26 山东云海国创云计算装备产业创新中心有限公司 Method, system, device and storage medium for degrading RAID (redundant array of independent disks) to RAID5 (redundant array of independent disks)

Also Published As

Publication number Publication date
US20200341873A1 (en) 2020-10-29
US11379326B2 (en) 2022-07-05

Similar Documents

Publication Publication Date Title
US9128855B1 (en) Flash cache partitioning
US6058489A (en) On-line disk array reconfiguration
EP2685384B1 (en) Elastic cache of redundant cache data
US9684591B2 (en) Storage system and storage apparatus
US10019362B1 (en) Systems, devices and methods using solid state devices as a caching medium with adaptive striping and mirroring regions
JP5944587B2 (en) Computer system and control method
JP2014174992A (en) System, method and computer-readable medium for managing cache store to achieve improved cache ramp-up across system reboots
CN103207840B (en) For imperfect record to be degraded to the system and method for the second buffer memory from the first buffer memory
US10564865B2 (en) Lockless parity management in a distributed data storage system
JP4229626B2 (en) File management system
US6378038B1 (en) Method and system for caching data using raid level selection
WO1999017208A1 (en) Multiple data controllers with centralized cache
JPH0642193B2 (en) Update recording method and apparatus for DASD array
CN113971104A (en) System and method for parity-based fault protection of storage devices
US9223655B2 (en) Storage system and method for controlling storage system
US20200133836A1 (en) Data management apparatus, data management method, and data management program
US11379326B2 (en) Data access method, apparatus and computer program product
US10579540B2 (en) Raid data migration through stripe swapping
CN114443346A (en) System and method for parity-based fault protection of storage devices
JP6011153B2 (en) Storage system, storage control method, and storage control program
CN111858189A (en) Handling of storage disk offline
US20170220476A1 (en) Systems and Methods for Data Caching in Storage Array Systems
JPH11288387A (en) Disk cache device
US20140372672A1 (en) System and method for providing improved system performance by moving pinned data to open nand flash interface working group modules while the system is in a running state
JP2003131818A (en) Configuration of raid among clusters in cluster configuring storage

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination