CN102541472A - Method and device for reconstructing RAID (Redundant Array of Independent Disks) - Google Patents

Method and device for reconstructing RAID (Redundant Array of Independent Disks)

Info

Publication number
CN102541472A
Authority
CN
China
Prior art keywords
resource
written
physical block
reconstruction
physical
Prior art date
Application number
CN2011104567385A
Other languages
Chinese (zh)
Inventor
上官应兰
Original Assignee
杭州宏杉科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州宏杉科技有限公司
Priority to CN2011104567385A priority Critical patent/CN102541472A/en
Publication of CN102541472A publication Critical patent/CN102541472A/en

Abstract

The invention relates to a method and device for rebuilding a RAID (Redundant Array of Independent Disks) array, where the RAID array is divided in advance into physical blocks of equal size. The device comprises a resource allocation unit, an access recording unit and a reconstruction processing unit. The resource allocation unit allocates one or more physical blocks to a logical resource when the logical resource is created, and records the correspondence between the logical resource and the physical blocks. The access recording unit maintains a resource access record table that records whether data has been written to each physical block. When the RAID array is rebuilt, the reconstruction processing unit uses the resource access record table to find the physical blocks in the written state and rebuilds them in units of physical blocks. The method and device greatly improve rebuild efficiency and reduce the impact of the rebuild on services.

Description

Method and device for RAID array reconstruction

Technical Field

[0001] The present invention relates to network storage technology, and in particular to a RAID array reconstruction technique.

Background

[0002] In network environments where many hosts store data, network storage technology emerged to improve the reliability and security of data storage and to make storage capacity scalable and flexible. Generally, a network storage system provides usable storage space to client PCs or servers (collectively referred to as hosts).

[0003] The front end of a typical network storage system connects to hosts over an IP network or an FC network and provides data storage services to them. For data transfer, taking an IP-based network storage system as an example, a host can read and write data on the network storage system using the standard iSCSI (Internet Small Computer System Interface) protocol. The core of a network storage system is the storage controller, which processes data and writes it to the back-end physical disks.

[0004] To improve write performance on physical disks and provide data redundancy, storage controllers usually support Redundant Array of Independent Disks technology (RAID, also called a RAID array or simply an array). RAID combines multiple independent physical disks in different ways into a disk group, providing higher storage performance and reliability than a single disk.

[0005] Depending on how data is organized, common RAID levels include RAID0, RAID1, RAID5, RAID6 and RAID10. Different RAID levels provide different levels of performance and reliability. In most cases, when one or more disks fail, the data of the failed disk can be recovered from the data on the remaining member disks using the algorithm for that RAID level, so no data is lost. The algorithm reconstructs the data of the failed disk and writes it to a hot spare disk; once reconstruction completes, the hot spare becomes a member disk of the array, restoring the array's redundancy and reliability. This is what is commonly called RAID array reconstruction (rebuild).

[0006] In a traditional network storage system, when an application needs storage space, a sufficiently large portion of the back-end storage is usually carved out and allocated to that application in advance. The allocation must account for future business expansion and growth in data volume. As a result, the logical resource (LUN) is far larger than the storage actually needed at present: only a small part of the LUN holds user data, and most of the space is idle. On one hand, this lowers the user's return on investment; on the other hand, the larger the storage space, the higher the probability that a rebuild is needed. If another data disk fails during a rebuild, data is lost. In addition, rebuild I/O consumes system resources and degrades the performance of read/write services. Rebuild efficiency and rebuild performance are therefore key factors in storage system reliability.

[0007] Thin provisioning is a common feature of network storage systems. It aims to solve the over-provisioning problem described above by allocating storage according to actual demand. Its core principle is to "trick" the client operating system into believing it has been given a large LUN. For example, the client operating system sees a 2 TB LUN, while the storage device has actually allocated only tens or hundreds of GB of physical space to it; the rest is virtual. As applications write more data, physical storage utilization rises, and when the allocated physical space runs short, additional physical space is allocated, so capacity grows on demand.

[0008] When a host (usually a server of some kind) discovers a LUN, what it sees is not real space but space virtualized by thin provisioning. The physical space actually allocated depends on the resource allocation policy and may be only a quarter of the total, or even less.

[0009] Creating a thin-provisioned LUN requires specifying the total LUN capacity, the size of the pre-allocated physical space, the RAID array to use, and the expansion policy for the LUN's physical space. The total LUN capacity is the LUN size seen by the client. The pre-allocated physical space is the physical space actually occupied when the LUN is created. The expansion policy defines the trigger condition and the manner of expansion; for example, expansion is triggered when utilization of the pre-allocated physical space reaches 80%, and each expansion step is 5% of the total LUN capacity. The system allocates resources on the specified RAID array according to the pre-allocated size, creates the LUN's segment table to record the correspondence between the LUN and the RAID array, and updates the RAID array's segment table to mark those segments as in use.
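As a rough illustration of the example policy in [0009] (trigger at 80% utilization of the allocated space, grow by 5% of total capacity), the arithmetic could be sketched as follows. The function name `maybe_expand` and its parameters are hypothetical, and capping the allocation at the total LUN capacity is an assumption rather than something the text states.

```python
def maybe_expand(total_gb, allocated_gb, used_gb, trigger_pct=80, step_pct=5):
    """Return the allocated size (GB) after applying the example expansion policy.

    Hypothetical helper: expansion triggers once used space reaches
    trigger_pct of the allocated space, and each step adds step_pct of the
    total LUN capacity. Capping at total_gb is an assumption.
    """
    if used_gb * 100 >= allocated_gb * trigger_pct:
        step = total_gb * step_pct // 100
        return min(total_gb, allocated_gb + step)
    return allocated_gb


# A 2000 GB LUN with 200 GB allocated grows once 160 GB (80%) is in use.
print(maybe_expand(2000, 200, 160))
```

Integer arithmetic is used so that the 80% threshold is compared exactly.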

[0010] Because the physical space actually allocated to a thin-provisioned LUN does not match the total space seen by the client, a dedicated LUN linear table must also be maintained to record the correspondence between the LUN's linear address space and the actual physical space on the RAID array. When the LUN receives a write I/O request, free space for the addressed region is first allocated from the pre-allocated physical space, the LUN linear table is updated, and the data is written. When the LUN receives a read I/O request, the physical space is accessed directly if the LUN linear table has a corresponding entry; otherwise all zeros are returned.
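The read and write paths in [0010] can be sketched with a toy model. This is a minimal illustration, not the structure of any real storage system: the class `ThinLun`, its fields, and the 4-byte block size are all assumptions made for the example.

```python
class ThinLun:
    """Toy thin-provisioned LUN (hypothetical model, not a real system's layout)."""

    def __init__(self, virtual_blocks, physical_blocks, block_size=4):
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self.free = list(range(physical_blocks))  # pre-allocated physical blocks
        self.linear_table = {}                    # logical block -> physical block
        self.store = {}                           # physical block -> data

    def write(self, lba, data):
        # On the first write to a logical block, take a free physical block
        # and record the mapping in the linear table before writing.
        if lba not in self.linear_table:
            if not self.free:
                raise RuntimeError("pre-allocated space exhausted; expansion needed")
            self.linear_table[lba] = self.free.pop(0)
        self.store[self.linear_table[lba]] = data

    def read(self, lba):
        # Unmapped logical blocks read back as all zeros.
        pba = self.linear_table.get(lba)
        if pba is None:
            return bytes(self.block_size)
        return self.store[pba]


lun = ThinLun(virtual_blocks=100, physical_blocks=2)
print(lun.read(50))      # never written: zeros
lun.write(50, b"abcd")
print(lun.read(50))
```

Every I/O goes through a table lookup, which is the extra data-path cost attributed to thin provisioning in the surrounding text.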

[0011] The most notable feature of thin provisioning is that storage is allocated according to the current needs of the business. Total storage shrinks, so the space that must be rebuilt shrinks too; that is, it lowers rebuild risk by minimizing storage space. However, thin provisioning is complex to implement, and an obvious drawback is reduced performance: the LUN maintains a segment table and a linear distribution table, and every I/O must look up both to locate the corresponding physical space, which lengthens the data path and degrades performance. Thin provisioning is therefore unsuitable for users who require high performance and high reliability but are relaxed about cost.

[0012] Another prior-art approach to optimizing rebuilds is to rebuild only the space in the RAID array that has been allocated. Using the allocation information recorded by the RAID array, only allocated regions are rebuilt, which reduces the rebuild workload, avoids wasted work, and shortens the rebuild. However, as noted above, logical resources in a storage system are usually far larger than the storage actually needed, so only a small part of a logical resource holds user data and most of it is idle. Rebuilding only the allocated regions is therefore clearly not optimal: it cannot minimize the risk of data loss during a rebuild for users who require high performance and high reliability but are relaxed about cost.

Summary of the Invention

[0013] In view of this, the present invention provides a RAID array reconstruction device for performing RAID array reconstruction within a network storage system, wherein the RAID array is divided in advance into physical blocks of equal size. The device comprises:

[0014] a resource allocation unit, configured to allocate one or more physical blocks to a logical resource when the logical resource is created, and to record the correspondence between the logical resource and the physical blocks;

[0015] an access recording unit, configured to maintain a resource access record table that records whether data has been written to each physical block; the access recording unit marks a physical block as written in the resource access record table when data is written to it, and, when a logical resource is deleted, marks the physical blocks corresponding to that logical resource as unwritten; and

[0016] a reconstruction processing unit, configured to obtain, when the RAID array is rebuilt, the physical blocks whose state is written according to the resource access record table, and to rebuild those blocks in units of physical blocks.

[0017] The present invention also provides a RAID array reconstruction method for performing RAID array reconstruction within a network storage system, wherein the RAID array is divided in advance into physical blocks of equal size. The method comprises:

[0018] A. when a logical resource is created, allocating one or more physical blocks to the logical resource, and recording the correspondence between the logical resource and the physical blocks;

[0019] B. maintaining a resource access record table that records whether data has been written to each physical block; marking a physical block as written in the resource access record table when data is written to it, and, when a logical resource is deleted, marking the physical blocks corresponding to that logical resource as unwritten; and

[0020] C. when the RAID array is rebuilt, obtaining the physical blocks whose state is written according to the resource access record table, and rebuilding those blocks in units of physical blocks.

[0021] Because the present invention rebuilds only the physical space that is actually in use, it substantially improves rebuild efficiency and speed over the prior art, effectively avoids risks such as data loss caused by rebuilds, and has very little impact on normal read/write services.

Brief Description of the Drawings

[0022] FIG. 1 is a logical schematic diagram of the network storage device of the present invention.

[0023] FIG. 2 is a flowchart of data write processing in one embodiment of the present invention.

[0024] FIG. 3 is a flowchart of logical resource deletion in one embodiment of the present invention.

[0025] FIG. 4 is a flowchart of array reconstruction in one embodiment of the present invention.

Detailed Description

[0026] Overall, the present invention builds on existing data-path processing and RAID rebuild management by introducing a resource access record table that tracks data writes, and rebuilds only the regions to which data has been written. This minimizes the rebuild workload, improves rebuild efficiency, shortens rebuild time, and reduces the impact of rebuilds on read/write performance.

[0027] The present invention maintains a resource access record table. The table can record state per RAID stripe or per fixed-length RAID resource block (see the applicant's earlier related patent applications), depending on the implementation. Both are referred to below as a physical block (Block); a Block represents a storage region of a specific length. The table can have any structure and can reside anywhere in the system where it is feasible, depending mainly on performance and space requirements. For example, implementing it at a lower layer improves performance but may be somewhat more complex; implementing it at a higher layer is easier but gives ordinary performance. In one embodiment, the table is maintained at the level of the storage system's RAID module. To speed up lookups and reduce the system resources the table consumes, it can be recorded as a bitmap with one bit per Block: for example, a bit value of 1 indicates that data has been written to the corresponding Block, and 0 indicates that no data has been written.
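The bitmap form of the resource access record table might look like the following minimal sketch, with one bit per Block (1 = written, 0 = unwritten). The class name `AccessRecordTable` and its method names are illustrative assumptions; the text does not prescribe a layout or an API.

```python
class AccessRecordTable:
    """One bit per physical block: 1 = written, 0 = unwritten (illustrative only)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all blocks start unwritten

    def _check(self, block):
        if not 0 <= block < self.num_blocks:
            raise IndexError(f"block {block} out of range")

    def mark_written(self, block):
        self._check(block)
        self.bits[block // 8] |= 1 << (block % 8)

    def mark_unwritten(self, block):
        self._check(block)
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def is_written(self, block):
        self._check(block)
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def written_blocks(self):
        """Blocks a rebuild would need to process, in ascending order."""
        return [b for b in range(self.num_blocks) if self.is_written(b)]


table = AccessRecordTable(10)
table.mark_written(3)
table.mark_written(9)
print(table.written_blocks())
```

Ten blocks fit in two bytes here, which is the space saving the bitmap approach is chosen for.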

[0028] Refer to FIG. 1, FIG. 2 and FIG. 3. The RAID array reconstruction device 20 of the present invention is used in a network storage system 10, which further includes a read/write service processing device 30. The RAID array reconstruction device 20 includes a resource allocation unit 22, an access recording unit 24 and a reconstruction processing unit 26. These devices are logical abstractions; they are typically implemented as program code running on a processor, but can equally be implemented in hardware, firmware, or a combination of software and hardware. The general processing flow of the RAID array reconstruction device 20 is described below.

[0029] Step 101: when a logical resource is created, allocate one or more physical blocks to the logical resource, and record the correspondence between the logical resource and the physical blocks. This step is performed by the resource allocation unit 22.

[0030] In the present invention, the RAID array is divided in advance into physical blocks of equal size. A typical physical block may be a stripe, or a fixed-length resource block as proposed in the applicant's earlier patent applications. The present invention is not concerned with how stripes or resource blocks are divided or with their actual size; the essence is that the physical resources of the RAID array are partitioned into blocks, referred to here collectively as physical blocks.

[0031] When creating a logical resource, the network storage system selects one or more physical blocks from the RAID array and allocates them to the logical resource, so that each logical resource has one or more corresponding physical blocks. The resource allocation unit 22 records this correspondence for later processing.

[0032] Note that the network storage system 10 may support thin provisioning. Once a user enables it, the size of a logical resource will likely differ from the total size of its physical blocks. The correspondence referred to here therefore means only which logical resource each physical block is allocated to; whether the sizes of the logical and physical resources match is not a concern.

[0033] Step 102: maintain a resource access record table that records whether data has been written to each physical block; mark a physical block as written in the resource access record table when data is written to it, and, when a logical resource is deleted, mark the physical blocks corresponding to that logical resource as unwritten. This step is performed by the access recording unit 24.

[0034] As noted above, a typical and efficient implementation of the resource access record table is a bitmap, a technique widely used in network storage, so it is not described in detail here. Updates to each physical block's entry in the resource access record table can be arranged either serially or in parallel with the processing of the read/write service device.

[0035] In serial processing, when the read/write service processing device (usually the RAID service processing module) receives a write command from a logical resource or from an application within the network storage system, it handles the command with the normal write flow. If the write succeeds, processing proceeds to step 102; otherwise it returns, for example by reporting the write failure to the writer.

[0036] In parallel processing, step 102 runs concurrently with the read/write service device's handling of the write command. Even if the write command ultimately fails, step 102 may still mark the physical block as written, so the state recorded in the resource access record table may not match the block's actual state. This approach is more efficient, however, and speeds up the service flow. Even when a mismatch occurs, the only consequence is slightly more rebuild work later: some physical blocks are rebuilt that did not need to be.
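The difference between the serial and parallel arrangements can be sketched as below. This is a simplification under stated assumptions: the table is modeled as a plain Python set, `do_write` is a hypothetical callback that returns whether the write succeeded, and the "parallel" branch does not use real concurrency; it only models the fact that the block is marked regardless of the write's outcome.

```python
def handle_write(table, block, do_write, mode="serial"):
    """Apply a write and update the access record table (modeled as a set).

    serial:   mark the block only after the write succeeds.
    parallel: mark the block without waiting for the result (no real
              concurrency here; only the outcome-independence is modeled).
    """
    if mode == "parallel":
        table.add(block)          # may over-mark if the write later fails
        return do_write(block)
    ok = do_write(block)
    if ok:
        table.add(block)
    return ok


table = set()
handle_write(table, 1, lambda b: False, mode="serial")    # failed write, not marked
handle_write(table, 2, lambda b: False, mode="parallel")  # failed write, still marked
print(sorted(table))
```

Over-marking is safe in the sense described above: the only cost is that a later rebuild processes a block that holds no data.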

[0037] A user may delete a logical resource that has been created. Before the deletion, the access recording unit 24 first obtains the physical blocks occupied by the logical resource and changes their state to unwritten in the resource access record table; the logical resource is then deleted and its physical space released, completing the deletion. The table must be updated when a logical resource is deleted because subsequent rebuilds follow the per-block state recorded in the resource access record table; timely updates keep the rebuild scope limited to the physical blocks actually occupied by services or applications.
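The deletion order in [0037] (clear the block states first, then delete the resource and release its space) can be illustrated with a small model. The class `BlockAllocator`, its lowest-numbered-first allocation policy, and the use of Python sets for the free pool and the access record table are assumptions made for the sketch.

```python
class BlockAllocator:
    """Toy model of allocation, write tracking and deletion (hypothetical)."""

    def __init__(self, num_blocks):
        self.free = set(range(num_blocks))
        self.owner = {}       # logical resource name -> list of physical blocks
        self.written = set()  # stands in for the resource access record table

    def create(self, name, count):
        if count > len(self.free):
            raise RuntimeError("not enough free physical blocks")
        blocks = sorted(self.free)[:count]  # assumed policy: lowest numbers first
        self.free -= set(blocks)
        self.owner[name] = blocks
        return blocks

    def write(self, name, index):
        self.written.add(self.owner[name][index])

    def delete(self, name):
        blocks = self.owner[name]
        self.written.difference_update(blocks)  # 1. mark blocks unwritten
        del self.owner[name]                    # 2. delete the logical resource
        self.free.update(blocks)                # 3. release the physical space


alloc = BlockAllocator(8)
alloc.create("lun0", 3)
alloc.create("lun1", 2)
alloc.write("lun0", 1)
alloc.write("lun1", 0)
alloc.delete("lun0")
print(sorted(alloc.written))
```

After the deletion, only the block written through the surviving resource remains in the rebuild scope.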

[0038] Further, all physical blocks are initially in the unwritten state. A physical block may be written repeatedly, so when data is written to a block, the block's state can be checked first: if it is already written, return; otherwise continue. The state then changes only on the first write, and later writes skip the update. An optimization is to set the block's state to written directly on every write, which eliminates the read-and-compare step and improves efficiency.

[0039] Further, the resource access record table may become invalid, for example when a bit in the bitmap cannot be set or an update fails. For rigor, the validity of the table can be checked before a block's state is marked: if the table is valid, it is updated; otherwise it is not updated and rebuilds are handed to a different reconstruction processing unit, for example an existing software or hardware unit that runs the conventional rebuild flow. This makes full use of the existing reconstruction processing unit as a backup and improves rebuild reliability. Correspondingly, when a block's state is updated, the result can be checked: if the update succeeded, continue; otherwise mark the resource access record table as invalid.
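A minimal sketch of the validity check in [0039] follows. Two things here are assumptions rather than statements from the text: the fault-injection parameter `fail_on` exists only to simulate a failed update, and the fallback is modeled as rebuilding every block, which is one plausible reading of "the conventional rebuild flow".

```python
class GuardedTable:
    """Access record table with a validity flag (illustrative sketch)."""

    def __init__(self, fail_on=()):
        self.valid = True
        self.written = set()
        self._fail_on = set(fail_on)  # hypothetical fault injection for the demo

    def mark_written(self, block):
        if not self.valid:
            return False              # table already invalid: do not update it
        if block in self._fail_on:
            self.valid = False        # update failed: invalidate the whole table
            return False
        self.written.add(block)
        return True


def blocks_to_rebuild(table, num_blocks):
    """Use the table when it is valid; otherwise fall back to a full rebuild."""
    if table.valid:
        return sorted(table.written)
    return list(range(num_blocks))    # assumed fallback: rebuild everything


table = GuardedTable(fail_on={2})
table.mark_written(0)
print(blocks_to_rebuild(table, 4))
table.mark_written(2)                 # simulated update failure
print(blocks_to_rebuild(table, 4))
```

Once the table is invalid it can no longer be trusted to list every written block, so a conservative fallback is the safe choice.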

[0040] Step 103: when the RAID array is rebuilt, obtain the physical blocks whose state is written according to the resource access record table, and rebuild those blocks in units of physical blocks. This step is performed by the reconstruction processing unit 26.

[0041] Refer to FIG. 4. When a RAID array is in a degraded state, the system may prompt the administrator to rebuild, or may trigger a rebuild immediately; once the system finds an available hot spare disk, it can begin the rebuild. A rebuild normally computes the data of the failed disk using the parity algorithm for the array's RAID level and writes it to the hot spare disk. The physical blocks obtained by the present invention all contain data, so every rebuilt block is meaningful and a large amount of unnecessary rebuild work is avoided. Compared with the prior-art approach of rebuilding allocated physical resources, rebuilding only blocks that have been written greatly improves efficiency, because much of the allocated physical space has often never been written.
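For a single-parity layout such as RAID5, the lost data in a block is the XOR of the corresponding chunks on the surviving members. The sketch below assumes that case and a deliberately simplified layout in which each disk is a list of per-block byte strings; the function names are hypothetical, and other RAID levels use different recovery algorithms.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_written_blocks(surviving, written, num_blocks):
    """Rebuild only the blocks marked written onto a fresh spare.

    surviving: list of disks, each a list of per-block byte strings
               (data and parity members that are still readable).
    written:   set of block numbers marked written in the access record table.
    """
    spare = [None] * num_blocks       # None = block left untouched on the spare
    for block in sorted(written):
        spare[block] = xor_chunks([disk[block] for disk in surviving])
    return spare


d0 = [b"\x01\x02", b"\x00\x00"]
d1 = [b"\x10\x20", b"\x00\x00"]       # this member is about to fail
parity = [xor_chunks([a, b]) for a, b in zip(d0, d1)]
spare = rebuild_written_blocks([d0, parity], written={0}, num_blocks=2)
print(spare[0] == d1[0], spare[1])
```

Block 1 was never written, so the rebuild skips it entirely; that skipped work is the saving described above.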

[0042] The reconstruction processing unit 26 can apply further optimizations. If new data is written to a physical block during the rebuild, the reconstruction processing unit 26 can write the new data to the data disks and the hot spare disk at the same time. The benefit is that service writes and the rebuild process do not interfere with each other: the block is effectively rebuilt as part of the service write, so the reconstruction processing unit does not need to rebuild the newly written data separately.

[0043] There are two ways to implement the rebuild. In the first, at the start of the rebuild the reconstruction processing unit 26 generates a rebuild list (the list of physical blocks currently in the written state) from the current resource access record table, and rebuilds the physical blocks in list order.

[0044] Alternatively, the rebuild can work directly from the resource access record table: the reconstruction processing unit 26 takes N physical blocks in the written state at a time (N is a natural number) and rebuilds them, until all such physical blocks have been rebuilt.
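The second approach, scanning the table directly and taking N written blocks at a time, might be sketched as follows. The function name, the forward-moving cursor, and the callback parameters are assumptions; the text only states that N written blocks are taken per round until none remain.

```python
def rebuild_in_batches(is_written, num_blocks, n, rebuild_block):
    """Scan the table once, rebuilding written blocks in batches of up to n.

    is_written:    callable(block) -> bool, reads the live access record table.
    rebuild_block: callable(block), performs the actual per-block rebuild.
    Returns the batches that were processed, for inspection.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    batches = []
    cursor = 0
    while cursor < num_blocks:
        batch = []
        while cursor < num_blocks and len(batch) < n:
            if is_written(cursor):
                batch.append(cursor)
            cursor += 1
        for block in batch:
            rebuild_block(block)
        if batch:
            batches.append(batch)
    return batches


written = {1, 4, 5, 9}
done = []
print(rebuild_in_batches(written.__contains__, 12, 2, done.append))
```

Because the table is read live, blocks written ahead of the cursor during the rebuild are picked up by the scan; blocks behind the cursor would rely on the write path also updating the hot spare.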

[0045] Furthermore, when new data is written during the rebuild, the access recording unit 24 marks the physical block as written as described in step 102, which keeps the resource access record table complete and up to date. If the hot spare disk fails during the rebuild and is replaced by another hot spare, a second rebuild is unaffected because the resource access record table is complete (that is, it has been kept up to date).

[0046] The above is only a preferred implementation of the present invention. Any equivalent modification made within the spirit of the present invention shall fall within the scope of its claims.

Claims (10)

1. A RAID array reconstruction device for performing RAID array reconstruction within a network storage system, wherein the RAID array is divided in advance into physical blocks of equal size, the device comprising: a resource allocation unit, configured to allocate one or more physical blocks to a logical resource when the logical resource is created, and to record the correspondence between the logical resource and the physical blocks; an access recording unit, configured to maintain a resource access record table that records whether data has been written to each physical block, wherein the access recording unit marks a physical block as written in the resource access record table when data is written to it, and, when the logical resource is deleted, marks the physical blocks corresponding to that logical resource as unwritten; and a reconstruction processing unit, configured to obtain, when the RAID array is rebuilt, the physical blocks whose state is written according to the resource access record table, and to rebuild those blocks in units of physical blocks.
2.根据权利要求1所述的重建装置,其特征在于,所述重建处理单元进一步用于在发现重建过程中有新数据写入状态为未写入的物理块时,将需要写入故障磁盘的数据写入该故障磁盘以及对应的热备磁盘。 2. The reconstruction device according to claim 1, characterized in that, the reconstruction processing unit is further configured to, when the reconstruction process of discovery of new physical block data is written to an unwritten state, the failed disk will need to be written data written to the failed disk and the corresponding hot spare disk.
3.根据权利要求1所述的重建装置,其特征在于,所述网络存储系统还包括读写业务处理装置,其用于处理数据读写命令,如果处理失败则返回,如果是写命令且处理成功则转入访问记录单元处理。 3. The reconstruction device according to claim 1, wherein said system further comprises a reader network storage service processing means for processing the data read and write commands, if the process fails to return, if it is a write command and the process success processing unit then transferred to access records.
4.根据权利要求3所述的重建装置,其中所述访问记录单元进一步用于在标记物理块状态之前,检查所述资源访问记录表是否有效,如果是则更新所述资源访问记录表,否则不更新并转入其他不同的重建处理单元;并在更新所述物理块状态时检查是否更新成功,如果更新成功则继续,否则将所述资源访问记录表标记为无效。 4. The reconstruction device according to claim 3, wherein the access unit is further configured to, before recording the physical block status flag, the resource access record table check is valid, and if so updating the resource access record table, or not updated and transferred to various other reconstruction processing unit; checking whether the update is successful, and when the physical block status updating, if the update is successful, else the resource access record table marked invalid.
5.根据权利要求1所述的重建装置,其中所述访问记录单元进一步用于在标记物理块状态之前获取该物理块当前状态,如果当前状态为已写入,则返回,否则继续。 The reconstruction device according to claim 1, wherein said recording unit is further configured to obtain access to the physical block of the current state of the physical block status flag before, if the current state is written, is returned, otherwise continue.
6. 一种RAID阵列重建方法,用于执行网络存储系统内的RAID阵列重建操作,其中所述RAID阵列被预先划分为大小相同物理块,该方法包括:A、在创建逻辑资源时为逻辑资源分配一个或多个物理块,并记录逻辑资源与物理块之间的对应关系;B、维护一个资源访问记录表,该资源访问记录表用于记录每一个物理块是否被写入了数据;并在有数据写入物理块时将资源访问记录表中该物理块的状态标记为已写入,并在所述逻辑资源被删除时,将资源访问记录中该逻辑资源对应的物理块的状态标记为未写入;以及C、在重建RAID阵列时根据所述资源访问记录表获取状态为已写入的物理块,以物理块为单元对状态为已写入的物理块进行重建。 A RAID array reconstruction method for performing a RAID rebuild operation of the array storage system in the network, where the RAID array is previously divided into the same physical block size, the method comprising: A, when creating a logical resource to logical resource allocating one or more physical blocks, and recording correspondence relationship between the logical and physical resource block; B, maintains a resource access record table, access to the resource record table for recording whether each physical block data is written; and the resource record table access state of the physical block is marked as written when data is written to a physical block, and said logical resource is deleted, the physical resource blocks access to the record corresponding to the logical resource status flag and C, in the reconstruction of RAID array state table for recording the access resource according to the physical state of the block is written to a physical block as a physical block unit written reconstruction; is not written.
7.根据权利要求6所述的方法,其特征在于,步骤C进一步包括在发现重建过程中有新数据写入状态为未写入的物理块时,将需要写入故障磁盘的数据写入该故障磁盘以及对应的热备磁盘。 7. The method according to claim 6, wherein the step C further comprises the reconstruction process has found a new data is written to unwritten physical block status will need to write the data written to the failed disk failed disk and corresponding hot spare disk.
8.根据权利要求6所述的方法,其特征在于,还包括:D、处理数据读写命令,如果处理失败则返回,如果是写命令且处理成功则转入步骤B。 8. The method according to claim 6, characterized in that, further comprising: D, processing the data read and write commands, if the process fails to return, and if the write command is successful then the process proceeds to step B.
9.根据权利要求6所述的方法,其中步骤B进一步包括:在标记物理块状态之前,检查所述资源访问记录表是否有效,如果是则更新所述资源访问记录表,否则不更新并转入其他不同的重建处理流程;并在更新所述物理块状态时检查是否更新成功,如果更新成功则继续,否则将所述资源访问记录表标记为无效。 9. The method according to claim 6, wherein the step B further comprising: prior to the physical block status flag, the resource access record table check is valid, and if so updating the resource access record table, or not updated and transferred the various other reconstruction process flow; and checks whether the update is successful when the physical block status updating, if the update is successful, else the resource access record table marked invalid.
10.根据权利要求6所述的方法,其中所述步骤B进一步包括:在更新物理块状态之前获取该物理块当前状态,如果当前状态为已写入,则返回,否则继续。 10. The method according to claim 6, wherein said step B further comprising: obtaining the current status of the physical block in the physical block before updating status, if the current state is written, is returned, otherwise continue.
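For illustration only, the checks recited in claims 9 and 10 can be combined into one update routine as sketched below. The function names, the dictionary layout of the table, the `persist` callback, and the returned status strings are all hypothetical. The order in which the two checks are combined is an assumption of this sketch. The claims do not specify what the "different reconstruction processing flow" does when the table is invalid; the sketch assumes it falls back to reconstructing every physical block.

```python
def record_write(table, block, persist):
    """Update the access record for `block` after a successful write.

    `table` is a dict {"valid": bool, "written": list of bool} (hypothetical layout).
    `persist` is a hypothetical callback that stores the new state and
    returns True on success, False on failure.
    """
    if not table["valid"]:
        return "table-invalid"      # do not update; another rebuild flow applies
    if table["written"][block]:
        return "already-written"    # nothing to do, return early
    if persist(block):
        table["written"][block] = True
        return "updated"
    table["valid"] = False          # update failed: the table can no longer be trusted
    return "invalidated"


def blocks_to_rebuild(table):
    """Select the blocks to reconstruct.

    Assumption of this sketch: an invalid table means every block is rebuilt.
    """
    if not table["valid"]:
        return list(range(len(table["written"])))
    return [i for i, w in enumerate(table["written"]) if w]
```

Returning early for a block that is already marked written avoids a redundant table update on every repeated write to the same block.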
CN2011104567385A 2011-12-31 2011-12-31 Method and device for reconstructing RAID (Redundant Array of Independent Disks) CN102541472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104567385A CN102541472A (en) 2011-12-31 2011-12-31 Method and device for reconstructing RAID (Redundant Array of Independent Disks)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011104567385A CN102541472A (en) 2011-12-31 2011-12-31 Method and device for reconstructing RAID (Redundant Array of Independent Disks)

Publications (1)

Publication Number Publication Date
CN102541472A true CN102541472A (en) 2012-07-04

Family

ID=46348458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011104567385A CN102541472A (en) 2011-12-31 2011-12-31 Method and device for reconstructing RAID (Redundant Array of Independent Disks)

Country Status (1)

Country Link
CN (1) CN102541472A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5553285A (en) * 1988-04-22 1996-09-03 Amdahl Corporation File system for a plurality of storage classes
CN101510145A (en) * 2009-03-27 2009-08-19 杭州华三通信技术有限公司 Storage system management method and apparatus
JP2010272138A (en) * 2003-08-14 2010-12-02 Compellent Technologies Virtual disk drive system and method
CN102147713A (en) * 2011-02-18 2011-08-10 杭州宏杉科技有限公司 Method and device for managing network storage system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123100A (en) * 2013-04-25 2014-10-29 国际商业机器公司 Controlling data storage in an array of storage devices
CN104123100B (en) * 2013-04-25 2017-08-04 国际商业机器公司 Data storage control store array
CN105531677A (en) * 2013-08-27 2016-04-27 新加坡科技研究局 Raid parity stripe reconstruction

Similar Documents

Publication Publication Date Title
US9690487B2 (en) Storage apparatus and method for controlling storage apparatus
US7197598B2 (en) Apparatus and method for file level striping
US7840838B2 (en) Rapid regeneration of failed disk sector in a distributed database system
US9128855B1 (en) Flash cache partitioning
US8904129B2 (en) Method and apparatus for backup and restore in a dynamic chunk allocation storage system
US6718434B2 (en) Method and apparatus for assigning raid levels
US5657468A (en) Method and apparatus for improving performance in a reduntant array of independent disks
JP3358687B2 (en) Disk array device
CN101023412B (en) Semi-static parity distribution technique
US20030023811A1 (en) Method for managing logical volume in order to support dynamic online resizing and software raid
KR100637779B1 (en) Configuring memory for a raid storage system
US8356126B2 (en) Command-coalescing RAID controller
Holland et al. Architectures and algorithms for on-line failure recovery in redundant disk arrays
CN1327330C (en) Logical disk management method and apparatus
US20090228648A1 (en) High performance disk array rebuild
US6052759A (en) Method for organizing storage devices of unequal storage capacity and distributing data using different raid formats depending on size of rectangles containing sets of the storage devices
CN1965298B (en) Method, system, and equipment for managing parity RAID data reconstruction
US8392752B2 (en) Selective recovery and aggregation technique for two storage apparatuses of a raid
US7032070B2 (en) Method for partial data reallocation in a storage system
JP5537976B2 (en) Method and apparatus for using large capacity disk drive
JP2009505310A (en) Method and system for accessing auxiliary data in a power efficient large capacity scalable storage system
US8250335B2 (en) Method, system and computer program product for managing the storage of data
US7228381B2 (en) Storage system using fast storage device for storing redundant data
US20120192037A1 (en) Data storage systems and methods having block group error correction for repairing unrecoverable read errors
US9213612B2 (en) Method and system for a storage area network

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)