CN103051676A - Distributed data storage management method - Google Patents

Distributed data storage management method

Info

Publication number
CN103051676A
Authority
CN
China
Prior art keywords
data, layer, storage, management method, distributed
Application number
CN2012104863468A
Other languages
Chinese (zh)
Inventor
平原 (Ping Yuan)
Original Assignee
浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-11-26
Filing date
2012-11-26
Publication date
2013-04-17
Application filed by 浪潮电子信息产业股份有限公司
Priority to CN2012104863468A
Publication of CN103051676A

Abstract

The invention provides a distributed data storage management method in the field of computer communication. The method comprises the following steps: a data source layer generates data; the generated data enters a data processing layer; the block data processed by the data processing layer enters a data transmission layer, which encapsulates it into data packets; the packets are transmitted to a data storage layer via the SCSI (Small Computer System Interface) protocol or IP (Internet Protocol); and the data storage layer unpacks and stores the received packets. The data storage layer consists of a plurality of sub-storage-array nodes, each of which stores one data block. Compared with the prior art, the method stores continuously growing user data on distributed RAID (redundant array of inexpensive disks) storage arrays, which effectively reduces the risk of data failure, shortens data recovery time, makes storage capacity easy to expand, lowers cost, and saves substantial maintenance cost.

Description

Distributed data storage management method

Technical Field

[0001] The present invention relates to the field of computer communication technology, and specifically to a distributed data storage management method.

Background Art

[0002] With the explosive growth of data in the Internet era, the capacity of the hard disks that ordinary data centers use for storage keeps increasing, and more and more data is stored on a single hard disk or hard disk array. The hardware failure rate of hard disk components, however, has not fallen accordingly. As a result, the failure risk per unit of data keeps rising, data recovery takes longer and longer, and the reliability of user data faces ever greater challenges. Effectively reducing the data failure rate is a major problem in computer communication technology.

Summary of the Invention

[0003] The technical task of the present invention is to overcome the deficiencies of the prior art by providing a distributed data storage management method.

[0004] The technical solution of the present invention is implemented as follows. The distributed data storage management method comprises these steps:
1) Data is generated by a data source layer; the data source layer is the user application that produces the data.
2) The data generated in step 1) enters a data processing layer, which divides the file data into blocks; the data processing layer includes an EC (erasure code) algorithm control end that processes the raw data.
3) The block data processed by the data processing layer enters a data transmission layer, which encapsulates the block data into data packets and transmits the packets to a data storage layer via the SCSI protocol or the IP protocol.
4) The data storage layer unpacks and stores the received packets. The data storage layer consists of a number of sub-storage-array nodes, and each storage array node is responsible for storing one data block.

[0005] In the above technical solution, data that a conventional architecture keeps in a single physical location is managed in a distributed manner: an erasure code algorithm divides the file data into blocks, and the processed blocks are then distributed, in SCSI or IP packets, across multiple sub-array nodes for storage.

[0006] The sub-storage nodes of the data storage layer are physically independent of one another; that is, each sub-storage-array node has its own data redundancy protection method and can perform data protection, data backup, and data recovery independently.
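
Steps 3) and 4) leave the packet format unspecified. The following Python sketch shows one way a transmission layer could wrap each block in a packet and a storage node could validate and unpack it. The 12-byte header (magic number, block index, payload length, CRC32 checksum), the names `encapsulate`, `unpack`, `split`, and `store`, and the in-memory dictionary standing in for the storage nodes are all hypothetical. No SCSI or IP transport is involved, and no erasure coding is applied here.

```python
import struct
import zlib

# Hypothetical packet header (the patent does not define one), network byte order:
# magic (uint16), block index (uint16), payload length (uint32), CRC32 of payload (uint32)
HEADER = struct.Struct("!HHII")
MAGIC = 0xDA7A


def encapsulate(block_index: int, payload: bytes) -> bytes:
    """Transmission layer: wrap one data block in a packet."""
    header = HEADER.pack(MAGIC, block_index, len(payload), zlib.crc32(payload))
    return header + payload


def unpack(packet: bytes) -> tuple[int, bytes]:
    """Storage layer: validate a packet and return (block_index, payload)."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    magic, block_index, length, crc = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not a data packet")
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("truncated or corrupted payload")
    return block_index, payload


def split(data: bytes, n: int) -> list[bytes]:
    """Processing layer (simplified, no parity): cut data into n contiguous blocks."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def store(data: bytes, n_nodes: int) -> dict[int, bytes]:
    """Send one block per node and return what each node ends up storing."""
    nodes: dict[int, bytes] = {}
    for index, block in enumerate(split(data, n_nodes)):
        packet = encapsulate(index, block)     # transmission layer
        block_index, payload = unpack(packet)  # receiving storage node
        nodes[block_index] = payload
    return nodes
```

For example, `store(b"hello distributed storage", 4)` leaves blocks of 7, 7, 7, and 4 bytes on nodes 0 to 3, and joining them in index order returns the original bytes. The checksum lets a node reject a damaged packet before storing it; the patent does not require this, but it is a common design choice.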
In this solution the block data is stored on multiple sub-array nodes. Each sub-array node stores only its own block, and the sub-array nodes are physically isolated from one another. The index data that the algorithm generates for the blocks is used to manage the logical order and logical capacity of the sub-array nodes. Every sub-array has its own data protection and recovery capability, which effectively reduces the data failure rate.

[0007] In the configuration proposed in this patent, the sub-array nodes can be configured with different capacities and different RAID levels (for example RAID 0, RAID 1, RAID 5). Each sub-storage-array node is configured according to the importance and volume of the data it will hold. This keeps the cost of the server hardware configuration optimized and takes full advantage of spreading storage across multiple sub-array nodes, effectively reducing the risk of data failure while also optimizing cost.

[0008] The data processing layer further includes a Meta Data server, which keeps a backup record of the index data.
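
Paragraph [0007] lets each sub-array node use its own capacity and RAID level. The sketch below, built around a hypothetical `SubArrayNode` record, computes each node's usable capacity and how many disk failures it can absorb, using the standard textbook figures: RAID 0 stripes with no redundancy, RAID 1 is treated here as an n-way mirror, and RAID 5 spends one disk's worth of space on parity. Node names and sizes are illustrative assumptions, and real controllers may reserve additional space.

```python
from dataclasses import dataclass

# Minimum disk counts for the RAID levels named in [0007]
MIN_DISKS = {0: 1, 1: 2, 5: 3}


@dataclass(frozen=True)
class SubArrayNode:
    name: str
    raid_level: int
    disks: int
    disk_tb: float

    def __post_init__(self) -> None:
        if self.raid_level not in MIN_DISKS:
            raise ValueError(f"unsupported RAID level {self.raid_level}")
        if self.disks < MIN_DISKS[self.raid_level]:
            raise ValueError(f"RAID {self.raid_level} needs at least "
                             f"{MIN_DISKS[self.raid_level]} disks")

    def usable_tb(self) -> float:
        if self.raid_level == 0:
            return self.disks * self.disk_tb        # striping only
        if self.raid_level == 1:
            return self.disk_tb                     # every disk holds a full copy
        return (self.disks - 1) * self.disk_tb      # RAID 5: one disk of parity

    def tolerated_disk_failures(self) -> int:
        if self.raid_level == 0:
            return 0
        if self.raid_level == 1:
            return self.disks - 1
        return 1                                    # RAID 5


# Hypothetical mix: critical data mirrored, bulk data on RAID 5, scratch on RAID 0
nodes = [
    SubArrayNode("critical", 1, 2, 4.0),
    SubArrayNode("bulk", 5, 6, 8.0),
    SubArrayNode("scratch", 0, 4, 4.0),
]
for node in nodes:
    print(node.name, node.usable_tb(), node.tolerated_disk_failures())
```

This prints `critical 4.0 1`, `bulk 40.0 1`, and `scratch 16.0 0`, which illustrates the trade-off [0007] describes: the more important the data, the more raw capacity is spent per usable terabyte to buy fault tolerance.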
[0009] Compared with the prior art, the present invention has the following beneficial effects. The method is designed to counter the growing risk of data failure. Through distributed data management, data that was previously stored in one RAID array is distributed across multiple RAID arrays, and the ever-growing user data is stored on distributed RAID storage arrays. This effectively reduces the risk of data failure, shortens data recovery time, makes storage capacity easy to expand, lowers cost, and saves substantial maintenance cost.

Brief Description of the Drawings

[0010] Figure 1 is a schematic diagram of the layered data storage architecture of the present invention.

Detailed Description

[0011] The distributed data storage management method of the present invention is described in detail below with reference to the accompanying drawing.

[0012] As shown in Figure 1, the distributed data storage management method consists of four parts:
Data source layer: user applications generate the source data.
[0013] Data processing layer: contains the EC algorithm control end, which processes the raw data, and the Meta Data server, which backs up the index data.
[0014] Data transmission layer: distributes the block-processed raw data and the parity data to the sub-storage-array nodes over SCSI/IP.
[0015] Data storage layer: each sub-storage-array node stores the corresponding data blocks it receives, completing data storage protection.
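
Paragraphs [0008] and [0013] state only that the Meta Data server keeps a backup of the index data. A minimal sketch follows. It assumes the index records, for each block, its position, whether it is a data or parity block, its length, and the node holding it, and that the backup is stored as JSON. The field names, `build_index`, and the `MetaDataServer` class are hypothetical.

```python
import json


def build_index(file_id: str, block_lengths: list[int], nodes: list[str],
                n_data: int, original_length: int) -> dict:
    """Record which node holds which block; blocks at positions >= n_data are parity."""
    if len(block_lengths) != len(nodes):
        raise ValueError("exactly one node per block")
    return {
        "file_id": file_id,
        "original_length": original_length,
        "n_data": n_data,
        "blocks": [
            {"index": i, "node": node, "parity": i >= n_data, "length": length}
            for i, (length, node) in enumerate(zip(block_lengths, nodes))
        ],
    }


class MetaDataServer:
    """Hypothetical stand-in for the Meta Data server: keeps serialized index backups."""

    def __init__(self) -> None:
        self._backups: dict[str, str] = {}

    def backup(self, index: dict) -> None:
        self._backups[index["file_id"]] = json.dumps(index, sort_keys=True)

    def restore(self, file_id: str) -> dict:
        return json.loads(self._backups[file_id])


# Four data blocks plus one parity block of 7 bytes each, one block per node
index = build_index("report.dat", [7] * 5,
                    ["node-a", "node-b", "node-c", "node-d", "node-e"],
                    n_data=4, original_length=25)
mds = MetaDataServer()
mds.backup(index)
restored = mds.restore("report.dat")
data_nodes = [b["node"] for b in restored["blocks"] if not b["parity"]]
```

Here `data_nodes` is `["node-a", "node-b", "node-c", "node-d"]`. Keeping the original length in the index matters because erasure-coded blocks are usually padded to equal size, so the padding must be trimmed on read.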
[0016] The specific implementation steps are as follows.
1. Generate the algorithm parameters from the user's actual application environment, that is, from storage characteristics such as the number and capacity of the storage sub-arrays. Based on these parameters, configure the erasure code (EC) block-splitting algorithm of the data processing layer on the control server.
[0017] 2. At the data source layer of the control server, obtain the raw data to be stored from the user application. The EC algorithm processes the raw data into block data, parity data, and index data.
[0018] 3. The algorithm sets the amount of redundancy. For example, the raw data can be divided into $N$ data blocks and $K$ parity blocks, for a total of $M$ blocks ($M = N + K$). Any $N$ of the $M$ blocks then suffice to recover the raw data, and the code rate is $N/M = N/(N+K)$.
[0019] 4. The data transmission layer of the control server encapsulates the processed block data and parity data into packets and transmits them over SCSI or IP to multiple sub-storage-array nodes for distributed storage. Each sub-storage-array node is responsible for storing one data block.
[0020] 5. Because the index data on the control server is critical, the system backs it up to the Meta Data server of the data processing layer, which facilitates later recovery, organization, and querying of the data.
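
Paragraph [0018] leaves the erasure code itself open. The simplest code with the stated "any $N$ of $M$" property is single XOR parity, the case $K = 1$: $N$ equal-length data blocks plus one block that is their bytewise XOR. The sketch below uses it only as a stand-in. Tolerating $K > 1$ lost blocks needs a general code such as Reed-Solomon, which is not shown. The function names and the zero-padding scheme are assumptions, and the original length would come from the index data.

```python
from fractions import Fraction
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n: int) -> list[bytes]:
    """Split data into n equal data blocks and append one XOR parity block."""
    size = max(1, -(-len(data) // n))         # ceiling division, at least one byte
    padded = data.ljust(n * size, b"\0")      # zero-pad to n equal blocks
    blocks = [padded[i * size:(i + 1) * size] for i in range(n)]
    return blocks + [reduce(xor_bytes, blocks)]


def decode(blocks: list[Optional[bytes]], n: int, original_length: int) -> bytes:
    """Rebuild the data from any n of the n + 1 blocks; None marks a lost block."""
    if len(blocks) != n + 1:
        raise ValueError("expected n data blocks plus one parity block")
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost block")
    blocks = list(blocks)
    if missing:
        # The XOR of all n + 1 blocks is zero, so the lost block is the XOR of the rest
        blocks[missing[0]] = reduce(xor_bytes, [b for b in blocks if b is not None])
    return b"".join(blocks[:n])[:original_length]


def code_rate(n: int, k: int) -> Fraction:
    """Code rate N / M with M = N + K total blocks."""
    return Fraction(n, n + k)
```

With $N = 4$, `encode` produces five 7-byte blocks for a 25-byte input, and dropping any one of them still decodes to the original bytes. `code_rate(4, 1)` is 4/5. For $N = 4$ and $K = 2$, `code_rate(4, 2)` is 2/3, meaning 1.5 times the raw data is stored in exchange for surviving two lost blocks.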
[0021] 6. After a sub-storage-array node receives a packet, it unpacks it and stores the data in its own independently controlled storage array, whose built-in data redundancy and error correction give the data a second layer of protection. Because each sub-storage array is a relatively independent storage unit with its own data protection capability, and recovers data independently of the others, the data recovery time after a hard disk failure is greatly reduced, saving substantial maintenance cost.
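
Paragraph [0021] describes two layers of protection: the erasure code across nodes and each node's own RAID redundancy. The sketch below classifies a failure scenario under two assumptions the patent does not state. First, a node loses its block only when its failed disks exceed its RAID tolerance. Second, the erasure code is maximum-distance-separable, so it can rebuild any $K$ lost blocks. `plan_recovery` and `RecoveryPlan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    local_rebuild: list[int]  # nodes that repair themselves from their own RAID redundancy
    ec_rebuild: list[int]     # nodes whose block must be regenerated from the other nodes
    data_lost: bool           # True if more than K blocks are gone


def plan_recovery(disk_failures: list[int], tolerances: list[int], k: int) -> RecoveryPlan:
    """Decide, per node, which protection layer handles its disk failures."""
    if len(disk_failures) != len(tolerances):
        raise ValueError("one failure count and one tolerance per node")
    local: list[int] = []
    ec: list[int] = []
    for node, (failed, tolerated) in enumerate(zip(disk_failures, tolerances)):
        if failed == 0:
            continue                  # healthy node, nothing to do
        if failed <= tolerated:
            local.append(node)        # second layer: the node's own RAID rebuild
        else:
            ec.append(node)           # first layer: erasure-code rebuild across nodes
    return RecoveryPlan(local, ec, data_lost=len(ec) > k)


# Five RAID 5 nodes (each tolerates one disk failure) protected by K = 1 parity block
plan = plan_recovery([1, 1, 0, 2, 0], [1, 1, 1, 1, 1], k=1)
```

In this example nodes 0 and 1 repair themselves locally and only node 3's block has to be regenerated from the other nodes, which is the independent, per-array recovery that [0021] credits with shorter rebuild times.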

Claims (3)

1. A distributed data storage management method, characterized in that it is implemented by the following steps: 1) data is generated by a data source layer, the data source layer being the user application that produces the data; 2) the data generated in step 1) enters a data processing layer, which divides the file data into blocks, the data processing layer including an EC algorithm control end that processes the raw data; 3) the block data processed by the data processing layer enters a data transmission layer, which encapsulates the block data into data packets and transmits the packets to a data storage layer via the SCSI protocol or the IP protocol; 4) the data storage layer unpacks and stores the received packets, the data storage layer consisting of a number of sub-storage-array nodes, each of which is responsible for storing one data block.
2. The distributed data storage management method according to claim 1, characterized in that the sub-storage nodes of the data storage layer are physically independent of one another, that is, each sub-storage-array node has its own data redundancy protection method and can perform data protection, data backup, and data recovery independently.
3. The distributed data storage management method according to claim 1, characterized in that the data processing layer further comprises a Meta Data server, which keeps a backup record of the index data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104863468A CN103051676A (en) 2012-11-26 2012-11-26 Distributed data storage management method

Publications (1)

Publication Number Publication Date
CN103051676A (en) 2013-04-17

Family

ID=48064170

Country Status (1)

Country Link
CN (1) CN103051676A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104683472A (en) * 2015-03-12 2015-06-03 浪潮集团有限公司 Data transmission method capable of supporting large data volume
CN106686117A (en) * 2017-01-20 2017-05-17 郑州云海信息技术有限公司 Distributed calculation cluster data storage processing system and method
CN106873911A (en) * 2017-02-10 2017-06-20 济南浪潮高新科技投资发展有限公司 Realization method for distributed data storage in container classification mode

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101676855A (en) * 2008-09-11 2010-03-24 美国日本电气实验室公司 Scalable secondary storage systems and methods
CN101834899A (en) * 2010-04-29 2010-09-15 中科院成都信息技术有限公司 Distributed adaptive coding and storing method
US20110213928A1 (en) * 2010-02-27 2011-09-01 Cleversafe, Inc. Distributedly storing raid data in a raid memory and a dispersed storage network memory
CN102520890A (en) * 2011-12-30 2012-06-27 北京天地云箱科技有限公司 RS (Reed-Solomon) - DRAID( D redundant array of independent disk) system based on GPUs (graphic processing units) and method for controlling data of memory devices

Non-Patent Citations (1)

Title
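顾瑜 等 (Gu Yu et al.): "带重复数据删除的大规模存储系统可靠性保证" ("Reliability assurance for large-scale storage systems with data deduplication"), 《清华大学学报(自然科学版)》 (Journal of Tsinghua University (Science and Technology)) *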

Similar Documents

Publication Publication Date Title
Xin et al. Reliability mechanisms for very large storage systems
US6957313B2 (en) Memory matrix and method of operating the same
Xiang et al. Optimal recovery of single disk failure in RDP code storage systems
US8862800B2 (en) Distributed storage network including memory diversity
US8479037B1 (en) Distributed hot-spare storage in a storage cluster
US9021263B2 (en) Secure data access in a dispersed storage network
US9058116B2 (en) Intra-device data protection in a raid array
US8285878B2 (en) Block based access to a dispersed data storage network
EP1343087B1 (en) Technique for correcting multiple storage device failures in a storage array
US20080221856A1 (en) Method and System for a Self Managing and Scalable Grid Storage
US9501358B2 (en) Adjusting a dispersal parameter of dispersedly stored data
US8874991B2 (en) Appending data to existing data stored in a dispersed storage network
EP2791805B1 (en) Distributed computing in a distributed storage and task network
US8738582B2 (en) Distributed object storage system comprising performance optimizations
US20120079318A1 (en) Adaptive raid for an ssd environment
US20110055161A1 (en) Cloud Data Backup Storage
US9823861B2 (en) Method and apparatus for selecting storage units to store dispersed storage data
US9158624B2 (en) Storing RAID data as encoded data slices in a dispersed storage network
US9021335B2 (en) Data recovery for failed memory device of memory device array
US20090055682A1 (en) Data storage systems and methods having block group error correction for repairing unrecoverable read errors
US8732206B2 (en) Distributed storage timestamped revisions
US20130086448A1 (en) Accessing large amounts of data in a dispersed storage network
US8327080B1 (en) Write-back cache protection
CN101316274B (en) Data disaster tolerance system suitable for WAN
CN101488104B (en) System and method for implementing high-efficiency security memory

Legal Events

Date Code Title Description
C06 Publication
WD01