CN103051676A

CN103051676A - Distributed data storage management method

Info

Publication number: CN103051676A
Application number: CN2012104863468A
Authority: CN
Inventors: 平原
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2012-11-26
Filing date: 2012-11-26
Publication date: 2013-04-17

Abstract

The invention provides a distributed data storage management method in the field of computer communication. The method comprises the following steps: a data source layer generates data; the generated data enters a data processing layer; the block data produced by the data processing layer enters a data transmission layer; the data transmission layer encapsulates the block data into data packets and transmits them to a data storage layer over the SCSI (Small Computer System Interface) protocol or the IP (Internet Protocol) protocol; the data storage layer unpacks and stores the received data packets. The data storage layer consists of a plurality of sub-storage array nodes, each of which stores one data block. Compared with the prior art, the method stores continuously growing user data in a distributed RAID (redundant array of inexpensive disks) storage array, which effectively reduces the risk of data failure, shortens data recovery time, makes storage capacity easy to expand, and lowers hardware and maintenance costs.

Description

A distributed data storage management method

Technical field

The present invention relates to the field of computer communication technology, and specifically to a distributed data storage management method.

Background art

With the explosive growth of data in the Internet era, the capacity of the hard disks that ordinary data centers use to store data keeps growing, and more and more data is stored on the same hard disk or hard disk array, yet the hardware failure rate of hard disk components has not decreased. This means that the risk of failure per unit of data keeps increasing, data recovery takes longer and longer, and the reliability of user data faces ever greater challenges. How to effectively reduce the data failure rate is a major problem facing computer communication technology.

Summary of the invention

The technical task of the present invention is to remedy the deficiencies of the prior art by providing a distributed data storage management method.

The technical solution of the present invention is implemented as follows. The distributed data storage management method comprises the following steps:

1) Data is generated by the data source layer, in which the user's application programs act as the source of the data;

2) The data generated in step 1) enters the data processing layer, which splits the file data into blocks; the data processing layer includes an EC (erasure code) algorithm control end that processes the original data;

3) The block data produced by the data processing layer enters the data transmission layer, which encapsulates the block data into data packets and transmits the packets to the data storage layer over the SCSI protocol or the IP protocol;

4) The data storage layer unpacks the received data packets and stores them; the data storage layer consists of several sub-storage array nodes, and each sub-storage array node is responsible for storing one data block.

In the above technical solution, data that a conventional architecture would store in a single physical location is decentralized: an erasure code algorithm splits the file data into blocks, and the processed block data is then distributed, as SCSI or IP data packets, across a plurality of sub-storage array nodes.
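The flow through the four layers can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: `StorageNode`, `split_into_blocks`, `make_packets`, `store`, and `load` are hypothetical names that do not appear in the patent, the packets are plain dictionaries rather than SCSI or IP frames, and the blocks are produced by simple fixed-size splitting without the check data that the erasure code algorithm would add.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical stand-in for one sub-storage array node."""
    name: str
    blocks: dict = field(default_factory=dict)

    def receive(self, packet: dict) -> None:
        # Storage layer: unpack the packet and keep its payload.
        self.blocks[packet["index"]] = packet["payload"]


def split_into_blocks(data: bytes, count: int) -> list:
    """Processing layer: cut data into `count` equal blocks, zero-padded."""
    size = -(-len(data) // count)  # ceiling division
    padded = data.ljust(size * count, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


def make_packets(blocks: list) -> list:
    """Transmission layer: wrap each block together with its index."""
    return [{"index": i, "payload": b} for i, b in enumerate(blocks)]


def store(data: bytes, nodes: list) -> int:
    """Send one block to each node; return the original length."""
    packets = make_packets(split_into_blocks(data, len(nodes)))
    for node, packet in zip(nodes, packets):
        node.receive(packet)
    return len(data)


def load(nodes: list, length: int) -> bytes:
    """Reassemble the blocks in index order and strip the padding."""
    ordered = sorted((i, b) for n in nodes for i, b in n.blocks.items())
    return b"".join(b for _, b in ordered)[:length]


# Data source layer: the application simply hands over some bytes.
nodes = [StorageNode(f"node{i}") for i in range(4)]
original = b"hello distributed storage"
length = store(original, nodes)
restored = load(nodes, length)
```

Each node ends up holding exactly one block, and the original length has to be remembered so that the padding can be removed on reassembly.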

The sub-storage array nodes of the data storage layer are physically independent of one another; that is, each sub-storage array node has its own data redundancy protection method and can perform data protection, data backup, and data recovery on its own. In this technical solution the block data is stored across a plurality of sub-storage array nodes, and each node stores only the data assigned to it while remaining physically isolated from the others. The index data generated by the algorithm for the block data is used to order the sub-storage array nodes logically and to manage them as logical volumes. Because each sub-array has its own data protection and recovery capability, the data failure rate can be effectively reduced.

In the configuration proposed by this patent, the sub-storage array nodes can be configured with different capacities and different RAID levels (such as RAID 0, RAID 1, or RAID 5). Each sub-storage array node is configured according to the importance and volume of the data it holds, which keeps the cost of the server hardware configuration optimal and lets each sub-storage array node play to its strengths, thereby effectively reducing the risk of data failure while also optimizing cost.
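A per-node configuration of this kind might be described as follows. The sketch is an assumption for illustration only: the class `SubArrayConfig`, its fields, and the example node names and disk sizes are hypothetical and not taken from the patent. The capacity formulas are the standard ones for RAID 0 (striping), RAID 1 (mirroring), and RAID 5 (single distributed parity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubArrayConfig:
    """Hypothetical description of one sub-storage array node."""
    name: str
    raid_level: int   # 0, 1 or 5 in this sketch
    disks: int
    disk_tb: float    # capacity of a single disk, in TB

    def usable_tb(self) -> float:
        if self.raid_level == 0:
            # Striping only: all capacity is usable, no redundancy.
            return self.disks * self.disk_tb
        if self.raid_level == 1:
            # Every disk mirrors the same data.
            return self.disk_tb
        if self.raid_level == 5:
            if self.disks < 3:
                raise ValueError("RAID 5 needs at least three disks")
            # One disk's worth of capacity holds parity.
            return (self.disks - 1) * self.disk_tb
        raise ValueError(f"unsupported RAID level: {self.raid_level}")


# Example values chosen purely for illustration.
configs = [
    SubArrayConfig("scratch", raid_level=0, disks=4, disk_tb=2.0),
    SubArrayConfig("critical", raid_level=1, disks=2, disk_tb=2.0),
    SubArrayConfig("bulk", raid_level=5, disks=4, disk_tb=2.0),
]
usable = {c.name: c.usable_tb() for c in configs}
```

The trade-off the paragraph describes is visible here: the mirrored node gives up the most capacity for protection, while the striped node offers the most capacity and no protection of its own.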

The data processing layer also includes a metadata server, which records the index data in duplicate.
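Recording the index in duplicate can be pictured as writing every index entry to more than one independent copy, so that a lookup still succeeds if one copy is lost. The sketch below is an assumption for illustration: the class `MetadataServer`, its methods, and the shape of an index entry (a list of node names per file) are hypothetical, and both copies live in memory here, whereas a real deployment would place them on separate devices.

```python
class MetadataServer:
    """Hypothetical index store that keeps several copies of each entry."""

    def __init__(self, replicas: int = 2):
        if replicas < 2:
            raise ValueError("duplicate recording needs at least two copies")
        self._copies = [dict() for _ in range(replicas)]

    def record(self, file_id: str, block_locations: list) -> None:
        # Write an independent list into every copy of the index.
        for copy in self._copies:
            copy[file_id] = list(block_locations)

    def lookup(self, file_id: str) -> list:
        # Return the entry from the first copy that still has it.
        for copy in self._copies:
            if file_id in copy:
                return copy[file_id]
        raise KeyError(file_id)

    def drop_copy(self, n: int) -> None:
        # Simulate the loss of one copy of the index.
        self._copies[n].clear()


server = MetadataServer()
server.record("report.dat", ["node0", "node1", "node2", "node3"])
server.drop_copy(0)
locations = server.lookup("report.dat")
```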

The beneficial effect that the present invention compared with prior art produces is:

The distributed data storage management method of the present invention is designed to counter the ever-increasing risk of data failure. Through distributed data management, data that would originally have been stored in a single RAID array is spread across a plurality of RAID arrays. Storing continuously growing user data in a distributed RAID storage array effectively reduces the risk of data failure, shortens data recovery time, makes storage capacity easy to expand, optimizes cost, and saves a large amount of maintenance cost.

Description of drawings

Figure 1 is a schematic diagram of the layered data storage of the present invention.

Embodiment

The distributed data storage management method of the present invention is described in detail below with reference to the accompanying drawing.

As shown in Figure 1, the distributed data storage management method consists of four parts:

Data source layer: the user's application programs act as the source of the data.

Data processing layer: includes the EC algorithm control end, which processes the original data, and the metadata server, which archives the index data.

Data transmission layer: distributes the processed original data blocks and check data blocks to the sub-storage array nodes over the SCSI or IP protocol.

Data storage layer: each sub-storage array node stores the data blocks it receives, completing the data storage protection.
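The packaging of a block into a packet on the sending side, and the unpacking on the storage side, might look like the following. This is a sketch under stated assumptions: the patent does not define a packet layout, so the header fields (file id, block index, total block count, payload length), the CRC-32 trailer, and the function names `pack_block` and `unpack_block` are all hypothetical. The result is an application-level payload that would still be carried over SCSI or IP; it is not a SCSI or IP frame itself.

```python
import struct
import zlib

# Hypothetical header: file id, block index, total blocks, payload length.
# "!" selects network byte order with no padding (12 bytes in total).
HEADER = struct.Struct("!IHHI")
TRAILER = struct.Struct("!I")  # CRC-32 over header and payload


def pack_block(file_id: int, index: int, total: int, payload: bytes) -> bytes:
    """Encapsulate one block as header + payload + checksum."""
    header = HEADER.pack(file_id, index, total, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def unpack_block(packet: bytes) -> tuple:
    """Verify and unpack a packet; raise ValueError if it is damaged."""
    if len(packet) < HEADER.size + TRAILER.size:
        raise ValueError("packet too short")
    header = packet[:HEADER.size]
    payload = packet[HEADER.size:-TRAILER.size]
    (crc,) = TRAILER.unpack(packet[-TRAILER.size:])
    if zlib.crc32(header + payload) != crc:
        raise ValueError("checksum mismatch")
    file_id, index, total, length = HEADER.unpack(header)
    if length != len(payload):
        raise ValueError("length field does not match payload")
    return file_id, index, total, payload


packet = pack_block(file_id=7, index=2, total=5, payload=b"block two")
fields = unpack_block(packet)
```

Carrying the block index and total count in every packet lets a storage node, or a later recovery pass, tell which piece of which file it holds without consulting the sender.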

The specific implementation steps are:

1. Generate the relevant algorithm parameters according to the user's actual application environment, that is, storage characteristics such as the number and capacity of the storage sub-arrays. According to these parameters, configure the erasure code (EC) block-splitting algorithm of the data processing layer on the control server.

2. The original data to be stored is obtained from the user's application programs at the data source layer of the control server and is processed by the EC algorithm into block data, check data, and index data.

3. The algorithm sets the amount of redundancy protection. For example, the original data can be divided into N data blocks and K check data blocks, for a total of M blocks (M = N + K). Any N of the M blocks are then sufficient to recover the original data, and the code rate is N/(N + K).

4. The data transmission layer of the control server encapsulates the processed block data and check data into data packets and transmits them over the SCSI or IP protocol to a plurality of sub-storage array nodes for distributed storage. Each sub-storage array node is responsible for storing one data block.

5. Because the index data on the control server is critically important, the system records it in duplicate on the metadata server of the data processing layer, which facilitates subsequent data recovery, organization, and querying.

6. After a sub-storage array node receives a data packet, it unpacks it and stores the unpacked data in the storage array that it controls independently. The array's own redundancy and error-correction features give the data a second layer of protection. Because each sub-storage array is a relatively independent storage unit with its own data protection capability, and each recovers its data independently, data recovery time after a hard disk failure is greatly reduced and a large amount of maintenance cost is saved.
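The redundancy setting in step 3 can be made concrete with the simplest possible erasure code. Taking N as the number of data blocks, K as the number of check blocks, and M = N + K as the total, the sketch below uses a single XOR parity block (K = 1), so any N of the M blocks recover the original data. This is a stand-in chosen for brevity, not the patent's algorithm, which is not specified; tolerating more than one lost block would need a stronger code such as Reed-Solomon. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode(data: bytes, n: int) -> list:
    """Split data into n data blocks and append one XOR check block."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\x00")
    blocks = [padded[i * size:(i + 1) * size] for i in range(n)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks: list, length: int) -> bytes:
    """Rebuild the data from M blocks of which at most one is None."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost block")
    blocks = list(blocks)
    if missing:
        present = [b for b in blocks if b is not None]
        # XOR of the surviving blocks equals the lost block.
        blocks[missing[0]] = xor_blocks(present)
    return b"".join(blocks[:-1])[:length]  # drop parity, strip padding


def code_rate(n: int, k: int) -> float:
    """Fraction of stored blocks that carry original data: N / (N + K)."""
    return n / (n + k)


data = b"erasure coding demo"
stored = encode(data, n=4)          # N = 4, K = 1, M = 5
damaged = list(stored)
damaged[2] = None                   # one sub-storage array node is lost
rebuilt = recover(damaged, len(data))
rate = code_rate(4, 1)
```

With N = 4 and K = 1 the code rate is 4/5, meaning one fifth of the stored blocks is redundancy, and the loss of any single node can be repaired from the other four.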

Claims

1. A distributed data storage management method, characterized in that its specific implementation steps are:

2. The distributed data storage management method according to claim 1, characterized in that the sub-storage array nodes of the data storage layer are physically independent of one another; that is, each sub-storage array node has its own data redundancy protection method and can independently perform data protection, data backup, and data recovery.

3. The distributed data storage management method according to claim 1, characterized in that the data processing layer also includes a metadata server, which records the index data in duplicate.