CN100543743C - Multiple machine file storage system and method - Google Patents


Info

Publication number
CN100543743C
Authority
CN
China
Prior art keywords
server node
data block
data
block
file
Legal status
Active
Application number
CNB200610098516XA
Other languages
Chinese (zh)
Other versions
CN1900931A (en)
Inventor
王进兢
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
2006-07-04
Filing date
2006-07-04
Publication date
2009-09-23
2006-07-04 Application filed by Huawei Technologies Co Ltd
2006-07-04 Priority to CNB200610098516XA
2007-01-24 Publication of CN1900931A
Application granted
2009-09-23 Publication of CN100543743C
Legal status: Active (current)
Anticipated expiration

Abstract

The invention provides a multi-machine file storage system and method. The system mainly comprises a plurality of server nodes that are connected to one another by Fast Ethernet and communicate over it; the server nodes save and read files in segments. The method mainly comprises: reading data blocks from a file to be saved, and storing the blocks, segment by segment, on the server nodes of the multi-machine file storage system. The invention provides a low-cost, highly available, high-performance multi-machine file storage scheme.

Description

Multiple machine file storage system and method
Technical field
The present invention relates to the field of communications, and in particular to a multi-machine file storage system and method.
Background technology
A virtual storage system consists of multiple physical storage subsystems that virtualization software integrates into a single, logically managed system (a storage pool). It offers more complete service functions than any of the storage subsystems that make up the pool. Virtual storage systems are the practical application of storage virtualization technology.
Storage virtualization has given new impetus to the construction of IT systems, and storage virtualization management software and management platforms occupy a key position in the overall IT system. Because storage virtualization technology and its products have a short history, their reliability and availability still need further improvement before users' doubts can be dispelled.
One existing file storage scheme is the Coda distributed file system. Coda is an experimental distributed file system with the following characteristics:
1. Mobile clients can continue to operate while disconnected;
2. Failure recovery;
3. Performance and reliability;
4. Security;
5. Well-defined sharing semantics;
6. The source code is freely available, and several operating systems are supported, including Linux, NetBSD, FreeBSD and Windows 95.
The drawbacks of Coda are that client software must be installed on each computing server, file read/write concurrency is low, and it is unsuitable as the file system of a parallel storage server.
Another existing scheme is PVFS (Parallel Virtual File System), which is widely used in high-performance computing clusters. PVFS provides a globally named storage space and distributes data files across multiple storage subsystems. The I/O of each storage subsystem is managed by an independent node server, all storage node servers are managed by an MGR (management server), and client software is installed on each computing server. PVFS reads data in parallel over an IP network, which provides the I/O performance a networked computer cluster requires, and uses network transport protocols to let the cluster share data files simply.
The drawbacks of PVFS are that it currently supports only Linux, its node fault tolerance is poor, and it depends heavily on the MGR. In addition, PVFS requires client software on the computing servers and is unsuitable as the file system of a parallel storage server.
Summary of the invention
The object of the present invention is to provide a multi-machine file storage system and method that realize a low-cost, highly available, high-performance multi-machine file storage scheme.
This object is achieved by the following technical solutions:
A multi-machine file storage system, comprising:
a check computing module, which divides a file to be saved in the multi-machine file storage system into data blocks, computes a check block from the segmented data blocks, and passes the resulting data blocks and check block to the server nodes in software; and
a plurality of server nodes, which combine the segmented data blocks passed from the check computing module and the corresponding check block into a group, and store the data blocks and the check block of the group on different server nodes.
In response to a received data read request, the server nodes read back the stored segmented data blocks and check blocks.
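The text does not say which check code the check computing module uses. Since the system tolerates the loss of one node out of N+1 and Fig. 2 labels the check block "parity1", the following is a minimal sketch assuming RAID-5-style XOR parity; the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-size blocks."""
    if len(a) != len(b):
        raise ValueError("all blocks in a group must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_check_block(data_blocks: list[bytes]) -> bytes:
    """Assumed XOR parity over the N data blocks of one group."""
    if not data_blocks:
        raise ValueError("a group needs at least one data block")
    return reduce(xor_bytes, data_blocks)


def make_group(data_blocks: list[bytes]) -> list[bytes]:
    """A group: the N data blocks followed by their check block."""
    return list(data_blocks) + [compute_check_block(data_blocks)]
```

Because each byte of the check block is the XOR of the corresponding bytes of the data blocks, XOR-ing all N+1 blocks of a group yields all zeros, which is what makes single-block recovery possible.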
Each server node sends heartbeat data to the other server nodes in every timer period and receives the heartbeat data sent by the other server nodes. Each server node comprises:
a query request processing module: when no heartbeat data has been received from a given server node within a predetermined time, sends the other server nodes a query request about the server node whose heartbeat was not received within that time, and passes the query responses returned by the other server nodes to the fault judgement module;
a query response sending module: upon receiving a query request sent by another (querying) server node about a queried server node, returns to the querying server node a query response carrying failure information if this server node has not received the queried node's heartbeat data within the predetermined time, and otherwise returns a query response carrying normal information;
a fault judgement module: if a query response passed from the query request processing module carries failure information, or no query response is received from the other server nodes, determines that the server node whose heartbeat was not received within the predetermined time has failed; if the query responses passed from the query request processing module carry normal information, determines that this server node itself has failed.
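The fault judgement rule can be written as a small decision function. This is a sketch, not the patented implementation: responses are modelled as `True` (normal), `False` (failure) or `None` (no reply). Because the text does not say how mixed answers are resolved, the sketch assumes that a single failure report is enough to blame the silent peer; a majority vote would be an equally plausible reading.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PEER_FAULTY = "peer"  # the node whose heartbeat stopped has failed
    SELF_FAULTY = "self"  # the querying node concludes it has failed itself


def judge(responses: list[Optional[bool]]) -> Verdict:
    """Fault judgement for one query round.

    responses: one entry per queried node; True = normal information,
    False = failure information, None = no response received.
    """
    answered = [r for r in responses if r is not None]
    # No answers at all, or any node confirming the timeout: blame the peer.
    # (Assumption: the text does not say how mixed answers are resolved.)
    if not answered or any(r is False for r in answered):
        return Verdict.PEER_FAULTY
    # Every node that answered still hears the peer: the fault is local.
    return Verdict.SELF_FAULTY
```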
The multi-machine file storage system is expanded by adding server nodes to it.
A file processing method for a multi-machine file storage system, comprising:
dividing a file to be saved into data blocks, computing a check block from the segmented data blocks, and passing the resulting data blocks and check block in software to the server nodes of the multi-machine file storage system;
combining the segmented data blocks and the corresponding check block into a group, and storing the data blocks and the check block of the group on different server nodes; and
when reading a file from the multi-machine file storage system, initiating a data read request on a server node and passing the request to each of the other server nodes, then reading the data blocks and check block of one group at a time from the server nodes until the whole file has been read.
The method specifically comprises the steps of:
A. reading a set number of equal-size data blocks in sequence from the file to be saved, computing a check block from the blocks read, combining the data blocks and the corresponding check block into a group, and storing the group's data blocks and check block, segment by segment, on the server nodes in software;
B. reading the next set number of equal-size data blocks from the file, computing a check block from them, combining them and the corresponding check block into a group, and storing the group's data blocks and check block on the server nodes, segment by segment, until the whole file has been read. The data blocks and check blocks of the same file are stored in sequence in a file of the same name on each server node.
The method further comprises:
when the remainder of the file is too small to form the set number of data blocks of a group, padding the file with all-zero bytes, reading the last set number of data blocks from the file, computing a check block from them, forming a group from these data blocks and the corresponding check block, and storing the group's data blocks and check block on the server nodes, segment by segment.
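The segmentation in steps A and B, together with the zero padding of the last group, can be sketched as follows, assuming a fixed block size chosen by the caller (`split_into_groups` and its parameters are hypothetical names):

```python
def split_into_groups(data: bytes, n: int, block_size: int) -> list[list[bytes]]:
    """Cut `data` into groups of `n` equal-size data blocks.

    If the tail of the file cannot fill a whole group, it is padded
    with zero bytes so that the last group also has `n` full blocks.
    """
    if n < 1 or block_size < 1:
        raise ValueError("n and block_size must be positive")
    group_bytes = n * block_size
    remainder = len(data) % group_bytes
    if remainder:
        data = data + bytes(group_bytes - remainder)  # all-zero padding
    groups = []
    for start in range(0, len(data), group_bytes):
        chunk = data[start:start + group_bytes]
        groups.append([chunk[i:i + block_size] for i in range(0, group_bytes, block_size)])
    return groups
```

The padding is indistinguishable from genuine zero bytes, so a reader must know the original file length to strip it. The text does not say where that length is kept; a field in the block header would be one option.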
The method further comprises:
when one server node in the multi-machine file storage system fails, computing the data block stored on the failed server node from the data blocks and check block of the group read from the other server nodes.
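Under the same XOR-parity assumption, the failed node's block is simply the XOR of the surviving blocks of its group. The following is a minimal sketch, where `None` marks the block that could not be read:

```python
from functools import reduce
from typing import Optional


def rebuild_missing(group: list[Optional[bytes]]) -> list[bytes]:
    """Fill in the single missing block (None) of a group.

    `group` is the N data blocks plus the check block, as read from the
    N+1 server nodes; the failed node's entry is None. With XOR parity
    the lost block equals the XOR of all surviving blocks.
    """
    missing = [i for i, b in enumerate(group) if b is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild at most one lost block per group")
    if not missing:
        return list(group)
    survivors = [b for b in group if b is not None]
    rebuilt = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), survivors)
    repaired = list(group)
    repaired[missing[0]] = rebuilt
    return repaired
```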
As the above technical solutions show, the present invention uses a plurality of server nodes that are connected by, and communicate over, Fast Ethernet, and each server node saves and reads files in segments in software. This yields a low-cost, highly available, high-performance multi-machine file storage implementation with the following advantages over the prior art:
1. It is implemented entirely in software and adds no hardware cost, making it a low-cost, high-capacity storage implementation; file read/write concurrency is high, so it is suitable as a very-high-capacity file server;
2. It is not tied to any operating system and can run on a variety of operating system platforms;
3. It is simple to operate, highly reliable and highly compatible; if one server fails, the system as a whole continues to provide service normally;
4. An existing system can easily be upgraded to a larger capacity, giving good scalability.
Description of drawings
Fig. 1 is a schematic structural diagram of an embodiment of the multi-machine file storage system of the present invention;
Fig. 2 is a schematic diagram of the principle of the file reading process of an embodiment of the multi-machine file storage system of the present invention.
Embodiment
The invention provides a multi-machine file storage system and method. Its core is a plurality of server nodes that are connected by, and communicate over, Fast Ethernet, each of which saves and reads files in segments in software.
The invention is described in detail below with reference to the drawings. The structure of an embodiment of the multi-machine file storage system is shown in Fig. 1. The system comprises N+1 server nodes (at least three), all of equal status. The server nodes are connected by Fast Ethernet, over which they exchange data and perform fault detection.
Each server node contains several IDE storage units and several NCs (network cards) and uses dual-network backup: some of the network cards carry communication within the server node, and the others carry communication between server nodes.
All equipment in the multi-machine file storage system of the present invention is conventional, and no special detection equipment is needed; the result is an N+1 fault-tolerant system. If a server node fails because of a disk failure, operating system failure, hardware fault, network failure or similar cause, the remaining N server nodes can still recover its data through the check blocks, so the system continues to provide data services normally without interruption.
The multi-machine file storage system of the present invention is implemented in software, independently of the operating system, and is applicable to any operating system.
Each server node in the multi-machine file storage system of the present invention sends heartbeat data to the other server nodes in every timer period and receives the heartbeat data sent by the other server nodes. Each server node comprises:
a query request processing module: when no heartbeat data has been received from a given server node within a predetermined time, sends the other server nodes a query request about that server node, and passes the query responses returned by the other server nodes to the fault judgement module;
a query response sending module: upon receiving a query request sent by another (querying) server node about a queried server node, returns to the querying server node a query response carrying failure information if this server node has not received the queried node's heartbeat data within the predetermined time, and otherwise returns a query response carrying normal information;
a fault judgement module: if a query response passed from the query request processing module carries failure information, or no query response is received from the other server nodes, determines that the server node whose heartbeat was not received within the predetermined time has failed; if the query responses passed from the query request processing module carry normal information, determines that this server node itself has failed.
The method of the invention is described in detail below with reference to the drawings. Fig. 2 shows the principle of the file reading process of the multi-machine file storage system. File access is implemented in software, which builds a network-based RAID (Redundant Array of Inexpensive Disks) file access system.
The file saving process of the multi-machine file storage system is as follows:
To save a file in a multi-machine file storage system with N+1 server nodes, N data blocks of identical size are read from the file in sequence, and a check block (parity) is computed over these N data blocks. The N data blocks and the check block are then combined into a group and stored on the storage devices of different server nodes. The header of each block also carries information such as the number of server nodes across which the file is distributed and an identifier. The next N equal-size data blocks are then read from the file, their check block is computed, and these N data blocks and the check block are again combined into a group and stored on the storage devices of the server nodes.
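The text says only that each block header carries the node count and an identifier; the exact layout is not given. The following sketch packs a hypothetical fixed-size header with Python's `struct` module, using a group index and a slot number as stand-ins for the unspecified identifier:

```python
import struct
from typing import NamedTuple

# Hypothetical on-disk layout (big-endian): 2-byte node count,
# 4-byte group index, 2-byte slot (position of the block in its group).
HEADER = struct.Struct(">HIH")


class BlockHeader(NamedTuple):
    node_count: int   # N + 1 nodes the file was striped across
    group_index: int  # which group the block belongs to
    slot: int         # 0..N-1 for data blocks, N for the check block


def pack_block(header: BlockHeader, payload: bytes) -> bytes:
    """Prefix a block's payload with its fixed-size header."""
    return HEADER.pack(*header) + payload


def unpack_block(raw: bytes) -> tuple[BlockHeader, bytes]:
    """Split a stored block back into header fields and payload."""
    header = BlockHeader(*HEADER.unpack_from(raw))
    return header, raw[HEADER.size:]
```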
As shown in Fig. 2, the multi-machine file storage system comprises six server nodes. When a file is written to the system, five data blocks of identical size are read from it in sequence, and a check block (parity1) is computed over them, giving six blocks in total. The five data blocks are scattered across five server nodes according to a hash rule, and the check block (parity1) is stored on the remaining server node. This completes the storage of one group.
The next five data blocks are then read from the file and stored as a group in the same way, until the whole file has been processed. If the tail of the file amounts to fewer than five data blocks, it is padded with all-zero bytes. The blocks of a file are stored in sequence in a file of the same name on each server node.
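The write path of Fig. 2 can be simulated in memory. The hash rule is not specified, so this sketch assumes a hypothetical rule that hashes the file name and group index with CRC-32 to pick the node holding the check block. It also assumes XOR parity and places the data blocks on the remaining nodes in ascending order. Each node appends its block to a file of the same name.

```python
import zlib
from functools import reduce


def parity_node(file_name: str, group_index: int, node_count: int) -> int:
    """Hypothetical hash rule: choose which node stores this group's check block."""
    return zlib.crc32(f"{file_name}:{group_index}".encode()) % node_count


def write_file(nodes: list[dict[str, bytearray]], file_name: str,
               data: bytes, block_size: int) -> None:
    """Stripe `data` over len(nodes) = N+1 in-memory 'servers'.

    Every node appends its block of each group to a file with the same
    name, so block k of that file on any node belongs to group k.
    """
    node_count = len(nodes)
    n = node_count - 1  # data blocks per group
    group_bytes = n * block_size
    if len(data) % group_bytes:
        data += bytes(group_bytes - len(data) % group_bytes)  # zero padding
    for g, start in enumerate(range(0, len(data), group_bytes)):
        blocks = [data[start + i * block_size:start + (i + 1) * block_size]
                  for i in range(n)]
        parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)
        p = parity_node(file_name, g, node_count)
        data_nodes = [i for i in range(node_count) if i != p]
        for node_id, block in zip(data_nodes, blocks):
            nodes[node_id].setdefault(file_name, bytearray()).extend(block)
        nodes[p].setdefault(file_name, bytearray()).extend(parity)
```

A reader must apply the same `parity_node` rule to know which node holds each group's check block. Spreading check blocks across nodes, as RAID 5 does, is one plausible reason for using a hash rule here, since it keeps a single node from receiving every parity write.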
The file reading process of the multi-machine file storage system is as follows:
To read a file stored in the multi-machine file storage system, any node in the system initiates a read request and sends it over Fast Ethernet to each of the other server nodes. According to this read request, one group is read from the server nodes at a time, until the whole file has been read. If one of the server nodes has failed, the data block stored on it can be computed from the check block described above.
When reading a file stored in the multi-machine file storage system, the group to be read can also be computed from an offset into the file, and only that group is then read from the server nodes. This supports query (random access) operations on the file.
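Computing the group for an offset is integer division by the number of data bytes per group (N × block size). The following is a sketch with hypothetical names:

```python
from typing import NamedTuple


class Location(NamedTuple):
    group: int          # which group to fetch from the nodes
    block: int          # index of the data block inside that group (0..N-1)
    byte_in_block: int  # offset within that block


def locate(offset: int, n: int, block_size: int) -> Location:
    """Map a byte offset in the original file to its group and data block."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    group, within_group = divmod(offset, n * block_size)
    block, byte_in_block = divmod(within_group, block_size)
    return Location(group, block, byte_in_block)
```

For example, with N = 5 and 4096-byte blocks, `locate(30000, 5, 4096)` returns `Location(group=1, block=2, byte_in_block=1328)`, because 30000 = 1 × 20480 + 2 × 4096 + 1328.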
The error detection process of the multi-machine file storage system is as follows:
Because the system contains a large number of server nodes, error detection adds set rules to simple heartbeat detection and relies on a voting mechanism among the nodes.
The server nodes perform periodic heartbeat detection: in every timer period, each server node receives heartbeat data from all the other server nodes and sends heartbeat data to all of them. If a server node A finds that it has not received heartbeat data from server node B for a long time, it sends a query request about server node B to the remaining server nodes.
Upon receiving the query request from server node A, each remaining server node checks whether the last heartbeat it received from the queried node (server node B) has timed out. If so, it returns failure information to the querying node (server node A); otherwise it returns normal information.
After server node A receives the query responses from the remaining nodes, it analyzes them. If the remaining nodes also consider server node B faulty, that is, they return failure information, or if server node A receives no response from the remaining nodes, server node A concludes that server node B has failed. If the remaining nodes return normal information, server node A concludes that it has itself failed.
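The per-node bookkeeping behind these steps can be sketched as a table of last-heartbeat times. Timestamps are passed in explicitly so the logic is easy to test. The class and method names and the single timeout value are assumptions, since the text gives no concrete timings.

```python
import time
from typing import Optional


class HeartbeatTable:
    """Per-node record of when each peer's heartbeat was last received."""

    def __init__(self, peers: list[str], timeout: float,
                 now: Optional[float] = None) -> None:
        start = time.monotonic() if now is None else now
        self.timeout = timeout
        # Treat every peer as freshly seen when the table is created.
        self.last_seen = {peer: start for peer in peers}

    def record(self, peer: str, now: float) -> None:
        """Called whenever a heartbeat from `peer` arrives."""
        self.last_seen[peer] = now

    def silent_peers(self, now: float) -> list[str]:
        """Peers whose heartbeat is overdue; each one triggers a query request."""
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)

    def answer_query(self, peer: str, now: float) -> bool:
        """Reply to a query about `peer`: True = normal, False = failure."""
        seen = self.last_seen.get(peer)
        return seen is not None and now - seen <= self.timeout
```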
The multi-machine file storage system of the present invention scales well: to expand it, server nodes simply need to be added to the network, and all data transitions seamlessly.
After a system of N+1 nodes is upgraded to one of N+m+1 nodes, a corresponding file is created on each of the m new nodes for every file of the N+1 system, and each such file contains a file header. When the file is read on one of these m nodes, the header information in the created file, namely a node count of N+1, indicates that the corresponding file should be read from the original N+1 nodes.
Newly added file data is written across all N+m+1 nodes, so the system can keep providing service after data is added without any data conversion. When the system load is low, data in the original N+1 layout is read and rewritten into the N+m+1 system; over time, all data ends up in the file format of the N+m+1 system.
The above is only a preferred embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of protection of the claims.

Claims (7)

1. A multi-machine file storage system, characterized by comprising:
a check computing module, which divides a file to be saved in the multi-machine file storage system into data blocks, computes a check block from the segmented data blocks, and passes the resulting data blocks and check block to the server nodes in software; and
a plurality of server nodes, which combine the segmented data blocks passed from the check computing module and the corresponding check block into a group and store the data blocks and the check block of the group on different server nodes; when a file stored in the multi-machine file storage system needs to be read, the server nodes read the stored segmented data blocks and check blocks in response to a received data read request, one group at a time, until the whole file has been read.
2. The multi-machine file storage system according to claim 1, characterized in that each server node sends heartbeat data to the other server nodes in every timer period and receives the heartbeat data sent by the other server nodes, and each server node comprises:
a query request processing module: when no heartbeat data has been received from a given server node within a predetermined time, sends the other server nodes a query request about the server node whose heartbeat was not received within that time, and passes the query responses returned by the other server nodes to the fault judgement module;
a query response sending module: upon receiving a query request sent by another (querying) server node about a queried server node, returns to the querying server node a query response carrying failure information if this server node has not received the queried node's heartbeat data within the predetermined time, and otherwise returns a query response carrying normal information;
a fault judgement module: if a query response passed from the query request processing module carries failure information, or no query response is received from the other server nodes, determines that the server node whose heartbeat was not received within the predetermined time has failed; if the query responses passed from the query request processing module carry normal information, determines that this server node itself has failed.
3. The multi-machine file storage system according to claim 1, characterized in that the multi-machine file storage system is expanded by adding server nodes to it.
4. A file processing method for a multi-machine file storage system, characterized by comprising:
dividing a file to be saved into data blocks, computing a check block from the segmented data blocks, and passing the resulting data blocks and check block in software to the server nodes of the multi-machine file storage system;
combining the segmented data blocks and the corresponding check block into a group, and storing the data blocks and the check block of the group on different server nodes; and
when reading a file from the multi-machine file storage system, initiating a data read request on a server node and passing the request to each of the other server nodes, then reading the data blocks and check block of one group at a time from the server nodes until the whole file has been read.
5. The method according to claim 4, characterized in that the method specifically comprises the steps of:
A. reading a set number of equal-size data blocks in sequence from the file to be saved, computing a check block from the blocks read, combining the data blocks and the corresponding check block into a group, and storing the group's data blocks and check block, segment by segment, on the server nodes in software;
B. reading the next set number of equal-size data blocks from the file, computing a check block from them, combining them and the corresponding check block into a group, and storing the group's data blocks and check block on the server nodes, segment by segment, until the whole file has been read, wherein the data blocks and check blocks of the same file are stored in sequence in a file of the same name on each server node.
6. The method according to claim 5, characterized in that the method further comprises:
when the remainder of the file is too small to form the set number of data blocks of a group, padding the file with all-zero bytes, reading the last set number of data blocks from the file, computing a check block from them, forming a group from these data blocks and the corresponding check block, and storing the group's data blocks and check block on the server nodes, segment by segment.
7. The method according to claim 4, characterized in that the method further comprises:
when one server node in the multi-machine file storage system fails, computing the data block stored on the failed server node from the data blocks and check block of the group read from the other server nodes.
CNB200610098516XA 2006-07-04 2006-07-04 Multiple machine file storage system and method Active CN100543743C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200610098516XA CN100543743C (en) 2006-07-04 2006-07-04 Multiple machine file storage system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB200610098516XA CN100543743C (en) 2006-07-04 2006-07-04 Multiple machine file storage system and method

Publications (2)

Publication Number Publication Date
CN1900931A CN1900931A (en) 2007-01-24
CN100543743C true CN100543743C (en) 2009-09-23

Family

ID=37656818

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200610098516XA Active CN100543743C (en) 2006-07-04 2006-07-04 Multiple machine file storage system and method

Country Status (1)

Country Link
CN (1) CN100543743C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101888405B (en) * 2010-06-07 2013-03-06 北京高森明晨信息科技有限公司 Cloud computing file system and data processing method
CN104580268A (en) * 2013-10-09 2015-04-29 南京中兴新软件有限责任公司 Method and device for transmitting file data
CN103544285B (en) * 2013-10-28 2017-09-26 华为技术有限公司 A kind of data load method and device
CN103699610A (en) * 2013-12-13 2014-04-02 乐视网信息技术(北京)股份有限公司 Method for generating file verification information, file verifying method and file verifying equipment
CN105227672B (en) * 2015-10-13 2018-04-17 国家电网公司 The method and system that data are stored and accessed
CN112256642A (en) * 2020-10-13 2021-01-22 北京神州数字科技有限公司 Mechanism and system for writing, reading and processing files distributed under micro-service system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems. James S. Plank. http://cs.utk.edu/~plank/. 1999 *
A distributed file system model based on erasure codes. Dong Kejun, Feng Jiahong, Yan Baoping. Computer Engineering, Vol. 31, No. 20. 2005 *

Also Published As

Publication number Publication date
CN1900931A (en) 2007-01-24

Similar Documents

Publication Publication Date Title
US11941278B2 (en) Data storage system with metadata check-pointing
CN1776675B (en) Method and system for storing and using metadata in multiple storage locations
US7069465B2 (en) Method and apparatus for reliable failover involving incomplete raid disk writes in a clustering system
US6691209B1 (en) Topological data categorization and formatting for a mass storage system
US6678788B1 (en) Data type and topological data categorization and ordering for a mass storage system
CN101288052B (en) Data storing method and system
CN105659213B (en) Restore without the target drives independent data integrality and redundancy shared in distributed memory system
CN102024044A (en) Distributed file system
US7882081B2 (en) Optimized disk repository for the storage and retrieval of mostly sequential data
CN101888405B (en) Cloud computing file system and data processing method
CN100524235C (en) Recovery operations in storage networks
CN102110154B (en) File redundancy storage method in cluster file system
CN100452046C (en) Storage method and system for mass file
CN103534688B (en) Data reconstruction method, memory device and storage system
CN100543743C (en) Multiple machine file storage system and method
US9753792B2 (en) Method and system for byzantine fault tolerant data replication
WO2018064188A1 (en) Physical media aware spacially coupled journaling and replay
CN103793182A (en) Scalable storage protection
CN103942112A (en) Magnetic disk fault-tolerance method, device and system
US20050091451A1 (en) Methods of reading and writing data
CN103037004A (en) Implement method and device of cloud storage system operation
US20090024768A1 (en) Connection management program, connection management method and information processing apparatus
US20100082793A1 (en) Server-Embedded Distributed Storage System
US7805565B1 (en) Virtualization metadata promotion
US20030188102A1 (en) Disk subsystem

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant