CN100543743C - Multiple machine file storage system and method

Info

Publication number: CN100543743C
Application number: CNB200610098516XA
Authority: CN
Inventors: 王进兢
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2006-07-04
Filing date: 2006-07-04
Publication date: 2009-09-23
Anticipated expiration: 2026-07-04
Also published as: CN1900931A

Abstract

The invention provides a multiple machine file storage system and method. The system mainly comprises a plurality of server nodes that are connected by, and communicate over, Fast Ethernet; the server nodes save and read files in segments. The method mainly comprises: reading data blocks from the file to be saved, and saving the data blocks that were read, in segments, on the server nodes of the multiple machine file storage system. The invention enables a low-cost, highly available, high-performance multiple machine file storage scheme.

Description

Multiple machine file storage system and method

Technical field

The present invention relates to the field of communications, and in particular to a multiple machine file storage system and method.

Background technology

A virtual storage system is an integrated system (a storage pool) that is made up of a plurality of physical storage subsystems and is managed logically by storage virtualization software. A virtual storage system offers more complete service functions than the individual storage subsystems that make up the storage pool. A virtual storage system is the practical application of storage virtualization technology.

The application of storage virtualization technology has given new impetus to the construction of IT systems, and storage virtualization management software and management platforms occupy a key position in the overall IT system. Because storage virtualization technology and its products have a relatively short history, their reliability and availability still need to be improved further before users' doubts can be dispelled.

One file storage scheme in the prior art is the Coda distributed file system. Coda is an experimental distributed file system with the following characteristics:

1. mobile clients can continue to operate while offline;

2. error recovery;

3. performance and reliability;

4. security;

5. well-defined sharing semantics;

6. the source code is freely available, and several operating systems are supported, such as Linux, NetBSD, FreeBSD and Windows 95.

The shortcomings of the Coda distributed file system are: corresponding client software must be installed on the computing servers, file read/write concurrency is not high, and it is not suitable as the file system of a parallel storage server.

Another file storage scheme in the prior art is PVFS (Parallel Virtual File System). PVFS is widely used in high-performance cluster computing systems. It provides a globally named storage space, and data files are distributed across a plurality of storage subsystems. The I/O of each storage subsystem is managed by an independent node server, all storage node servers are managed by an MGR (management server), and corresponding client software is installed on each computing server. The characteristic of PVFS is that data is read in parallel over an IP network, which guarantees the I/O performance that a networked computer cluster requires, and that a network transmission protocol is used so that the computer cluster can share data files simply.

The shortcomings of the PVFS scheme are: PVFS currently supports only Linux, its node fault tolerance is relatively poor, and it depends heavily on the MGR. In addition, PVFS requires corresponding client software to be installed on the computing servers, and it is not suitable as the file system of a parallel storage server.

Summary of the invention

The purpose of the present invention is to provide a multiple machine file storage system and method that achieve a low-cost, highly available, high-performance multiple machine file storage scheme.

The purpose of the present invention is achieved through the following technical solutions:

A multiple machine file storage system, comprising:

a verification computing module, which divides a file that is to be saved in the multiple machine file storage system into data blocks, calculates a check block from the resulting data blocks, and passes the data blocks and the check block to the server nodes by software;

a plurality of server nodes, which merge the data blocks passed over by the verification computing module and the corresponding check block into a group, and store the data blocks and the check block of the group on the respective server nodes.

Each server node reads out the saved data blocks and check blocks in response to a data read request that it receives.

Each server node sends heartbeat data to the other server nodes in each timing cycle and receives the heartbeat data sent by the other server nodes. The server node comprises:

a query request processing module: when no heartbeat data has been received from a certain server node within a predetermined time, it sends to the other server nodes a query request concerning the server node from which no heartbeat data was received within the predetermined time, and passes the query responses returned by the other server nodes to the fault judgment module;

a query response sending module: after receiving a query request, sent by another (querying) server node, concerning a queried server node, if this server node has not received heartbeat data from the queried server node within the predetermined time, it returns a query response carrying failure information to the querying server node; otherwise, it returns a query response carrying normal information to the querying server node;

a fault judgment module: if a query response passed over by the query request processing module carries failure information, or if no query response is received from the other server nodes, it determines that the server node from which no heartbeat data was received within the predetermined time has failed; if a query response passed over by the query request processing module carries normal information, it determines that this server node itself has failed.

The multiple machine file storage system is expanded by adding server nodes to it.

A file processing method for a multiple machine file storage system, comprising:

dividing a file to be saved into data blocks, calculating a check block from the resulting data blocks, and passing the data blocks and the check block by software to the server nodes in the multiple machine file storage system;

merging the data blocks and the corresponding check block into a group, and storing the data blocks and the check block of the group on the respective server nodes;

when a file is read from the multiple machine file storage system, initiating a data read request on a server node and passing the data read request to each of the other server nodes; and reading the data blocks and the check block of one group at a time from the server nodes, until the whole file has been read.

The method specifically comprises the steps of:

A. reading a set quantity of equally sized data blocks in sequence from the file to be saved, calculating a check block from the data blocks read, merging the data blocks and the corresponding check block into a group, and saving the data blocks and the check block of the group, in segments, on the server nodes by software;

B. reading a set quantity of equally sized data blocks from the file again, calculating a check block from the data blocks read, merging the data blocks and the corresponding check block into a group, and saving the data blocks and the check block of the group, in segments, on the server nodes, the data blocks and check blocks of the same file being saved in sequence in a file of the same name on each server node, until the file has been read completely.

The method further comprises:

when the remainder of the file is not enough to make up the set quantity of data blocks of a group, padding the file with all-zero bytes, reading the last set quantity of data blocks from the file, calculating a check block from the data blocks read, forming the data blocks and the corresponding check block into a group, and saving the data blocks and the check block of the group, in segments, on the server nodes.

The method further comprises:

when one server node in the multiple machine file storage system fails, calculating the data block saved on the failed server node from the data blocks and the check block of the same group read from the other server nodes.

As can be seen from the technical solution provided above, the present invention uses a plurality of server nodes that are connected by, and communicate over, Fast Ethernet, and each server node saves and reads files in segments by software. A low-cost, highly available, high-performance multiple machine file storage implementation can therefore be provided. Compared with the prior art, the invention has the following advantages:

1. it is implemented entirely in software and adds no hardware cost, so it is a low-cost, high-capacity storage implementation; file read/write concurrency is high, which makes it suitable as a large-capacity file server;

2. it is not restricted by the operating system and can run on various operating system platforms;

3. it is simple to operate, highly reliable and compatible; if one server fails, the system as a whole can still provide service normally;

4. an existing system can conveniently be upgraded to a system of larger capacity, so scalability is good.

Description of drawings

Fig. 1 is a structural schematic diagram of an embodiment of the multiple machine file storage system of the present invention;

Fig. 2 is a schematic diagram of the file reading process in an embodiment of the multiple machine file storage system of the present invention.

Embodiment

The invention provides a multiple machine file storage system and method. The core of the invention is: a plurality of server nodes that are connected by, and communicate over, Fast Ethernet are used, and each server node saves and reads files in segments by software.

The present invention is described in detail below with reference to the accompanying drawings. The structure of an embodiment of the multiple machine file storage system is shown in Fig. 1. The multiple machine file storage system comprises N+1 (at least three) server nodes, and all server nodes are peers. The server nodes are connected by Fast Ethernet, over which they carry out data communication and fault detection.

Each server node contains a plurality of IDE devices (storage units) and NCs (network cards) and uses dual-network backup: some of the network cards are used for communication inside the server node, and the others are used for communication between server nodes.

All equipment in the multiple machine file storage system of the present invention is conventional equipment; no special detection equipment needs to be added, and the system forms an N+1 fault-tolerant system. If a server node fails because of a disk failure, an operating system failure, a hardware failure, a network failure or a similar cause, the remaining N server nodes can, by means of the check blocks, still enable the multiple machine file storage system to provide data services normally, without service interruption.

The multiple machine file storage system of the present invention is implemented in software, is independent of the operating system, and is applicable to any operating system.

Each server node in the multiple machine file storage system of the present invention sends heartbeat data to the other server nodes in each timing cycle and receives the heartbeat data sent by the other server nodes. The server node comprises:

a query request processing module: when no heartbeat data has been received from a certain server node within a predetermined time, it sends to the other server nodes a query request concerning the server node from which no heartbeat data was received within the predetermined time, and passes the query responses returned by the other server nodes to the fault judgment module;

a query response sending module: after receiving a query request, sent by another (querying) server node, concerning a queried server node, if this server node has not received heartbeat data from the queried server node within the predetermined time, it returns a query response carrying failure information to the querying server node; otherwise, it returns a query response carrying normal information to the querying server node;

a fault judgment module: if a query response passed over by the query request processing module carries failure information, or if no query response is received from the other server nodes, it determines that the server node from which no heartbeat data was received within the predetermined time has failed; if a query response passed over by the query request processing module carries normal information, it determines that this server node itself has failed.

The method of the present invention is described in detail below with reference to the accompanying drawings. The principle of the file reading process of the multiple machine file storage system is shown in Fig. 2. File access is implemented in software, building a network-based RAID (Redundant Array of Inexpensive Disks) file access system implemented in software.

The process of saving a file in the multiple machine file storage system of the present invention is described in detail as follows:

When a file needs to be saved in a multiple machine file storage system comprising N+1 server nodes, N data blocks of identical size are read in sequence from the file, and a check computation over these N data blocks yields one check block (parity). These N data blocks and the check block are then merged into a group and stored respectively on the storage devices of the server nodes. The header of each block also contains information such as the number of server nodes over which the file is distributed and an identifier. A further N data blocks of identical size are then read from the file, a check computation over these N data blocks again yields one check block, and these N data blocks and the check block are again merged into a group and stored respectively on the storage devices of the server nodes.

As shown in Fig. 2, the multiple machine file storage system comprises six server nodes. When a file is written into this system, five data blocks of identical size are read in sequence from the file, and a check computation over these five data blocks yields one check block (parity1), for a total of six blocks. The five data blocks are distributed across five server nodes according to a certain hash rule, and the check block (parity1) is stored on the one remaining server node, which completes the storage of one group of data.
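The text does not specify the hash rule used to distribute the blocks of a group. Purely as an illustration, the sketch below assumes a hypothetical rule in which the node that holds the check block rotates with the group index and the data blocks fill the remaining nodes in order. The function name `place_group` and the slot labels are invented for this example.

```python
def place_group(group_index: int, node_count: int) -> dict:
    """Map the blocks of one group to node indices.

    Assumed rule (not taken from the patent): the check block goes to
    node (group_index mod node_count); the data blocks occupy the
    remaining nodes in ascending order.
    """
    if node_count < 3:
        raise ValueError("the system needs at least three server nodes")
    parity_node = group_index % node_count
    data_nodes = [n for n in range(node_count) if n != parity_node]
    placement = {f"data{i}": node for i, node in enumerate(data_nodes)}
    placement["parity"] = parity_node
    return placement


# Six nodes as in Fig. 2: five data blocks plus one check block per group.
first_group = place_group(0, 6)
second_group = place_group(1, 6)
```

Any rule that places exactly one block of each group on each node would serve equally well; what matters for fault tolerance is that no node holds two blocks of the same group.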

Five more data blocks are then read from the file and stored as a group in the same way, until the file has been read completely. If the end of the file contains fewer than five data blocks, it is padded with data blocks of all-zero bytes. The blocks of the same file are stored in sequence in a file of the same name on each server node.
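The save process can be sketched as follows. The text says only that a check block is calculated from the data blocks; this sketch assumes byte-wise XOR parity, which matches the single-failure tolerance of an N+1 system but is not stated in the source. The function names and the block size are hypothetical, and block headers are omitted.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (assumed check computation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_into_groups(data: bytes, n_data: int, block_size: int):
    """Split data into groups of n_data blocks plus one check block.

    The tail of the file is padded with all-zero bytes so that the
    last group is complete.
    """
    group_bytes = n_data * block_size
    groups = []
    for start in range(0, len(data), group_bytes):
        chunk = data[start:start + group_bytes].ljust(group_bytes, b"\x00")
        blocks = [chunk[i:i + block_size]
                  for i in range(0, group_bytes, block_size)]
        groups.append((blocks, xor_blocks(blocks)))
    return groups


# Five data blocks per group as in Fig. 2; 4-byte blocks keep the example small.
groups = split_into_groups(b"abcdefghij", n_data=5, block_size=4)
blocks, parity = groups[0]
```

With XOR parity, the XOR of all data blocks and the check block of a group is all zeros, which is a convenient consistency check after writing.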

The process of reading a file in the multiple machine file storage system of the present invention is described in detail as follows:

When a file stored in the multiple machine file storage system needs to be read, a read request is initiated by any node in the system and is sent to each of the other server nodes over Fast Ethernet. In response to the read request, one group of data is read from the server nodes at a time, until the file has been read completely. If one of the server nodes has failed, the data block saved on the failed server node can be calculated from the check block.
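Recovery of a lost block can be sketched as follows, again assuming that the check block is the byte-wise XOR of the data blocks of its group (an assumption; the text does not name the check algorithm). Under that assumption, XOR-ing every surviving block of a group, check block included, yields the missing block. The names `xor_blocks` and `rebuild_missing` are hypothetical.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild_missing(surviving_blocks: dict) -> bytes:
    """Rebuild the one missing block of a group.

    surviving_blocks maps node index to the block read from that node
    and must contain every block of the group except the lost one.
    """
    return xor_blocks(list(surviving_blocks.values()))


# A group of three data blocks and one check block on nodes 0..3.
data = [b"AAAA", b"BBBB", b"CCCC"]
group = {0: data[0], 1: data[1], 2: data[2], 3: xor_blocks(data)}

# Node 1 fails: rebuild its block from the other three.
survivors = {node: block for node, block in group.items() if node != 1}
rebuilt = rebuild_missing(survivors)  # b"BBBB"
```

The same routine rebuilds a lost check block, because the check block is simply the XOR of the data blocks.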

When a file stored in the multiple machine file storage system is read, the group that needs to be read can also be calculated from an offset value within the file. The calculated group is then read selectively from the server nodes, which implements a query operation on the file.
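The offset calculation can be illustrated with simple integer arithmetic. The sketch assumes that every group carries `n_data` data blocks of `block_size` payload bytes and that block headers and check blocks do not count toward the file offset; the function name `locate` and both parameter names are hypothetical.

```python
def locate(offset: int, n_data: int, block_size: int):
    """Return (group index, data block index in the group, byte index in the block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    group_index, within_group = divmod(offset, n_data * block_size)
    block_index, within_block = divmod(within_group, block_size)
    return group_index, block_index, within_block


# Five data blocks of 4 bytes each per group: each group covers 20 file bytes.
position = locate(23, n_data=5, block_size=4)
```

Only the group at the returned index then needs to be fetched from the server nodes.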

The fault detection process of the multiple machine file storage system of the present invention is described in detail as follows:

Because the multiple machine file storage system of the present invention comprises a large number of server nodes, fault detection is carried out by adding set rules to simple heartbeat detection and using a voting mechanism among the nodes.

Timed heartbeat detection is used between the server nodes. In each timing cycle, a server node receives the heartbeat data sent by all the other server nodes and at the same time sends heartbeat data to all the other server nodes. If a server node A finds that it has not received heartbeat data from server node B for a long time, it sends a query request to the remaining server nodes to ask about server node B.

After receiving the query request sent by server node A, each of the remaining server nodes judges whether the heartbeat data it last received from the queried node (server node B) has timed out. If so, it returns failure information to the querying node (server node A); otherwise, it returns normal information to the querying node.

After server node A receives the query responses returned by the remaining nodes, it analyzes them. If the remaining nodes also consider server node B to have failed, that is, they return failure information, or if server node A receives no response from the remaining nodes, server node A concludes that server node B has failed. If the remaining nodes return normal information, server node A concludes that it has itself failed.
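The decision made by server node A can be sketched as a small function. The text covers the cases in which the remaining nodes agree or do not answer at all; how mixed answers are resolved is not stated, so this sketch assumes a simple majority, with ties counted as a failure of node B. The names `Reply`, `Verdict` and `judge` are hypothetical.

```python
from enum import Enum


class Reply(Enum):
    FAULT = "fault"    # responder also missed B's heartbeat
    NORMAL = "normal"  # responder still receives B's heartbeat


class Verdict(Enum):
    PEER_FAILED = "peer_failed"  # node B is considered failed
    SELF_FAILED = "self_failed"  # node A considers itself failed


def judge(replies):
    """Decide on node A after it has queried the remaining nodes about node B.

    replies holds the answers that arrived before the timeout; an empty
    list means no node answered.
    """
    if not replies:
        return Verdict.PEER_FAILED
    normal = sum(1 for reply in replies if reply is Reply.NORMAL)
    fault = len(replies) - normal
    # Assumed tie-break: blame node B unless a majority still sees it alive.
    return Verdict.SELF_FAILED if normal > fault else Verdict.PEER_FAILED


verdict = judge([Reply.NORMAL, Reply.NORMAL, Reply.FAULT])
```

In this example two of three responders still see node B, so node A concludes that the problem lies with its own receive path.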

The multiple machine file storage system of the present invention has good scalability. To expand the system, server nodes only need to be added to the network, and all data can transition smoothly.

After a system of N+1 nodes is upgraded to a system of N+m+1 nodes, for each file in the N+1 system a corresponding file containing a file header is generated on each of the m newly added nodes. When a file is read on one of these m nodes, the file header information in the generated file, in which the node count is N+1, indicates that the corresponding file can be read from the original N+1 nodes.

Newly added file data is written to the N+m+1 nodes. In this way the system can provide service as soon as the nodes have been added, without data conversion. When the system load is low, the data in the former N+1 system is read out and rewritten into the N+m+1 system, so that after a period of time all data has been converted to the file layout of the N+m+1 system.
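How the node count in the file header steers a read after expansion can be sketched as follows. The header layout is not given in the text, so the `FileHeader` type, its single field and both helper functions are hypothetical; the sketch also assumes that the cluster's node list keeps the original N+1 nodes first and appends new nodes at the end.

```python
from dataclasses import dataclass


@dataclass
class FileHeader:
    node_count: int  # number of nodes the file was striped over when written


def nodes_to_read(header: FileHeader, all_nodes: list) -> list:
    """Nodes that hold the file's blocks (assumes original nodes are listed first)."""
    return all_nodes[:header.node_count]


def needs_migration(header: FileHeader, all_nodes: list) -> bool:
    """True if the file still uses the layout of the smaller, pre-expansion system."""
    return header.node_count < len(all_nodes)


# A 4-node system (N+1 = 4) expanded by m = 2 nodes.
cluster = ["n0", "n1", "n2", "n3", "n4", "n5"]
old_file = FileHeader(node_count=4)
new_file = FileHeader(node_count=6)
```

A background task running at low load could use a check like `needs_migration` to pick the files to rewrite in the wider layout.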

The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims

1. A multiple machine file storage system, characterized by comprising:

a plurality of server nodes, wherein the server nodes merge the data blocks passed over by the verification computing module and the corresponding check block into a group and store the data blocks and the check block of the group on the respective server nodes; when a file stored in the multiple machine file storage system needs to be read, each server node reads out the saved data blocks and check blocks in response to the data read request that it receives, one group of data being read at a time, until the file has been read completely.

2. The multiple machine file storage system according to claim 1, characterized in that each server node sends heartbeat data to the other server nodes in each timing cycle and receives the heartbeat data sent by the other server nodes, and the server node comprises:

3. The multiple machine file storage system according to claim 1, characterized in that the multiple machine file storage system is expanded by adding server nodes to it.

4. A file processing method for a multiple machine file storage system, characterized by comprising:

5. The method according to claim 4, characterized in that the method specifically comprises the steps of:

6. The method according to claim 5, characterized in that the method further comprises:

7. The method according to claim 4, characterized in that the method further comprises: