CN101567805B - Method for recovering failed parallel file system - Google Patents

Publication number
CN101567805B
Authority
CN
China
Prior art keywords
client computer
data server
meta data
lock
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100854839A
Other languages
Chinese (zh)
Other versions
CN101567805A (en)
Inventor
舒继武
刘洋
易乐天
薛巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for recovering a parallel file system after a failure, belonging to the field of file systems, characterized in that: a client applies to a server for a state validity period, and its lock state remains valid within that period; the client and the server each determine the other's state from the peer's identifier and the state validity period; the server records modifications to metadata in a log; after the parallel file system fails and restarts, it only needs to read the log and execute the incomplete transactions to restore metadata consistency; in addition, another server can take over from the failed server by reading the log. Using the state validity period avoids the network overhead of frequent communication between client and server to determine each other's state, and thus reduces the complexity of lock state recovery; recording a metadata log on the server allows metadata consistency to be restored quickly; and compared with the traditional method based on disk scanning, the method shortens recovery time and markedly improves system reliability.

Description

Method for recovering a parallel file system after a failure
Technical field
The method for recovering a parallel file system after a failure belongs to the field of file systems, and relates in particular to reliability protection within that field.
Background art
A storage area network (SAN) is a widely adopted architecture for connecting external storage devices to servers. It uses technologies such as Fibre Channel and disk arrays, offers good scalability, and is widely used in fields such as high-performance computing. The architecture connects the storage devices through a dedicated network and exposes a block-level access interface to the front end, which treats the storage as directly attached.
A SAN-environment parallel file system is a parallel file system built on storage area network technology. Multiple clients can concurrently access files stored on the SAN storage devices, and to the user this access is the same as accessing local files. Parallel file systems are now widely used in high-performance computing, and a parallel file system based on a SAN environment further improves overall resource utilization and performance.
Summary of the invention
The object of the invention is to enable a SAN-environment parallel file system to restore a consistent state quickly after it fails and restarts. Aimed at the reliability requirements of SAN-environment parallel file systems, the invention designs and implements a set of methods for recovering system state after a client or a metadata server fails and restarts. With this method, a machine in the system can be recovered quickly and effectively after it fails, the impact of failure-induced restarts on the file system is minimized, the file system keeps running, and the reliability and availability of the whole system are improved.
The invention is characterized in that: the metadata server uses a state validity period to maintain the lock state that clients hold on resources; the client and the metadata server each use the peer's identifier and the state validity period to determine that the other has restarted because of a failure; and the metadata server uses a log to ensure that metadata consistency is restored quickly when it restarts after its own failure.
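As a minimal sketch of the state validity period, the following Python fragment models the server-side bookkeeping: each request from a client renews its period, and the client's lock state is honoured only while the period is valid. The class name `ValidityTable`, its methods, the 30-second length and the client names are hypothetical and chosen only for illustration; they do not come from the patent. Time is passed in explicitly so that the behaviour does not depend on the system clock.

```python
from dataclasses import dataclass, field


@dataclass
class ValidityTable:
    """Hypothetical server-side record of client state validity periods."""
    length: float                                  # one fixed length per server
    expiry: dict = field(default_factory=dict)     # client id -> expiry time

    def renew(self, client_id: str, now: float) -> None:
        # Every request submitted by a client renews its validity period.
        self.expiry[client_id] = now + self.length

    def is_valid(self, client_id: str, now: float) -> bool:
        # Locks held by the client are honoured only while the period is valid.
        return now < self.expiry.get(client_id, float("-inf"))

    def expired_clients(self, now: float) -> list:
        # Clients whose lock state may now be revoked.
        return sorted(c for c, t in self.expiry.items() if now >= t)


table = ValidityTable(length=30.0)
table.renew("client-a", now=0.0)
table.renew("client-b", now=0.0)
table.renew("client-a", now=20.0)   # a later request from client-a renews its period
```

In this sketch, `client-b` sends nothing after time 0, so its period lapses at time 30 while `client-a` remains valid until time 50. No periodic probing between client and server is needed to reach that conclusion.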
The method is carried out in a storage area network (SAN) environment parallel file system according to the following steps in order:
Step (1). Construct the SAN-environment parallel file system, comprising clients, metadata servers and disk arrays, wherein:
Client: at least one; performs file operations and obtains the metadata of the relevant files from the metadata server,
Metadata server: at least one; connected to the clients over Ethernet and accessing the disk array through a Fibre Channel switch; it organizes the file data distributed across the disk array into a unified parallel file system view and provides metadata operation services to the clients. The parallel file system includes different types of locks as well as file layouts. The lock types are: byte-range lock, share reservation, file delegation and directory delegation, wherein:
Byte-range lock: provides byte-level access control over a file,
Share reservation: a mechanism for controlling access to a file; it is established by the OPEN operation, which specifies the access types requested and the access types denied,
File delegation: a revocable lock. It guarantees its holder that no conflicting OPEN operation or file change will occur, so that when clients issue no conflicting requests for a file, the overhead of repeatedly sending requests to the metadata server is reduced. The file delegation is normally revoked only when another client requests access to the file; however, the metadata server may also revoke it at any time, even if no conflicting access request for the file currently exists,
Directory delegation: a revocable lock. It guarantees its holder that no conflicting directory modification will occur, so that when clients issue no conflicting requests for a directory, the overhead of repeatedly sending requests to the metadata server is reduced. The directory delegation is normally revoked only when another client requests access to the directory; however, the metadata server may also revoke it at any time, even if no conflicting access request for the directory currently exists,
File layout: describes the mapping between file data and the disk array that stores it; it guarantees that the client holding the layout can access the file data and that no conflicting change to the data will occur during the access,
The metadata server defines a "state validity period", which is the validity period of the lock state of all locks a client has obtained. Within this interval, the locks granted to the client by the metadata server are valid. Before the state validity period ends, every request the client submits to the metadata server renews it. If it is not renewed in time, all locks obtained by the client become invalid when the state validity period ends. If, after the state validity period has ended, the metadata server grants another client a lock request that conflicts with a lock currently held by this client, the current lock must be revoked. If the metadata server fails, then after it restarts, the client can, within the state validity period set by the restarted metadata server, send requests to reclaim the lock state it previously held. For a given metadata server, all state validity periods have the same set length. Through a uniquely assigned client identifier, the metadata server can recognize that a restarted client is a different running instance of the same client and recover accordingly; conversely, through a uniquely assigned metadata server identifier, the client can recognize a different running instance of the same metadata server and recover accordingly,
In addition, the metadata comprises: the B+ tree information used to manage storage space, and the file layouts and directory information of the parallel file system;
Disk array: at least one; provides storage to the clients through the Fibre Channel switch;
Step (2). After a failed client restarts, the metadata server recovers its lock state according to the following steps:
Step (2.1). The metadata server learns that the client is in an inactive state:
Step (2.1.1). Determine whether there is a newly established client connection:
Step (2.1.1.1). If not, go to step (2.1.2);
Step (2.1.1.2). If so, check the client identifier:
Step (2.1.1.2.1). The metadata server compares the client's owner ID with the owner IDs stored on the metadata server: if the owner ID matches, the client is a machine that has previously connected to this metadata server, go to step (2.1.1.2.2); otherwise the client is a newly connected machine, go to step (2.3);
Step (2.1.1.2.2). The metadata server compares the client's version number with the version number stored on the metadata server: if the two version numbers differ, the client is a different instance of the same machine, go to step (2.2); otherwise, go to step (2.3);
Step (2.1.2). Determine whether the client's state validity period has ended:
If the client did not renew the state validity period before it ended, the metadata server assumes that the client is inactive and thereby detects the client failure; otherwise, go to step (2.3);
Step (2.2). Based on the result of step (2.1), the metadata server recovers each type of lock state held by the failed client:
Step (2.2.1). Revoke byte-range locks:
When the state validity period ends, the metadata server revokes the byte-range locks obtained by the client;
Step (2.2.2). Revoke share reservations:
When the state validity period ends, the metadata server revokes the share reservations obtained by the client;
Step (2.2.3). Revoke file layouts:
When the client restarts, it loses all the file layouts it obtained earlier;
Step (2.2.3.1). If the state validity period has ended without being renewed, the metadata server chooses, according to its configuration file, either to release the file layout immediately after the state validity period expires, or to let the file layout keep waiting for a possible renewal of the state validity period as long as no other file layout request conflicts with it; otherwise, go to step (2.2.3.2);
Step (2.2.3.2). If the client restarts before the state validity period ends and re-establishes its connection to the metadata server, the metadata server accordingly releases all file layout state associated with the client's previous instance;
Step (2.2.4). Recover file and directory delegations:
Before the failure the client may have stored some file data locally, and that data is associated with the delegations the client previously held, so the client needs to re-establish the corresponding file state with the metadata server;
Step (2.2.4.1). If the state validity period has ended without being renewed, the metadata server revokes the delegations obtained by the client;
Step (2.2.4.2). If the client restarts before the state validity period ends and re-establishes its connection to the metadata server:
Step (2.2.4.2.1). The client first recovers its delegations and flushes its cached data to the metadata server; the metadata server may extend the recovery period for delegations beyond the defined length of the state validity period;
Step (2.2.4.2.2). The metadata server revokes the delegations held by the client;
Step (2.3). Recovery ends;
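The detection logic of step (2.1) can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the patented implementation: the names `ClientKind`, `classify_client` and `recovery_needed`, the in-memory dictionary standing in for the identifiers stored on the metadata server, and the sample owner ID are all hypothetical.

```python
from enum import Enum


class ClientKind(Enum):
    NEW_MACHINE = "new machine"          # owner ID not known to the server
    SAME_INSTANCE = "same instance"      # owner ID and version number both match
    RESTARTED_INSTANCE = "restarted"     # owner ID matches, version number differs


def classify_client(known_versions: dict, owner_id: str, version: int) -> ClientKind:
    """Classify a new connection as in steps (2.1.1.2.1) and (2.1.1.2.2).

    known_versions maps each stored owner ID to its recorded version number.
    """
    if owner_id not in known_versions:
        return ClientKind.NEW_MACHINE
    if known_versions[owner_id] != version:
        return ClientKind.RESTARTED_INSTANCE
    return ClientKind.SAME_INSTANCE


def recovery_needed(known_versions: dict, connection, validity_expired: bool) -> bool:
    """Decide whether step (2.2) runs.

    connection is None when there is no new connection, else (owner_id, version).
    """
    if connection is None:
        return validity_expired                      # step (2.1.2)
    kind = classify_client(known_versions, *connection)
    return kind is ClientKind.RESTARTED_INSTANCE     # step (2.1.1.2.2)


known = {"host-17": 4}   # hypothetical stored owner ID and version number
```

For example, `recovery_needed(known, ("host-17", 5), False)` reports that recovery is needed because the version number changed, whereas a connection from an unknown owner ID is treated as a new machine and skips recovery.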
Step (3). After a failed metadata server restarts, the client recovers the lock state it previously obtained according to the following steps:
Step (3.1). The client discovers that the metadata server has restarted:
The client sends an operation request to the metadata server and receives a corresponding error in return;
or the state validity period is about to end, and the client tries to renew it but finds that it cannot;
Step (3.2). The client re-establishes its connection to the restarted metadata server and obtains the state validity period length reset by the metadata server;
Step (3.3). The client determines whether this metadata server is the one it was previously connected to;
Step (3.4). If this metadata server is one the client was previously connected to and the client holds lock state, the client recovers each type of lock state it holds; if this metadata server is not one the client was previously connected to, go to step (3.4.6); if it is one the client was previously connected to but the client holds no lock state, go to step (3.4.5);
Step (3.4.1). Recover byte-range locks:
Step (3.4.1.1). The client sends a reclaim-type byte-range lock request to the metadata server;
Step (3.4.1.2). The metadata server rebuilds the lock state;
Step (3.4.2). Recover share reservations:
Step (3.4.2.1). The client sends a reclaim-type share reservation request to the metadata server;
Step (3.4.2.2). The metadata server rebuilds the lock state;
Step (3.4.3). Recover delegations:
Step (3.4.3.1). The client declares to the metadata server the delegations it holds;
Step (3.4.3.2). The metadata server grants the client the delegation with a special designation attached;
Step (3.4.3.3). The client writes the modified state back to the metadata server;
Step (3.4.3.4). The metadata server revokes the delegation;
Step (3.4.4). Recover file layouts:
Step (3.4.4.1). The client stops using all file layouts and deletes the device-ID-to-device-address mappings previously received from the metadata server;
Step (3.4.4.2). The client checks whether it has any file layouts that have not yet been committed; if not, go to step (3.4.5);
Step (3.4.4.3). The client takes measures to keep the data state on the client, the metadata server and the disk array synchronized:
Step (3.4.4.3.2). Determine whether the client has written data to the disk array asynchronously but still holds a copy of the data in memory; if so, recover using the recovery method of step (3.4.4.3.1);
Step (3.4.4.3.3). Determine whether the client has no copy of the data in memory while the metadata server is still in its state recovery period; if so, the current file layout is unavailable;
Step (3.4.4.3.4). Determine whether the client has no copy of the data in memory and the metadata server has already ended its state recovery period; if so, the client needs to cache all data until the file layout is committed successfully;
Step (3.4.5). The client notifies the metadata server that its lock state reconstruction is complete;
Step (3.4.6). The metadata server ends the state recovery period:
The metadata server keeps on the disk array a list of all clients that hold lock state needing recovery, and thereby knows whether all such clients have completed lock state recovery; the server may also choose to end the state recovery period early at any time;
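The client-side branching of step (3.4) can be illustrated with a small Python sketch. It covers only byte-range locks and share reservations; delegations and file layouts follow their own sub-steps and are omitted. The function name `reclaim_plan`, the message names `RECLAIM` and `RECLAIM_COMPLETE`, and the resource strings are hypothetical placeholders, not protocol messages defined by the patent.

```python
def reclaim_plan(same_server: bool, held_locks: list) -> list:
    """Return the ordered messages a client sends after reconnecting.

    held_locks is a list of (kind, resource) pairs, where kind is
    "byte-range" or "share-reservation".
    """
    if not same_server:
        # Step (3.4): a server the client never connected to holds none of
        # its state, so there is nothing to reclaim on it.
        return []
    order = {"byte-range": 0, "share-reservation": 1}   # steps (3.4.1), (3.4.2)
    ordered = sorted(held_locks, key=lambda lock: order[lock[0]])
    plan = [("RECLAIM", kind, resource) for kind, resource in ordered]
    # Step (3.4.5): always tell the server that reconstruction has finished,
    # even when the client held no lock state at all.
    plan.append(("RECLAIM_COMPLETE", None, None))
    return plan


plan = reclaim_plan(True, [("share-reservation", "/a"), ("byte-range", "/a:0-4095")])
```

A client that held no locks on the same server still produces the single completion notice, which lets the server shorten its recovery period once every listed client has reported.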
Step (4). Recover lock state after a network partition according to the following steps:
On the client side, the steps are as follows:
Step (4.1). Once the client resumes communication with the metadata server before the state validity period has ended, or re-establishes its connection to the metadata server after the state validity period has ended, the client compares the metadata server identifier, which consists of a main id and a sub id, with the identifier of the previously connected metadata server cached on the client. If both the main ids and the sub ids are identical, the metadata server has not restarted, and the client thereby determines that a network partition has occurred;
Step (4.2). The client determines whether the state validity period has ended: if not, its lock state remains valid; otherwise it releases all lock state;
On the metadata server side, the recovery steps are as follows:
Step (4.1). The metadata server finds, in the client list it maintains, that the client's state validity period has ended;
Step (4.2). The metadata server revokes all lock state held by the client except delegations;
Step (4.3). The metadata server revokes a delegation only when another lock request that conflicts with it arrives; when revoking the delegation, the metadata server writes the relevant information about the revoked delegation to the disk array;
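The client-side check in steps (4.1) and (4.2) can be sketched as below. The `ServerId` type, the function name `after_reconnect`, the integer id values and the returned strings are assumptions made for illustration; the patent states only that the identifier consists of a main id and a sub id and that both must match.

```python
from typing import NamedTuple


class ServerId(NamedTuple):
    main_id: int
    sub_id: int


def after_reconnect(cached: ServerId, current: ServerId, validity_expired: bool) -> str:
    """Client-side decision after communication with the server resumes."""
    if cached != current:
        # Either id differs: not a partition; the restart path of step (3) applies.
        return "not a partition"
    # Step (4.1): both ids match, so the server did not restart.
    # Step (4.2): lock state survives only if the validity period has not ended.
    return "release all locks" if validity_expired else "locks remain valid"


cached_id = ServerId(main_id=7, sub_id=2)
```

Because `NamedTuple` equality compares every field, `cached != current` is true as soon as either the main id or the sub id differs.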
Step (5). Recover metadata consistency on the metadata server:
Step (5.1). The metadata server restarts;
Step (5.2). Starting from the last checkpoint, the metadata server reads the log transactions from the disk array in order and re-executes every transaction in the on-disk log that is marked as successfully synchronized;
Step (5.3). The metadata server updates the last checkpoint to point to the current last synchronized record;
Step (5.4). The metadata server runs normally.
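Steps (5.2) and (5.3) amount to replaying a log from the last checkpoint. The following Python sketch shows one way this could look, under several assumptions that are not in the patent: the log is an in-memory list, each record is a dictionary with a `synced` flag, a checkpoint is the index of the last record already reflected in the metadata, and each transaction is an idempotent overwrite so that replaying it twice is harmless.

```python
from typing import Callable


def replay_log(log: list, checkpoint: int, apply: Callable[[dict], None]) -> int:
    """Re-execute synchronized transactions after `checkpoint`.

    Returns the new checkpoint, i.e. the index of the last synchronized record.
    """
    last_synced = checkpoint
    for index in range(checkpoint + 1, len(log)):
        record = log[index]
        if record["synced"]:
            apply(record)          # step (5.2): re-execute the transaction
            last_synced = index    # step (5.3): track the last synchronized record
    return last_synced


metadata = {"/a": "v1"}            # state as of the checkpoint (record 0)
log = [
    {"synced": True, "path": "/a", "value": "v1"},
    {"synced": True, "path": "/b", "value": "v1"},
    {"synced": True, "path": "/a", "value": "v2"},
    {"synced": False, "path": "/c", "value": "v1"},   # never marked synchronized
]


def apply_record(record: dict) -> None:
    metadata[record["path"]] = record["value"]        # idempotent overwrite


new_checkpoint = replay_log(log, checkpoint=0, apply=apply_record)
```

The work done is proportional to the number of records after the checkpoint rather than to the size of the file system, which is the property that distinguishes log replay from a full disk scan.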
The invention is characterized in that:
The fault recovery comprises two main methods:
1) Relying on the state validity period: after a client fails, the metadata server can recognize the client's failure and restart from the client identifier, and revokes the lock state the client obtained before the failure, allowing other clients to request locks that conflict with those the client previously held, thereby avoiding an inconsistent lock state; when the metadata server restarts after a failure, the client can also use the state validity period to recover the lock state it previously obtained; and when a network partition occurs, the state validity period likewise resolves inconsistent lock requests;
2) Logging metadata operations on the metadata server: when the metadata server fails and restarts, it reads the log and executes the incomplete transactions in it, ensuring that the file system metadata is always in a consistent state; another metadata server can also take over from the failed metadata server by executing the incomplete transactions in the log, and continue to provide metadata access.
By using the state validity period, the invention solves in a simple way the problem of keeping the lock state obtained by clients consistent after a failure, and relies on the state validity period to keep lock requests consistent when a network partition occurs. In addition, a traditional file system needs a disk scanning tool (such as fsck) to perform a time-consuming scan after a failure and restart; for a large file system in particular, the scan may take several hours or more, and a service interruption of that length may be unacceptable. By contrast, the log method of the invention brings large-scale metadata to a consistent state in a short time (seconds or minutes), restores service as quickly as possible, and improves the reliability of the SAN-environment parallel file system.
Description of drawings
Fig. 1. Schematic diagram of the overall architecture of the SAN-environment parallel file system.
Fig. 2. Schematic diagram of the software architecture of the SAN-environment parallel file system and the layer at which the recovery method operates.
Fig. 3. Schematic diagram of the client failure recovery steps.
Fig. 4. Schematic diagram of the metadata server failure recovery steps.
Fig. 5. Schematic diagram of metadata consistency recovery after a metadata server failure and restart.
Embodiment
The hardware required by the invention comprises one or more metadata servers, one or more clients, one or more Fibre Channel disk arrays and Fibre Channel switches, or one or more Ethernet switches. The fault recovery method runs in the SAN-environment parallel file system described above.
The fault recovery method comprises a method for recovering lock state and a method for recovering metadata consistency: the former runs on both the client and the metadata server; the latter, as the log module of the metadata server's back-end file system, recovers metadata consistency only when the metadata server restarts.
The specific implementation steps are as follows:
1. Lock state recovery:
An important requirement of lock state recovery is that the client and the metadata server must each know the other's state (active or failed). In addition, the client should see a consistent view of the data before and after the metadata server restarts. After the metadata server restarts, all read or write operations that were queued on the client or in network buffers before the metadata server failed must wait until the client has successfully recovered, on the restarted metadata server, the locks that protect those read and write operations. Until the metadata server can guarantee that the client has sufficient lock state to process operations safely, any operation arriving at the metadata server is rejected.
Recovery after a failure of the SAN-environment parallel file system comprises recovery of the client, recovery of the metadata server, and handling of network partitions.
● Client recovery:
Client recovery mainly restores the various locks the client obtained before the failure to a consistent state on the metadata server, including recovery of the state of byte-range locks, file delegations, directory delegations, share reservations, file layouts and so on.
The steps of client recovery are:
Step (1). The metadata server learns that the client is in an inactive state:
Step (1.1). Determine whether there is a newly established client connection:
Step (1.1.1). If not, go to step (1.2);
Step (1.1.2). If so, check the client identifier:
Step (1.1.2.1). The metadata server compares the client's owner ID with the owner IDs stored locally: if the owner ID matches, the client is a machine that has previously connected to this metadata server, go to step (1.1.2.2); otherwise the client is a newly connected machine, go to step (3);
Step (1.1.2.2). The metadata server compares the client's version number with the version number stored on the current metadata server: if the version numbers differ, the client is a different instance of the same machine, that is, the client has failed and restarted, go to step (2); otherwise, go to step (3);
Step (1.2). Determine whether the client's state validity period has ended:
If the client did not renew the state validity period before it ended, the metadata server assumes that the client is inactive (it has finished running or has failed) and thereby detects the client failure; otherwise, go to step (3);
Step (2). Based on the result of step (1), the metadata server recovers each type of lock state held by the failed client:
Step (2.1). Revoke byte-range locks:
When the state validity period ends, the metadata server revokes the byte-range locks obtained by the client;
Step (2.2). Revoke share reservations:
When the state validity period ends, the metadata server revokes the share reservations obtained by the client;
Step (2.3). Revoke file layouts:
When the client restarts, it loses all information about the file layouts it obtained earlier;
Step (2.3.1). If the state validity period has ended without being renewed, the metadata server chooses, according to its configuration file, either to release the file layout immediately after the state validity period expires, or to let the file layout keep waiting for a possible renewal of the state validity period as long as no other file layout request conflicts with it; otherwise, go to step (2.3.2);
Step (2.3.2). If the client restarts before the state validity period ends and re-establishes its connection to the metadata server, the metadata server accordingly releases all file layout state associated with the client's previous instance;
Step (2.4). Recover delegations:
Before the failure the client may have stored some file data locally, and that data is associated with the delegations the client previously held, so the client needs to re-establish the corresponding file state with the metadata server;
Step (2.4.1). If the state validity period has ended without being renewed, the metadata server revokes the delegations obtained by the client;
Step (2.4.2). If the client restarts before the state validity period ends and re-establishes its connection to the metadata server:
Step (2.4.2.1). The client first recovers its delegations and flushes its cached data to the metadata server. The metadata server may suitably extend the recovery period for delegations, allowing it to exceed the defined length of the state validity period. This also means that requests from other clients that conflict with the current delegations must wait longer than the state validity period. Because a normal delegation recall can itself take considerable time while the client flushes its changed state to the metadata server, other clients need to be prepared for delays caused by a conflicting delegation. This longer recovery period widens the time window available for the client to restart, and requires reliable storage so that the delegations can be recovered;
Step (2.4.2.2). The metadata server revokes the delegations held by the client;
Step (3). Recovery ends;
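The per-lock-type handling in step (2) above can be summarized as a decision function. The Python sketch below is illustrative only: the function name `server_action`, the lock kind strings, the returned action strings and the boolean `release_layout_on_expiry` (standing in for the configuration-file choice of step (2.3.1)) are hypothetical. Returning "keep" while the validity period is still running is an assumption of this sketch for cases the steps do not spell out.

```python
def server_action(kind: str, validity_expired: bool, client_restarted: bool,
                  release_layout_on_expiry: bool = True) -> str:
    """Return what the metadata server does with one piece of a client's lock state."""
    if kind in ("byte-range", "share-reservation"):
        # Steps (2.1) and (2.2): revoked once the validity period has ended.
        return "revoke" if validity_expired else "keep"
    if kind == "layout":
        if validity_expired:
            # Step (2.3.1): the configuration selects one of two policies.
            return "release" if release_layout_on_expiry else "keep until conflict"
        if client_restarted:
            return "release"                      # step (2.3.2)
        return "keep"
    if kind == "delegation":
        if validity_expired:
            return "revoke"                       # step (2.4.1)
        if client_restarted:
            # Steps (2.4.2.1) and (2.4.2.2): the client recovers and flushes
            # its cached data first, then the server revokes the delegation.
            return "recover, flush, then revoke"
        return "keep"
    raise ValueError(f"unknown lock kind: {kind}")


example = server_action("layout", validity_expired=True, client_restarted=False,
                        release_layout_on_expiry=False)
```

Here `example` is "keep until conflict", the lenient layout policy in which an expired layout survives until another layout request conflicts with it.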
● Metadata server recovery:
If the metadata server loses lock state because of a restart, it must allow clients to discover this and re-establish the lost lock state. A client must be able to rebuild its lock state without having its valid requests rejected because the metadata server has granted conflicting access to another client. Likewise, if a client has not rebuilt the lock state for a file, the metadata server must prevent the client from reading or writing that file.
Because each client must have the chance to recover all the locks it holds before any conflicting lock is granted to another client, a period called the "state recovery period" is needed in which to complete recovery. During this period the metadata server restricts lock operations: only reclaim-type lock requests are allowed. However, if the metadata server can determine reliably (that is, unaffected by the restart) that granting a lock request cannot conflict with a subsequent reclaim request, it may grant a new lock request.
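The admission rule for the state recovery period can be sketched as a small class. This is a hedged illustration rather than the patented implementation: the class name `RecoveryPeriod`, its methods and the client names are hypothetical, the list of clients with lock state is held in memory rather than on the disk array, and refusing reclaim-type requests after the period has ended is an assumption of this sketch that the text does not state.

```python
class RecoveryPeriod:
    """Hypothetical model of the server's state recovery period."""

    def __init__(self, clients_with_state):
        # Clients that still have lock state to recover.
        self.pending = set(clients_with_state)

    @property
    def active(self) -> bool:
        return bool(self.pending)

    def admit(self, is_reclaim: bool, provably_no_conflict: bool = False) -> bool:
        if self.active:
            # Only reclaim-type requests pass, unless the server can reliably
            # show that the new lock cannot conflict with a later reclaim.
            return is_reclaim or provably_no_conflict
        # Assumption: once the period is over, only normal requests are served.
        return not is_reclaim

    def client_done(self, client_id: str) -> None:
        # A client reports that it has finished rebuilding its lock state.
        self.pending.discard(client_id)

    def end_early(self) -> None:
        # The server may end the recovery period at any time.
        self.pending.clear()


period = RecoveryPeriod(["client-a", "client-b"])
```

While `period.active` is true, an ordinary lock request is refused and a reclaim-type request is admitted; once both clients have called `client_done`, the period ends and ordinary requests are served again.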
The operating steps of metadata server recovery are:
Step (1). The client learns that the metadata server has restarted and enters the state recovery period:
The client sends an operation request to the metadata server and receives a corresponding error; or the state validity period is about to end, and the client tries to renew it but finds that it cannot;
Step (2). The client re-establishes a connection with the restarted metadata server and obtains the length of its state validity period (the metadata server may reset this length after restarting);
Step (3). The client determines whether the current metadata server is the one it was previously connected to (the client may have connected to a different metadata server);
Step (4). If this metadata server is the one the client previously connected to and the client holds lock state, the client recovers each type of lock state it holds. If this metadata server is not the one the client previously connected to, go to step (4.6). If it is the previously connected metadata server but the client holds no lock state, go to step (4.5);
Step (4.1). Recover byte-range locks:
Step (4.1.1). The client sends a reclaim-type lock request to the metadata server;
Step (4.1.2). The metadata server rebuilds this lock state;
Step (4.2). Recover share reservations:
Step (4.2.1). The client sends a reclaim-type lock request to the metadata server;
Step (4.2.2). The metadata server rebuilds this lock state;
Step (4.3). Recover delegations:
After the metadata server restarts, delegations are reclaimed in a way similar to byte-range locks and share reservations, with one semantic difference. Under normal circumstances, if the metadata server decides that a delegation should not be granted, it still performs the requested operation (such as OPEN) but does not grant the delegation. During reclaim, however, the (restarted) metadata server may grant the delegation with a special designation attached, so that the client treats the delegation as granted by the metadata server but now being recalled;
Step (4.3.1). The client declares to the metadata server the delegations it holds;
Step (4.3.2). The metadata server grants the client the delegation with the special designation attached;
Step (4.3.3). The client writes the modified state back to the metadata server;
Step (4.3.4). The metadata server recalls the delegation;
Step (4.4). Recover file layouts:
Step (4.4.1). The client stops using its file layouts and deletes the mappings from device ID to device address previously received from the metadata server;
Step (4.4.2). The client checks whether it has any file layout that has not yet been committed; if not, go to step (4.5);
Step (4.4.3). The client takes measures to keep the data state on the client, the metadata server and the disk array synchronized:
Step (4.4.3.1). Determine whether the client's memory still holds data that has been modified but not yet synchronized. If so, the client waits for the metadata server's state recovery period to end, obtains the file layout, and writes the data to the disk array;
Step (4.4.3.2). Determine whether the client has written data to the disk array asynchronously but still holds a copy of the data in memory. If so, recover using the method of step (4.4.3.1);
Step (4.4.3.3). Determine whether the client holds no copy of the data in memory while the metadata server is still in its state recovery period. If so, the current file layout is unavailable;
Step (4.4.3.4). Determine whether the client holds no copy of the data in memory and the metadata server has already ended its state recovery period. If so, the client must cache all the data until the file layout is committed successfully;
Step (4.5). The client notifies the metadata server that rebuilding of the lock state is complete;
Step (4.6). The metadata server ends the state recovery period:
The metadata server keeps, in reliable storage, a list of all clients that have lock state to recover, so that it knows whether all such clients have finished recovering their lock state; the metadata server may also choose to end the state recovery period early at any time;
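The branching in step (4) can be summarized by a small C sketch. The enumeration and the function name `choose_step` are hypothetical names introduced for illustration; only the three outcomes are taken from the steps above.

```c
#include <stdbool.h>

/* Which part of the recovery procedure the client runs next. */
enum next_step {
    RECOVER_LOCK_STATE,   /* steps (4.1) to (4.4): reclaim each kind of lock state */
    NOTIFY_REBUILD_DONE,  /* step (4.5): nothing to reclaim, report completion */
    END_RECOVERY_PERIOD   /* step (4.6): a different server, no state to rebuild */
};

/* Decision of step (4), based on the result of step (3) and the client's own state. */
static enum next_step choose_step(bool same_server, bool holds_lock_state)
{
    if (!same_server)
        return END_RECOVERY_PERIOD;
    if (!holds_lock_state)
        return NOTIFY_REBUILD_DONE;
    return RECOVER_LOCK_STATE;
}
```

After `RECOVER_LOCK_STATE` the client would still proceed to step (4.5); the sketch only shows where the procedure is entered.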
● Recovery after a network partition:
When a network partition occurs, the symptoms seen by the client are the same as for a metadata server failure. The difference is that, after the network partition ends and the client re-establishes its connection to the metadata server, the client can tell from the metadata server identifier that the server has not restarted. The metadata server therefore has no state recovery period (it enters one only after a restart). Consequently, for byte-range locks, share reservations and file layouts, recovery on the metadata server side looks the same as for a client failure.
■ The client's recovery steps after a network partition are:
Step (1). The client finds that it cannot communicate with the metadata server (an operation it sent gets no response, or renewal of the state validity period gets no response; at this point it cannot yet tell whether a network partition or a metadata server failure has occurred);
Step (2). The client resumes communication with the metadata server (the state validity period has not yet ended) or re-establishes a connection (the state validity period has ended). The client compares the metadata server identifier (which, like the client identifier, consists of a major ID and a minor ID) with the locally cached metadata server identifier. If both the major IDs and the minor IDs match, the metadata server has not restarted, so the client concludes that a network partition occurred;
Step (3). Determine whether the state validity period has ended: if not, the client's lock state remains valid; otherwise the client releases all of its lock state;
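Steps (2) and (3) on the client side amount to an identifier comparison followed by a validity-period check. A minimal C sketch follows; the `server_id` structure, the 64-byte bound on the major ID, the integer timestamps and the function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAJOR_ID_MAX 64  /* assumed upper bound on the opaque major ID */

/* Simplified stand-in for the metadata server identifier (major ID plus minor ID). */
struct server_id {
    uint64_t minor_id;
    unsigned char major_id[MAJOR_ID_MAX];
    size_t major_len;
};

enum diagnosis {
    SERVER_RESTARTED_OR_DIFFERENT,  /* identifiers differ: follow server recovery */
    PARTITION_STATE_VALID,          /* partition, validity period not yet over */
    PARTITION_STATE_EXPIRED         /* partition, client must release all lock state */
};

/* True when both the major ID and the minor ID match. */
static bool same_server_instance(const struct server_id *a, const struct server_id *b)
{
    return a->minor_id == b->minor_id &&
           a->major_len == b->major_len &&
           memcmp(a->major_id, b->major_id, a->major_len) == 0;
}

/* Step (2) compares identifiers; step (3) checks the state validity period. */
static enum diagnosis diagnose(const struct server_id *cached,
                               const struct server_id *seen,
                               long now, long validity_end)
{
    if (!same_server_instance(cached, seen))
        return SERVER_RESTARTED_OR_DIFFERENT;
    return now < validity_end ? PARTITION_STATE_VALID : PARTITION_STATE_EXPIRED;
}
```

Time is represented here by plain integers; a real client would use whatever clock its validity-period bookkeeping already relies on.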
■ The metadata server's recovery steps after a network partition are:
Step (1). The metadata server finds, in the client list it maintains, that the client's state validity period has ended;
Step (2). The metadata server revokes all lock state held by this client except delegations;
Step (3). The metadata server defers recalling the delegation until another lock request that conflicts with the current delegation arrives, and only then recalls it; when recalling the delegation, it writes information about the recalled delegation to reliable storage;
Note: the delegation recovery described above refers to file delegations. For a directory delegation, the client simply discards its current delegation and applies for a new one when needed.
2. Recovery of metadata consistency on the metadata server:
To ensure that file system metadata can quickly return to a consistent state after the metadata server fails, the SAN parallel file system uses journaling: before metadata is modified, the modifications are first written to a write-ahead log, and only after the log write succeeds are the metadata modifications synchronized to disk. The metadata of the SAN parallel file system comprises the B+ tree information used to manage space, the file layouts, and the directory information. This metadata is stored on the metadata server's disk as data of the metadata server's back-end file system and is managed by that file system. In general, a journaling file system offers three journal modes: write-back, ordered and data. To guarantee the consistency of the SAN parallel file system metadata (which is stored as data in the back-end file system), we use data mode for the back-end file system, so that modifications to both the metadata and the data of the back-end file system (the latter being the metadata of the SAN parallel file system) are journaled.
A metadata modification in the SAN parallel file system changes both the B+ tree and the layout and directory information; that is, when space is allocated or reclaimed, the B+ tree is operated on first, followed by the file layout and directory information. Guaranteeing the consistency of the SAN parallel file system metadata therefore requires that these two operations be atomic. They are placed in a single journal record, that is, in one transaction. Either both operations are written to the journal successfully and are then executed, or the journal record is incomplete and is ignored during recovery.
The steps for recovering metadata consistency on the metadata server are:
Step (1). The metadata server restarts;
Step (2). Starting from the last checkpoint, the metadata server reads the journal transactions from disk in order and redoes every transaction in the on-disk journal that is marked as successfully synchronized (that is, the metadata has been written to disk successfully but the data has not yet been written to disk);
Step (3). The metadata server updates the last checkpoint to point to the current last synchronized record;
Step (4). The metadata server resumes normal operation;
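The redo pass of steps (2) and (3) can be illustrated with a toy in-memory model. Everything in this sketch is assumed for illustration: the "disk" is an integer array, a transaction is a short list of (block, value) changes, the checkpoint is stored as the index of the first transaction not yet synchronized, and replay stops at the first incomplete transaction.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8  /* size of the toy "disk" */

/* One journal transaction: a flag set once it is complete, plus its block changes. */
struct txn {
    bool committed;
    size_t nmods;
    struct { size_t block; int value; } mods[4];
};

/* Redo every complete transaction starting at *checkpoint, then advance the
 * checkpoint past the replayed transactions. Returns the number redone. */
static size_t replay(const struct txn *log, size_t nlog,
                     size_t *checkpoint, int disk[NBLOCKS])
{
    size_t redone = 0;
    size_t i = *checkpoint;

    for (; i < nlog && log[i].committed; i++) {
        for (size_t m = 0; m < log[i].nmods; m++) {
            size_t b = log[i].mods[m].block;
            if (b < NBLOCKS)
                disk[b] = log[i].mods[m].value;  /* redo overwrites, so repeating it is harmless */
        }
        redone++;
    }
    *checkpoint = i;  /* incomplete transactions are left untouched and ignored */
    return redone;
}
```

Because each redo simply overwrites a block with its logged value, running the replay twice leaves the disk in the same state, which is what makes recovery safe to repeat after a second crash.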
The advantages of the present invention are as follows:
1) The state validity period is applied to the lock state granted to clients, which avoids the overhead of frequent communication between the metadata server and clients to probe each other's state;
2) Relying on the state validity period effectively resolves the lock-state inconsistencies caused by client failures, metadata server failures and network partitions; recovery is fast and the implementation is simple;
3) Journaling is used for operations on the SAN parallel file system metadata held on the metadata server, so metadata consistency can be restored quickly when the metadata server restarts after a failure. This avoids the intolerable service interruption caused by recovering large-scale metadata with traditional disk scanning (such as fsck). In addition, another metadata server can take over from the failed metadata server by accessing the journal and continue to provide metadata access.
The implementation is described as follows:
1. Fault recovery method for lock state
The key structures used in the fault recovery method for lock state are as follows:
● Client identifier:
struct client_owner {
    verifier co_verifier;   /* differs for every connection of the machine */
    opaque   co_ownerid;    /* uniquely identifies the physical machine */
};
The client identifier uniquely identifies one connection instance of a client to the metadata server. co_ownerid uniquely identifies the physical machine; co_verifier identifies one connection of that machine, and each connection produces a different co_verifier.
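Given these two fields, a metadata server can distinguish a new machine, a restarted instance of a known machine, and an unchanged instance. The following C sketch shows one way to express that comparison; the simplified `client_id` structure (a 64-bit verifier and a string owner ID) and the function name `classify` are assumptions, not the structure used by the invention.

```c
#include <stdint.h>
#include <string.h>

enum client_kind {
    NEW_CLIENT,        /* owner ID not seen before: a newly connected machine */
    RESTARTED_CLIENT,  /* same owner ID, different verifier: a new instance */
    SAME_INSTANCE      /* same owner ID and same verifier */
};

/* Simplified stand-in for struct client_owner. */
struct client_id {
    uint64_t verifier;    /* plays the role of co_verifier */
    const char *ownerid;  /* plays the role of the opaque co_ownerid */
};

/* Compare the identifier stored on the server with the one just presented. */
static enum client_kind classify(const struct client_id *stored,
                                 const struct client_id *incoming)
{
    if (strcmp(stored->ownerid, incoming->ownerid) != 0)
        return NEW_CLIENT;
    if (stored->verifier != incoming->verifier)
        return RESTARTED_CLIENT;
    return SAME_INSTANCE;
}
```

Only the `RESTARTED_CLIENT` outcome requires the server to recover or discard lock state left behind by the previous instance.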
● Metadata server identifier:
struct server_owner {
    uint64_t so_minor_id;   /* differs for every connection of the machine */
    opaque   so_major_id;   /* uniquely identifies the physical machine */
};
The metadata server identifier uniquely identifies one connection instance of a metadata server to the client. so_major_id uniquely identifies the physical machine; so_minor_id identifies one connection of that machine, and each connection produces a different so_minor_id.
● File layout:
struct layout {
    offset         lo_offset;
    length         lo_length;
    layoutiomode   lo_iomode;    /* operation type permitted on this layout */
    layout_content lo_content;
};
enum layoutiomode {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW   = 2,
    LAYOUTIOMODE4_ANY  = 3
};
A file layout describes the correspondence between file data and the disk array that stores the data; layoutiomode indicates the type of operation on the file layout.
● Delegation:
struct nfs_delegation {
    struct list_head super_list;
    struct rpc_cred *cred;
    struct inode *inode;
    nfs4_stateid stateid;
    int type;
#define NFS_DELEGATION_NEED_RECLAIM 1
    long flags;
    loff_t maxsize;
    __u64 change_attr;
    struct rcu_head rcu;
};
The purpose of introducing delegations is to reduce the overhead of clients continually sending requests to the metadata server when there are no conflicting requests for a file among multiple clients. In effect, provided no conflicting request occurs, a delegation transfers lock control of the file from the metadata server to the client accessing the file.
2. Fault recovery for metadata consistency
The metadata journal of the SAN parallel file system is a fixed-length run of contiguous blocks stored on the disk of the metadata server's back-end file system (if the block size is 4 KB, the journal consists of 24556 blocks of variable-length transaction records plus one journal header block). The journal begins with the list of transactions, and its last block is the journal header. Each transaction contains at least three blocks (a description block, a block list and a commit block), and the journal header occupies only the last block.
The journal is a circular buffer: once its last block has been written, writing continues from the first block. The journal records not only the metadata of the back-end file system but also its data (because the metadata of the SAN parallel file system is stored as data in the back-end file system).
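The wrap-around of the circular journal can be expressed as simple modular arithmetic. In this sketch the function name and the convention that `journal_blocks` counts only the transaction area (excluding the header block at the end) are assumptions.

```c
#include <stdint.h>

/* Index of the block written after `current`, wrapping from the last
 * transaction block back to block 0. `journal_blocks` must be non-zero. */
static uint32_t next_journal_block(uint32_t current, uint32_t journal_blocks)
{
    return (current + 1u) % journal_blocks;
}
```

With an 8-block transaction area, for example, the block following index 7 is index 0.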
Journal I/O is performed in the background by a separate process, which reduces the impact on user processes.
The overall structure of the SAN parallel file system journal is as follows:
Transaction 0 | Transaction 1 | ... | Transaction n | Journal Header
● Journal header:
The journal header is the last block of the journal; it records the position of the first unsynchronized transaction in the journal.
■ Last flush ID: 4 bytes, the transaction ID of the most recently fully synchronized transaction;
■ Unflushed offset: 4 bytes, the offset of the next transaction in the journal (expressed in blocks);
■ Mount ID: 4 bytes, the mount ID of the synchronized transaction;
The transaction pointed to by the offset must have a higher transaction ID or a higher mount ID than the synchronized transaction in order to be regarded as unsynchronized. Otherwise all transactions are considered synchronized, and the block indicated by the offset can be used to start recording new journal transactions.
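The test applied to the transaction found at the unflushed offset can be sketched in C as follows. The structure and function names are hypothetical; the three 4-byte fields follow the list above, and the sketch ignores ID wrap-around, which the text does not discuss.

```c
#include <stdbool.h>
#include <stdint.h>

/* The three 4-byte journal header fields listed above. */
struct journal_header {
    uint32_t last_flush_id;     /* ID of the last fully synchronized transaction */
    uint32_t unflushed_offset;  /* offset of the next transaction, in blocks */
    uint32_t mount_id;          /* mount ID of the synchronized transaction */
};

/* A transaction at the unflushed offset counts as unsynchronized only if its
 * transaction ID or its mount ID is higher than the value in the header. */
static bool is_unsynchronized(const struct journal_header *h,
                              uint32_t trans_id, uint32_t mount_id)
{
    return trans_id > h->last_flush_id || mount_id > h->mount_id;
}
```

When the function returns false for the transaction at `unflushed_offset`, the journal holds nothing to replay and that block is free for new transactions.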
● Transaction:
A transaction describes a change to the file system. New or modified blocks are not modified directly in the file system tree; they are written to the journal first and then mapped to their actual locations in the file system.
A transaction consists of a description block, a list of blocks, and a commit block that ends the transaction. All of these blocks are contiguous within the journal.
Description Block | Block 0 | Block 1 | ... | Block n | Commit Block
● Description block:
The description block contains the transaction ID and mount ID, the number of blocks in the transaction, a magic number, and the first half of the mapping of the blocks in the journal.
■ Transaction ID: 4 bytes, the transaction ID;
■ Len: 4 bytes, the transaction length in blocks;
■ Mount ID: 4 bytes, the mount ID of the transaction;
■ Real blocks: (block size - 24) bytes, the mapping from blocks in the transaction to actual blocks;
■ Magic: 12 bytes, the magic number.
● Commit block:
The commit block terminates a transaction. It contains a copy of the transaction ID and the transaction length. Its tail also holds a reserved 16-byte digest, and it contains the second half of the block mapping.
■ Transaction ID: 4 bytes, the transaction ID;
■ Len: 4 bytes, the transaction length in blocks;
■ Real blocks: (block size - 24) bytes, the mapping from blocks in the transaction to actual blocks;
■ Digest: 16 bytes, a digest of all blocks in the transaction; currently unused.
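The field sizes listed for the description block and the commit block each add up to exactly one block. The C sketch below lays the two blocks out for a 4 KB block size and checks that arithmetic at compile time. The structure names, the assumption that each mapping entry is a 4-byte block number, and the use of the duplicated transaction ID and length as a completeness check are illustrative assumptions rather than details stated in the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE  4096u  /* the 4 KB example block size */
#define FIXED_BYTES 24u    /* bytes taken by the non-mapping fields of each block */
/* Assumes every mapping entry is a 4-byte block number. */
#define MAP_ENTRIES ((BLOCK_SIZE - FIXED_BYTES) / sizeof(uint32_t))

struct desc_block {
    uint32_t trans_id;                  /* transaction ID */
    uint32_t len;                       /* transaction length in blocks */
    uint32_t mount_id;                  /* mount ID */
    uint32_t real_blocks[MAP_ENTRIES];  /* first half of the block mapping */
    unsigned char magic[12];            /* magic number */
};

struct commit_block {
    uint32_t trans_id;                  /* copy of the transaction ID */
    uint32_t len;                       /* copy of the transaction length */
    uint32_t real_blocks[MAP_ENTRIES];  /* second half of the block mapping */
    unsigned char digest[16];           /* reserved digest, unused */
};

_Static_assert(sizeof(struct desc_block) == BLOCK_SIZE, "description block fills one block");
_Static_assert(sizeof(struct commit_block) == BLOCK_SIZE, "commit block fills one block");

/* Hypothetical check: the commit block must repeat the ID and length of the
 * description block for the transaction to be treated as complete. */
static bool transaction_complete(const struct desc_block *d, const struct commit_block *c)
{
    return d->trans_id == c->trans_id && d->len == c->len;
}
```

Under these assumptions each block holds 1018 mapping entries, since (4096 - 24) / 4 = 1018.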
The SAN parallel file system writes the journal as follows:
First a new transaction is created; next the modified data blocks are inserted into the transaction; finally the commit block is written and the journal header is updated so that its unsynchronized-transaction pointer points to the current transaction. Each transaction is synchronized to disk when it completes. Because each change in the SAN parallel file system involves a fairly large amount of metadata, this has little effect on the sequentiality of disk access.
The SAN parallel file system recovers from the journal as follows:
First, the position of the transaction following the last checkpoint is determined by reading the journal header. The journal is then read and the metadata is changed according to the journal contents, which restores metadata consistency. Note that the journal header records the position of the transaction following the last checkpoint, so once this position appears on disk, the transaction has been synchronized successfully. If a power failure or a metadata server fault occurs while a transaction is being synchronized, the journal header is not updated and still points to the earlier transaction; the current incomplete transaction is not executed during recovery and therefore cannot damage metadata consistency.

Claims (1)

1. A method for recovery after a parallel file system fails, characterized in that the method is carried out in a storage area network (SAN) parallel file system by the following steps in order:
Step (1). Construct a SAN parallel file system comprising clients, metadata servers and disk arrays, wherein:
Client: at least one; performs file operations and obtains the metadata of the relevant files from the metadata server,
Metadata server: at least one; connected to the clients through Ethernet and accessing the disk array through a fibre channel switch; it organizes the file data distributed over the disk array into a unified parallel file system view and provides metadata operation services to the clients. The parallel file system includes different types of locks and file layouts; the lock types are byte-range lock, share reservation, file delegation and directory delegation, wherein:
Byte-range lock: provides byte-level access control over a file,
Share reservation: a mechanism for controlling access to a file, in which the OPEN operation specifies the required access type and the access types to deny,
File delegation: a revocable lock that guarantees to its holder that no inconsistent OPEN operation or file change will occur, so that when there are no conflicting requests for the file among multiple clients, the overhead of continually sending requests to the metadata server is reduced. The file delegation is recalled only when another client issues an access request for the file; in addition, even if there is currently no conflicting access request for the file, the metadata server may recall the file delegation at any time,
Directory delegation: a revocable lock that guarantees to its holder that no inconsistent directory modification will occur, so that when there are no conflicting requests for the directory among multiple clients, the overhead of continually sending requests to the metadata server is reduced. The directory delegation is recalled only when another client issues an access request for the directory; in addition, even if there is currently no conflicting access request for the directory, the metadata server may recall the directory delegation at any time,
File layout: describes the correspondence between file data and the disk array that stores the data, and guarantees that the client holding the layout can access the file data and that no inconsistent change to the data occurs during access,
The metadata server defines a "state validity period", which is the validity period of the lock state of all locks obtained by a client. Within this interval, the locks granted to the client by the metadata server are valid. Before the state validity period ends, each request the client submits to the metadata server renews the state validity period. If it is not renewed in time, all locks obtained by the client become invalid when the state validity period ends. If, after the state validity period ends, the metadata server grants a lock request from another client that conflicts with a lock currently held by this client, the current lock must be revoked. If the metadata server fails, then after it restarts the client can send requests, within the state validity period set by the restarted metadata server, to reclaim the lock state it previously held. For a given metadata server, all state validity periods have one set length. Through the uniquely set client identifier, the metadata server can recognize that a restarted client is a different running instance of the same client and recover accordingly; conversely, through the uniquely set metadata server identifier, the client can recognize a different running instance of the same metadata server and recover accordingly,
In addition, the metadata comprises: the B+ tree information used to manage storage space, and the file layouts and directory information of the parallel file system;
Disk array: at least one; provides storage to the clients through the fibre channel switch;
Step (2). After a failed client restarts, the metadata server recovers that client's lock state according to the following steps:
Step (2.1). The metadata server learns that the client is inactive:
Step (2.1.1). Determine whether there is a newly established client connection:
Step (2.1.1.1). If not, go to step (2.1.2);
Step (2.1.1.2). If so, examine the client identifier:
Step (2.1.1.2.1). The metadata server compares the client's owner ID with the owner ID stored on the metadata server: if the two owner IDs are identical, the client is a machine that previously connected to this metadata server; go to step (2.1.1.2.2). Otherwise the client is a newly connected machine; go to step (2.3);
Step (2.1.1.2.2). The metadata server compares the client's version number with the version number stored on the metadata server: if the two version numbers differ, the client is a different instance of the same machine; go to step (2.2). Otherwise go to step (2.3);
Step (2.1.2). Determine whether the client's state validity period has ended:
If the client did not renew the state validity period before it ended, the metadata server treats the client as inactive and thereby detects the client failure; otherwise go to step (2.3);
Step (2.2). Based on the result of step (2.1), the metadata server recovers each type of lock state held by the failed client:
Step (2.2.1). Revoke byte-range locks:
Once the state validity period has ended, the metadata server revokes the byte-range locks obtained by the client;
Step (2.2.2). Revoke share reservations:
Once the state validity period has ended, the metadata server revokes the share reservations obtained by the client;
Step (2.2.3). Revoke file layouts:
When the client restarts, it loses all the file layouts it had previously obtained;
Step (2.2.3.1). If the state validity period has ended without being renewed, then for file layouts the metadata server, according to its configuration file, either releases the file layout immediately after the state validity period expires, or lets the file layout wait indefinitely for a possible renewal of the state validity period as long as no other file layout request conflicts with it; otherwise go to step (2.2.3.2);
Step (2.2.3.2). If the client restarts before the state validity period ends and re-establishes a connection with the metadata server, the metadata server releases all file layout state associated with the client's previous instance;
Step (2.2.4). Recover file and directory delegations:
Because the client may have stored some file data locally before the failure, and this data is associated with the delegation it previously held, the client needs to re-establish the corresponding file state with the metadata server;
Step (2.2.4.1). If the state validity period has ended without being renewed, the metadata server revokes the delegation obtained by the client;
Step (2.2.4.2). If the client restarts before the state validity period ends and re-establishes a connection with the metadata server:
Step (2.2.4.2.1). The client first recovers the delegation and flushes the locally stored file data to the metadata server; the metadata server may extend the recovery period of the delegation, allowing it to exceed the defined length of the state validity period;
Step (2.2.4.2.2). The metadata server revokes the delegation held by the client;
Step (2.3). Recovery is complete;
Step (3). After a failed metadata server restarts, the client recovers the lock state it previously obtained according to the following steps:
Step (3.1). The client finds that the metadata server has restarted:
The client sends an operation request to the metadata server and receives a corresponding error; or the state validity period is about to end, and the client tries to renew it but finds that it cannot;
Step (3.2). The client re-establishes a connection with the restarted metadata server and obtains the state validity period length reset by the metadata server;
Step (3.3). The client determines whether this metadata server is the one it was previously connected to;
Step (3.4). If this metadata server is the one the client previously connected to and the client holds lock state, the client recovers each type of lock state it holds. If this metadata server is not the one the client previously connected to, go to step (3.4.6). If it is the previously connected metadata server but the client holds no lock state, go to step (3.4.5);
Step (3.4.1). Recover byte-range locks:
Step (3.4.1.1). The client sends a reclaim-type byte-range lock request to the metadata server;
Step (3.4.1.2). The metadata server rebuilds this lock state;
Step (3.4.2). Recover share reservations:
Step (3.4.2.1). The client sends a reclaim-type share reservation request to the metadata server;
Step (3.4.2.2). The metadata server rebuilds this lock state;
Step (3.4.3). Recover delegations:
Step (3.4.3.1). The client declares to the metadata server the delegations it holds;
Step (3.4.3.2). The metadata server grants the client the delegation with a special designation attached;
Step (3.4.3.3). The client writes the modified state back to the metadata server;
Step (3.4.3.4). The metadata server recalls the delegation;
Step (3.4.4). Recover file layouts:
Step (3.4.4.1). The client stops using all file layouts and deletes the mappings from device ID to device address previously received from the metadata server;
Step (3.4.4.2). The client checks whether it has any file layout that has not yet been committed; if not, go to step (3.4.5);
Step (3.4.4.3). The client takes measures to keep the data state on the client, the metadata server and the disk array synchronized:
Step (3.4.4.3.1). Determine whether the client's memory still holds data that has been modified but not yet synchronized. If so, the client waits for the metadata server's state recovery period to end, obtains the file layout, and writes the data to the disk array;
Step (3.4.4.3.2). Determine whether the client has written data to the disk array asynchronously but still holds a copy of the data in memory. If so, recover using the method of step (3.4.4.3.1);
Step (3.4.4.3.3). Determine whether the client holds no copy of the data in memory while the metadata server is still in its state recovery period. If so, the current file layout is unavailable;
Step (3.4.4.3.4). Determine whether the client holds no copy of the data in memory and the metadata server has already ended its state recovery period. If so, the client must cache all the data until the file layout is committed successfully;
Step (3.4.5). The client notifies the metadata server that rebuilding of the lock state is complete;
Step (3.4.6). The metadata server ends the state recovery period:
The metadata server keeps, on the disk array, a list of all clients that have lock state to recover, so that it knows whether all such clients have finished recovering their lock state; the metadata server may also choose to end the state recovery period early at any time;
Step (4). Recover the lock state after a network partition occurs, as follows:
On the client side, the recovery steps are:
Step (4.1). When the client resumes communication with the metadata server before the state validity period has ended, or re-establishes a connection with the metadata server after the state validity period has ended, the client compares the identifier of this metadata server, which consists of a main id and a secondary ("from") id, with the cached identifier of the metadata server it was previously connected to; if the main ids are identical and the secondary ids are also identical, the metadata server evidently has not restarted, and the client concludes that a network partition has occurred;
Step (4.2). The client checks whether the state validity period has ended: if it has not, its lock state remains valid; otherwise the client releases all of its lock state;
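The client-side check can be sketched as follows. This is an illustrative sketch only: ServerId, classify_reconnect, and locks_after_partition are hypothetical names, timestamps are plain numbers, and the case where the identifiers differ is labeled generically because it is handled by other steps of the method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerId:
    main_id: int
    secondary_id: int  # the "from id" of the translated claim


def classify_reconnect(cached: ServerId, current: ServerId) -> str:
    """Step (4.1): identical main and secondary ids mean the server did not restart."""
    if cached == current:
        return "network_partition"
    return "other_failure"  # not a partition; handled elsewhere in the method


def locks_after_partition(locks, now, validity_end):
    """Step (4.2): keep lock state only if the validity period has not ended."""
    return list(locks) if now < validity_end else []
```

A frozen dataclass gives field-by-field equality, so the comparison covers both ids at once.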
On the metadata server side, the recovery steps are:
Step (4.1). The metadata server finds, in the client list it maintains, that this client's state validity period has ended;
Step (4.2). The metadata server revokes all lock state held by this client except delegations;
Step (4.3). The metadata server revokes a delegation only when another lock request that conflicts with it arrives; when revoking the delegation, the metadata server writes the relevant information about the revoked delegation to the disk array;
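The server-side handling above can be sketched as lazy revocation: ordinary locks are dropped as soon as the validity period has ended, while a delegation survives until a conflicting request arrives. The sketch below is a simplified illustration; ClientLocks, MetadataServer, and revoked_log are hypothetical names, locks are identified by plain strings, and a Python list stands in for the record written to the disk array.

```python
from dataclasses import dataclass, field


@dataclass
class ClientLocks:
    validity_end: float
    ordinary: set = field(default_factory=set)
    delegations: set = field(default_factory=set)
    expired: bool = False


class MetadataServer:
    def __init__(self):
        self.clients = {}      # client id -> ClientLocks
        self.revoked_log = []  # stand-in for records written to the disk array

    def expire(self, client_id, now):
        """Steps (4.1)-(4.2): on expiry, revoke everything except delegations."""
        state = self.clients[client_id]
        if now >= state.validity_end:
            state.expired = True
            state.ordinary.clear()
        return state.expired

    def on_conflicting_request(self, client_id, resource):
        """Step (4.3): revoke an expired client's delegation only on conflict."""
        state = self.clients[client_id]
        if state.expired and resource in state.delegations:
            self.revoked_log.append((client_id, resource))  # record the revocation
            state.delegations.remove(resource)
            return True
        return False
```

Restricting on_conflicting_request to expired clients is an assumption of this sketch; how a delegation held by a live client is recalled is outside the steps shown here.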
Step (5). Recover metadata consistency on the metadata server:
Step (5.1). The metadata server restarts;
Step (5.2). Starting from the last checkpoint, the metadata server reads the logged transactions from the disk array in order and re-executes every transaction in the disk log that is marked as successfully synchronized;
Step (5.3). The metadata server updates the last checkpoint so that it points to the most recent synchronized record;
Step (5.4). The metadata server resumes normal operation.
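Steps (5.2) and (5.3) amount to replaying the log from the last checkpoint. The following sketch is illustrative only and rests on assumptions not stated in the claim: the log is a Python list of (operation, synced) pairs, the checkpoint is an index into that list, and replay_log and apply are hypothetical names.

```python
def replay_log(log, checkpoint, apply):
    """Re-execute synchronized transactions after the checkpoint.

    log        -- list of (operation, synced) pairs, oldest first (assumed layout)
    checkpoint -- index of the first record after the last checkpoint
    apply      -- callable that re-executes one operation on the metadata
    Returns the new checkpoint, just past the last synchronized record.
    """
    new_checkpoint = checkpoint
    for index in range(checkpoint, len(log)):
        operation, synced = log[index]
        if synced:
            apply(operation)            # step (5.2): re-execute the transaction
            new_checkpoint = index + 1  # step (5.3): advance the checkpoint
    return new_checkpoint
```

In this sketch, records not marked as synchronized are skipped rather than treated as the end of the log; the claim does not say which behavior is intended.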
CN2009100854839A 2009-05-22 2009-05-22 Method for recovering failed parallel file system Expired - Fee Related CN101567805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100854839A CN101567805B (en) 2009-05-22 2009-05-22 Method for recovering failed parallel file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100854839A CN101567805B (en) 2009-05-22 2009-05-22 Method for recovering failed parallel file system

Publications (2)

Publication Number Publication Date
CN101567805A CN101567805A (en) 2009-10-28
CN101567805B true CN101567805B (en) 2011-12-28

Family

ID=41283768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100854839A Expired - Fee Related CN101567805B (en) 2009-05-22 2009-05-22 Method for recovering failed parallel file system

Country Status (1)

Country Link
CN (1) CN101567805B (en)

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102024021A (en) * 2010-11-04 2011-04-20 曙光信息产业(北京)有限公司 Method for logging metadata in logical file system
US8438247B1 (en) * 2010-12-21 2013-05-07 Amazon Technologies, Inc. Techniques for capturing data sets
CN102164161B (en) * 2011-01-10 2013-12-04 清华大学 Method and device for performing file layout extraction on parallel file system
CN102315984B (en) * 2011-03-18 2014-06-11 北京思特奇信息技术股份有限公司 User contact information monitoring method
CN102203779B (en) * 2011-05-03 2013-04-17 华为技术有限公司 Method for updating data and control apparatus thereof
CN102833273B (en) * 2011-06-13 2017-11-03 中兴通讯股份有限公司 Data recovery method and distributed cache system during temporary derangement
CN102541471B (en) * 2011-12-28 2014-12-10 创新科软件技术(深圳)有限公司 Storage system with multiple controllers
CN103399823B (en) * 2011-12-31 2016-03-30 华为数字技术(成都)有限公司 The storage means of business datum, equipment and system
CN102750376A (en) * 2012-06-25 2012-10-24 天津神舟通用数据技术有限公司 Multi-version database storage engine system and related processing implementation method thereof
CN103235747B (en) * 2013-04-24 2016-12-28 曙光信息产业(北京)有限公司 The restoration methods of metadata and system
CN103309820A (en) * 2013-06-28 2013-09-18 曙光信息产业(北京)有限公司 Implementation method for disk array cache
EP3028142A1 (en) * 2013-07-29 2016-06-08 Hewlett-Packard Development Company, L.P. Writing to files and file meta-data
CN103514298A (en) * 2013-10-16 2014-01-15 浪潮(北京)电子信息产业有限公司 Method for achieving file lock and metadata server
CN103544081B (en) * 2013-10-23 2015-08-12 曙光信息产业(北京)有限公司 The management method of double base data server and device
CN103729253B (en) * 2013-12-31 2017-08-04 深圳市科漫达智能管理科技有限公司 A kind of exclusive resource application method and device
US9853820B2 (en) * 2015-06-30 2017-12-26 Microsoft Technology Licensing, Llc Intelligent deletion of revoked data
SG11201703260QA (en) * 2015-12-30 2017-08-30 Huawei Tech Co Ltd Method for processing acquire lock request and server
CN105718217B (en) * 2016-01-18 2018-10-30 浪潮(北京)电子信息产业有限公司 A kind of method and device of simplify configuration storage pool data sign processing
CN106202387B (en) * 2016-07-08 2019-05-21 苏州超块链信息科技有限公司 A kind of data consistency concurrent maintenance method
CN108932249B (en) * 2017-05-24 2021-02-12 华为技术有限公司 Method and device for managing file system
CN107612763B (en) * 2017-11-08 2020-10-02 浪潮通用软件有限公司 Metadata management method, application server, service system, medium and controller
CN108768783B (en) * 2018-06-08 2021-10-22 郑州云海信息技术有限公司 Method and system for circularly testing network connectivity
US10884863B2 (en) 2018-07-20 2021-01-05 Red Hat, Inc. Client session reclaim for a distributed storage system
JP7111882B2 (en) * 2018-08-02 2022-08-02 ヒタチ ヴァンタラ エルエルシー Distributed recovery of server information
CN109165112B (en) * 2018-08-16 2022-02-18 郑州云海信息技术有限公司 Fault recovery method, system and related components of metadata cluster
CN109286672B (en) * 2018-09-30 2020-11-27 北京金山云网络技术有限公司 User request processing method and device and server
CN111125040B (en) * 2018-10-31 2023-09-08 华为技术有限公司 Method, device and storage medium for managing redo log
CN109445993A (en) * 2018-11-02 2019-03-08 郑州云海信息技术有限公司 A kind of detection method and relevant apparatus of file system health status
CN109558457B (en) * 2018-12-11 2022-04-22 浪潮(北京)电子信息产业有限公司 Data writing method, device, equipment and storage medium
CN110297728B (en) * 2019-06-20 2021-07-23 暨南大学 Selective data reconstruction method in file reconstruction process based on origin data
CN110990190A (en) * 2019-10-31 2020-04-10 苏州浪潮智能科技有限公司 Distributed file lock fault processing method, system, terminal and storage medium
CN111752685B (en) * 2020-05-22 2022-09-23 清华大学 Persistent memory transaction submitting method under multi-core architecture
CN113986855B (en) * 2021-09-17 2023-11-14 苏州浪潮智能科技有限公司 Method, system, equipment and medium for locking files by network file system
CN117997939B (en) * 2024-04-02 2024-07-02 湖南国科亿存信息科技有限公司 Method, device and system for monitoring state of NFS client

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101022363A (en) * 2007-03-23 2007-08-22 杭州华为三康技术有限公司 Network storage equipment fault protecting method and device
WO2008128837A1 (en) * 2007-04-18 2008-10-30 International Business Machines Corporation Fault recovery on a parallel computer system with a torus network

Also Published As

Publication number Publication date
CN101567805A (en) 2009-10-28

Similar Documents

Publication Publication Date Title
CN101567805B (en) Method for recovering failed parallel file system
US7299378B2 (en) Geographically distributed clusters
US7996363B2 (en) Real-time apply mechanism in standby database environments
US7266669B2 (en) File system with file management function and file management method
US7543181B2 (en) Recovery from failures within data processing systems
US5724581A (en) Data base management system for recovering from an abnormal condition
US7702660B2 (en) I/O free recovery set determination
CN113396407A (en) System and method for augmenting database applications using blockchain techniques
CA2550614C (en) Cluster database with remote data mirroring
CN111143389A (en) Transaction execution method and device, computer equipment and storage medium
US20040215998A1 (en) Recovery from failures within data processing systems
US20040215883A1 (en) Partitioned shared cache
US20050289152A1 (en) Method and apparatus for implementing a file system
KR100450400B1 (en) A High Avaliability Structure of MMDBMS for Diskless Environment and data synchronization control method thereof
US20060224639A1 (en) Backup system, program and backup method
CN102073739A (en) Method for reading and writing data in distributed file system with snapshot function
CN110196788B (en) Data reading method, device and system and storage medium
JP4286857B2 (en) Internode shared file control method
CN116089359A (en) Database snapshot generation method and device, electronic equipment and medium
JP2004062759A (en) Database log management method, its device and its program
JP3866448B2 (en) Internode shared file control method
CN116401313A (en) Shared storage database cluster information synchronization method
Zhao Database Replication and Clustering for High Availability
Vung et al. Validation Rules based on Concurrency Control (VRCC) For Network File System

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111228

Termination date: 20170522
