CN104991739A - Method and system for refining primary execution semantics during metadata server failure substitution - Google Patents

Publication number
CN104991739A
Authority
CN
China
Prior art keywords
disk
session structure
sequenceid
lasting
session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510346978.8A
Other languages
Chinese (zh)
Other versions
CN104991739B (en)
Inventor
李月嘉
邵冰清
刘健
董欢庆
张军伟
刘振军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Zhongke Bluewhale Information Technology Co ltd
Institute of Computing Technology of CAS
Original Assignee
Tianjin Zhongke Bluewhale Information Technology Co ltd
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Zhongke Bluewhale Information Technology Co ltd and Institute of Computing Technology of CAS
Priority to CN201510346978.8A
Publication of CN104991739A
Application granted
Publication of CN104991739B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to metadata server failover takeover and non-idempotent metadata operations, and in particular to a method and system for ensuring exactly-once execution semantics during metadata server failover takeover. The method comprises: while a client accesses a metadata server normally, the metadata server assigns each operation a slotID, a sequenceID and a cache space for storing the response result in a session structure, persistently writes the session structure to a metadata disk, and copies the slotID and the sequenceID to a backup server; after receiving the slotID and the sequenceID, the backup server places them in its own cache; and after the metadata server goes down, the backup server reads the session structure from the disk, searches for non-idempotent operations that were persisted to the disk but whose reply was not sent to the client, reconstructs the reply according to the slot table in the session structure, and sends the reply to the client.

Description

Method and system for exactly-once execution semantics during metadata server failover takeover
Technical field
The present invention relates to fields such as metadata server failover takeover and non-idempotent operations, and in particular to a method and system for exactly-once execution semantics during metadata server failover takeover.
Background
In large-scale cluster file systems, separating the metadata service from the data service has become a trend. On the one hand, data access no longer passes through the metadata server but reaches the storage devices directly, out of band, which yields higher data access performance. On the other hand, the metadata server provides metadata service only and is relieved of the data access load, so a single metadata server can sustain higher client access performance, manage more storage devices, and support a larger system scale. As the system keeps growing, however, a single metadata server gradually becomes the bottleneck that limits expansion. To further improve the scalability of cluster file systems, multiple metadata servers are now commonly combined into a metadata server cluster that spreads the metadata load, so that the cluster file system can scale out by adding metadata servers.
To ensure high availability of the metadata cluster, each metadata server is usually paired with a backup server. When a metadata server goes down, its metadata service is migrated to the backup server, the metadata information of the failed server is recovered into the backup server's cache, and the backup server takes over the work of the failed server, achieving high availability. Metadata operations fall into two classes: non-idempotent and idempotent. A non-idempotent operation changes the system state each time it executes, so executing it twice leaves the system in an inconsistent state. During failover takeover it is therefore necessary to recover exactly how far every non-idempotent operation had progressed when the crash occurred, so that no non-idempotent operation is executed more than once, the client does not perceive the crash, the takeover is seamless, and the exactly-once execution semantics of non-idempotent operations are guaranteed. An idempotent operation is the opposite: executing it any number of times has the same effect as executing it once.
According to how far a metadata operation had progressed on the failed server when the crash occurred, metadata operations are divided into the following five classes:
(1) the operation has been persisted to the metadata disk of the failed server, and the client has received the result;
(2) the operation has been persisted to the metadata disk of the failed server, but the server crashed before the client received the result, and the operation is idempotent, such as a read operation;
(3) the operation has been persisted to the metadata disk of the failed server, but the server crashed before the client received the result, and the operation is non-idempotent, such as a delete operation;
(4) the operation has not yet been persisted to the metadata disk of the failed server, but the client has received the reply;
(5) the operation has not been persisted to the metadata disk of the failed server, and the client has not received the reply either.
Class (1) operations require no recovery by the backup server and no action by the client. Class (5) operations require no recovery by the backup server either: a client that receives no reply within the specified interval re-issues the request, and the backup server simply executes it and returns the result. Class (2) operations also need no recovery, because the client re-issues the request, the backup server re-executes the operation, and a second execution of an idempotent operation causes no error. For class (4) operations, the backup server must recover the parameters of the operation, continue executing it, and persist it to disk. Exactly-once semantics are not a concern here: the client has already received the reply and will not re-issue the request, so the operation is persisted to disk only once.
The operations for which the backup server must guarantee exactly-once execution semantics are those of class (3). A client that receives no reply from the server within a certain interval re-issues the request with identical parameters, and executing a non-idempotent operation that is already persisted on disk a second time causes an error. After completing the takeover, the backup server must therefore answer a repeated request of this class by returning only the reply of the operation, without executing it on disk again, which guarantees exactly-once execution semantics.
While the server is up, exactly-once execution semantics for class (3) operations are provided by the native session mechanism of pNFS. Under this mechanism every stateful request carries a sequence number and the response to such a request is cached, which means each stateful request must have a space for its cached response. After a client starts, it obtains the clientID assigned by the server through EXCHANGE_ID, then confirms the clientID to the server through CREATE_SESSION, and a specific session is established on both the client and the server. When the session is established, parameters such as the maximum cache space, request size and response size are set at the same time. A session consists of a fixed number of slots, and each slot consists of a sequenceID, a slotID and a cache space. All client operations are based on a specific session, and each operation is identified by the triple (session ID, slot ID, sequence ID). Before sending back a response, the server stores the basic or complete response information in the cache space of the corresponding slot of the server-side session. If the client receives the response correctly, the slot can be used for other operations. If, for network or other reasons, the client does not receive the response correctly, then when the client issues the request again the server compares the sequenceID of the request with the sequenceID in the current cache space to determine whether it is the same request. If it is, the server only needs to resend the response information held in the slot's cache space to the client, so a request is executed only once and exactly-once execution semantics are guaranteed.
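The slot-based reply cache described above can be sketched in a few lines of Python. This is a minimal in-memory model, not the pNFS or nfsd implementation: the class names Slot and Session and the method handle are hypothetical, and the rule that a new request must carry the slot's previous sequenceID plus one follows the NFSv4.1 convention rather than anything stated in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Slot:
    sequence_id: int = 0                 # sequenceID of the last executed request
    cached_reply: Optional[str] = None   # cache space for that request's response


@dataclass
class Session:
    slots: Dict[int, Slot] = field(default_factory=dict)

    def handle(self, slot_id: int, sequence_id: int, execute: Callable[[], str]) -> str:
        slot = self.slots.setdefault(slot_id, Slot())
        if sequence_id == slot.sequence_id and slot.cached_reply is not None:
            # Same sequenceID as the cached entry: a retransmission.
            # Resend the cached response instead of executing again.
            return slot.cached_reply
        if sequence_id != slot.sequence_id + 1:
            raise ValueError("misordered request on this slot")
        reply = execute()                # the operation runs exactly once
        slot.sequence_id = sequence_id
        slot.cached_reply = reply        # cache before the reply is sent
        return reply
```

Calling handle twice with the same slot and sequenceID runs the operation once and returns the same reply both times, which is the behavior that is lost when this cache exists only in the memory of the crashed server.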
Because the pNFS session exists only in the cache of the failed server, the backup server cannot obtain it after a crash. After a failover takeover, the original session mechanism for guaranteeing exactly-once execution semantics therefore no longer applies.
At present, the only method that guarantees exactly-once execution semantics for non-idempotent operations during failover takeover is the transaction-based recovery mechanism of the Lustre file system.
Each request sent by a client carries an xid sequence number until the request is assigned a transaction number (transno). Lustre assigns a transno to every operation that changes server state; the transno is managed by the server and is piggybacked on the reply message returned to the client. The server keeps a log file named last_rcvd, which asynchronously records information about the server and each client, including the client's uuid, the last transaction number committed for the client (last_transno), and the last_xid of the request the client last executed and committed. When the server receives a retransmitted request with xid <= last_xid, the request has already been executed on the server but its reply may have been lost, so the server only needs to build the reply message and return it to the client. This mechanism has two problems. First, updating the last_rcvd log file on the metadata server and executing the state-changing operation are two independent modifications whose atomicity cannot be guaranteed when the server crashes abnormally, so the persisted content on disk may become inconsistent. Second, the mechanism supports only one metadata interaction between a client and the metadata server at a time, not multiple concurrent metadata interactions, which reduces interaction efficiency.
The invention patent "A Parastor200 management node high-availability method based on file-level real-time synchronization" relates to a high-availability method for Parastor200 management nodes based on file-level real-time synchronization, realized through two aspects: (1) synchronization of the system information stored on the management node; (2) management node failover. By making the Parastor200 management node highly available, that invention gives Parastor200 a fully redundant design in the complete sense: damage to any component in the system does not affect use of the storage system, and when any component of a management node is damaged, service can be switched to the standby node within seconds. Normal use is unaffected and there is ample time to repair the fault. That invention proposes a file-level method for full redundancy of the Parastor200 system, mainly by synchronizing node information in real time so that service can be switched to the standby node when a component is damaged. Non-idempotent operations may nevertheless be executed more than once: that invention does not address guaranteeing exactly-once execution of non-idempotent operations during high-availability failover takeover of a metadata server.
The invention patent "A Parastor200 parallel storage management node high-availability method based on distributed block devices" relates to a high-availability method for Parastor200 management nodes based on distributed block devices, realized through two aspects: (1) synchronization of the system information stored on the management node; (2) management node failover. By making the Parastor200 management node highly available, that invention gives Parastor200 a fully redundant design in the complete sense: damage to any component in the system does not affect use of the storage system, and when any component of a management node is damaged, service can be switched to the standby node within seconds. Normal use is unaffected and there is ample time to repair the fault. That invention proposes a method for full redundancy of the Parastor200 system based on distributed block devices, mainly by synchronizing node information to the standby node in real time so that service can be switched directly to the standby node when a component is damaged. Non-idempotent operations may nevertheless be executed more than once: that invention does not address guaranteeing exactly-once execution of non-idempotent operations during high-availability failover takeover of a metadata server.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a method and system for exactly-once execution semantics during metadata server failover takeover.
The present invention proposes a method for exactly-once execution semantics during metadata server failover takeover, comprising:
Step 1: while the client accesses the metadata server normally, the metadata server assigns each operation a slotID, a sequenceID and a cache space for the response result in a session structure, writes the session structure persistently to the metadata disk, and copies the slotID and the sequenceID to a backup server; after receiving them, the backup server places the slotID and the sequenceID in its cache;
Step 2: after the metadata server goes down, the backup server reads the session structure from the disk, searches for non-idempotent operations that were persisted to disk but whose reply was not sent to the client, reconstructs the reply according to the slot table in the session structure, and sends it to the client.
In the method for exactly-once execution semantics during metadata server failover takeover, step 1 comprises: setting aside a region on the disk dedicated to storing the session structure;
when the metadata server executes an operation, passing the session structure along with the operation to the vfs layer and the local file system layer;
when the metadata operation reaches the local file system layer, the local file system layer obtains the session structure, writes the session structure and the operation result together into the disk journal, and flushes them to disk together as one atomic operation, so that the persisted session structure and the operation result remain consistent.
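The point of writing the session structure and the operation result in one journal transaction is that a crash leaves either both on disk or neither. The toy model below illustrates that property under stated assumptions: Journal is a hypothetical in-memory stand-in for the disk journal, its start/write/stop methods only loosely mirror the journal start and stop mentioned later in the description, and persist_operation is an illustrative helper, not a function of exfs.

```python
class Journal:
    """Toy write-ahead journal: changes reach `disk` only at stop()."""

    def __init__(self):
        self.disk = {}         # durable state, key -> value
        self._pending = None   # buffered writes of the open transaction

    def start(self):
        self._pending = {}

    def write(self, key, value):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._pending[key] = value

    def stop(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self.disk.update(self._pending)   # commit point: all writes land together
        self._pending = None

    def crash(self):
        self._pending = None              # an uncommitted transaction is lost whole


def persist_operation(journal, session_id, slot_id, sequence_id, path, new_meta):
    """Record the metadata change and the slot's sequenceID in one transaction."""
    journal.start()
    journal.write(("meta", path), new_meta)
    journal.write(("session", session_id, slot_id), sequence_id)
    journal.stop()
```

Compare this with the two independent updates criticized in the background section: there, a crash between the two writes can leave the operation on disk without its session record, or the reverse.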
In the method for exactly-once execution semantics during metadata server failover takeover, step 2 comprises: obtaining the session structure persisted to the disk;
comparing the sequenceID of each slot in the session structure persisted to the disk with the sequenceID of the corresponding slot in the session structure in the backup server's cache; if the persisted sequenceID of a slot is greater than the cached one, the operation has been persisted to disk; if the persisted sequenceID of a slot is less than the cached one, the operation has not finished executing and has not yet been persisted to the disk.
The method further comprises: if the session structure had not yet been copied to the backup server's cache, updating the sequenceID of the corresponding slot in the session structure in the backup server's cache with the sequenceID of the slot persisted to the disk.
The method further comprises: where the operation has been persisted to disk, if the client had not received the reply when the crash occurred, the backup server only needs to reconstruct the reply and send it to the client;
where the operation has not yet been persisted to the disk, if the client had received the reply before the crash occurred, the slot's structure information is then persisted to the session structure on the disk; if the client had not received the reply, recovery is achieved by the client retransmitting the request.
The present invention also proposes a system for exactly-once execution semantics during metadata server failover takeover, comprising:
a persistent writing module, configured so that, while the client accesses the metadata server normally, the metadata server assigns each operation a slotID, a sequenceID and a cache space for the response result in a session structure, and writes the session structure persistently to the metadata disk;
a reply reconstruction module, configured so that, after the metadata server goes down, the backup server reads the session structure from the disk, searches for non-idempotent operations that were persisted to disk but whose reply was not sent to the client, reconstructs the reply according to the slot table in the session structure, and sends it to the client.
In the system for exactly-once execution semantics during metadata server failover takeover, the persistent writing module is further configured to: set aside a region on the disk dedicated to storing the session structure;
when the metadata server executes an operation, pass the session structure along with the operation to the vfs layer and the local file system layer;
when the metadata operation reaches the local file system layer, the local file system layer obtains the session structure, writes the session structure and the operation result together into the disk journal, and flushes them to disk together as one atomic operation, so that the persisted session structure and the operation result remain consistent.
In the system, the reply reconstruction module is further configured to: obtain the session structure persisted to the disk;
compare the sequenceID of each slot in the session structure persisted to the disk with the sequenceID of the corresponding slot in the session structure in the backup server's cache; if the persisted sequenceID of a slot is greater than the cached one, the operation has been persisted to disk; if the persisted sequenceID of a slot is less than the cached one, the operation has not finished executing and has not yet been persisted to the disk.
The system further comprises: if the session structure had not yet been copied to the backup server's cache, updating the sequenceID of the corresponding slot in the session structure in the backup server's cache with the sequenceID of the slot persisted to the disk.
The system further comprises: where the operation has been persisted to disk, if the client had not received the reply when the crash occurred, the backup server only needs to reconstruct the reply and send it to the client;
where the operation has not yet been persisted to the disk, if the client had received the reply before the crash occurred, the slot's structure information is then persisted to the session structure on the disk; if the client had not received the reply, recovery is achieved by the client retransmitting the request.
As can be seen from the above, the advantages of the present invention are:
By using a persistent session, the present invention effectively avoids the inconsistency caused by a metadata server redoing non-idempotent operations during failover takeover, shields the client from the crash, and effectively guarantees exactly-once execution semantics for non-idempotent operations during metadata server failover takeover.
Brief description of the drawings
Fig. 1 is a diagram of the persistent session storage structure on disk according to the present invention;
Fig. 2 is a flowchart of creating the persistent session according to the present invention;
Fig. 3 is a flowchart of updating the persistent session during a normal operation according to the present invention.
Detailed description
The technical problem to be solved by the present invention is that the prior art lacks a way to guarantee that non-idempotent operations are executed exactly once during metadata server failover takeover. The present invention provides a method and system for exactly-once execution semantics during metadata server failover takeover, so that the takeover is completed more accurately and high availability is achieved.
To achieve the above object, the present invention proposes a method for exactly-once execution semantics during metadata server failover takeover, with the following specific steps:
(1) while the client accesses the metadata server normally, the server assigns each operation a slotID, a sequenceID and a cache space for the response result in a session structure, and copies the slotID and sequenceID to the backup server; after receiving them, the backup server places them in its cache;
(2) while the client accesses the metadata server normally, the session structure holds information such as the slotID, sequenceID and response result of each operation, and the session structure is written persistently to disk at the same time.
First, a region is set aside on disk dedicated to storing the persistent session file.
Then, when the server executes an operation, it passes the session information along with the operation to the vfs (virtual file system) layer and the local file system layer.
Finally, when the metadata operation reaches the local file system layer, the server obtains the session information, writes the session and the operation result together into the disk journal, and flushes them to disk together as one atomic operation, so that the persisted session and the operation result remain consistent and the inaccuracy introduced by separate atomic operations is avoided.
(3) after the crash, the backup server reads the session file information from disk, finds the non-idempotent operations that were persisted to disk but whose reply had not yet been sent to the client, namely the class (3) operations described above, reconstructs the reply according to the slot table in the session, and sends it to the client.
First, the persistent session is read from disk.
Then the sequenceID of each slot in the persistent session is compared with the sequenceID of the corresponding slot in the cached session. If the sequenceID of a slot in the persistent session is greater than that of the corresponding slot in the cached session, the operation has been persisted to disk but the session in cache had not yet been copied to the backup server; to guarantee exactly-once execution semantics for the operation, the sequenceID of the corresponding slot in the cached session is updated to the sequenceID of this slot in the persistent session. If the client had not received the reply when the crash occurred, the backup server only needs to reconstruct the reply and send it to the client, without executing the operation again. If the sequenceID of a slot in the persistent session is less than that of the corresponding slot in the cached session, the operation has not finished executing and has not been persisted to disk. If the client had received the reply before the crash, the slot's structure information is then persisted to the session on disk; if not, recovery is achieved by client retransmission.
The following is a specific embodiment of the present invention:
The present invention is implemented on the Blue Whale file system, bwfs. In bwfs, the server-side local file system is exfs and the client-side local file system is nfs.
First, the session file storage structure created on disk is described, as shown in Fig. 1:
The persistent session file on disk stores the information of the nfsd-layer sessions of the original server, mainly the number of sessions in the file and the individual session information structures. The persistent session storage structure on disk is described below for a 4 KB block size.
disk_session_seg is the session storage unit on disk. By default, one disk_session_seg holds the information of one session with 16 slots. When heavy access or other reasons increase the number of slots in a session, one nfsd-layer session can be split into multiple disk_session_seg units on disk, stored in a linked-list-like form.
disk_sessionid is the session id value passed down from the nfsd layer.
slot_count is the number of slots in the session.
session_seg_no is the sequence number of this session_seg.
next_session_seg_p is the global position of the next seg of this session; 0 marks the end.
The disk_slots array stores the sequenceID values passed down from nfsd; it holds at least 2 entries, 16 by default, and at most 128. Let slotid be the slotID passed down from the nfsd layer, n = slotid / default_size and m = slotid % default_size. n is compared with session_seg_no; when they are equal, the sequenceID of the slot is stored at position m of the disk_slots array of that session_seg.
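The segment lookup just described is plain integer division and remainder. The sketch below mirrors the field names from the text, but the Python types, the helper names locate and store_sequence_id, and the use of a Python list in place of the on-disk chain linked through next_session_seg_p are assumptions made for illustration; the patent does not give field widths or the on-disk encoding.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

DEFAULT_SIZE = 16  # default number of slots per disk_session_seg


@dataclass
class DiskSessionSeg:
    disk_sessionid: bytes        # session id passed down from the nfsd layer
    slot_count: int              # number of slots in the whole session
    session_seg_no: int          # sequence number of this segment
    next_session_seg_p: int = 0  # position of the next segment, 0 marks the end
    disk_slots: List[int] = field(default_factory=lambda: [0] * DEFAULT_SIZE)


def locate(slotid: int, default_size: int = DEFAULT_SIZE) -> Tuple[int, int]:
    """Return (n, m): the segment number and the index inside that segment."""
    return slotid // default_size, slotid % default_size


def store_sequence_id(segs: List[DiskSessionSeg], slotid: int, sequence_id: int) -> DiskSessionSeg:
    """Store a slot's sequenceID in the segment whose session_seg_no equals n."""
    n, m = locate(slotid)
    for seg in segs:             # stands in for following next_session_seg_p
        if seg.session_seg_no == n:
            seg.disk_slots[m] = sequence_id
            return seg
    raise KeyError(f"no segment {n} for slot {slotid}")
```

With the default size of 16, slot 37 falls in segment 2 at index 5, since 37 = 2 * 16 + 5.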
Next, the in-memory session information structure in the local file system (exfs) layer is described.
Session information is passed between the nfsd layer and the vfs (virtual file system) layer through the private/fsdata field of the inode/dentry. On entering a related operation in the exfs layer, the session information is first read from the private/fsdata field of the inode/dentry; the update of the persistent session is then placed after journal_start and before journal_stop of each related exfs-layer operation.
The management structure for each session in the exfs layer is the structure exfs_session, and the structure exfs_session_seg holds all slot information of each session.
Here disk_session_seg_p is the position of this segment in the persistent session file on disk.
Next, taking a file write as an example, the steps by which the server persists the session during a non-idempotent update operation are described.
(1) When the file system is formatted, an inode is reserved and a system file is created on disk; its data blocks store the information of all persistent sessions, and the inode's location is recorded in the superblock, as shown in Fig. 2;
(2) when the client mounts the file system, the persistent session structure is initialized;
(3) after the client issues a file write, metadata is updated, so the nfsd-layer session information must be passed to the vfs (virtual file system) layer through the private/fsdata field of the inode/dentry. In the exfs layer, when the server executes the exfs_write function, it first obtains the session information passed down from the nfsd layer. It then obtains the in-memory pointer to the persistent session structure from the superblock, obtains the persistent session structure through it, and determines how many additional blocks to request in the journal. After the journal starts, the persistent session information and the operation result are flushed to disk together, until the journal ends, as shown in Fig. 3;
(4) after the server finishes the write, the nfsd-layer session information is updated and the server copies the slot and sequenceID in the session structure to the backup server. The backup server locates the storage structure it created for backing up that server and applies the copied session information to it.
After a crash, the recovery steps are as follows.
(1) The backup server uses exfs-layer functions to read the persistent session file information from disk and passes it to the nfsd layer;
(2) The backup server reads the session information from its cache and compares the sequenceID of each cached slot with the sequenceID of the corresponding slot in the persistent session;
If the sequenceID in the persistent session is greater than the cached one, the persistent slot's structure information is copied into the slot structure of the cached session. If the client then resends the request, the crash occurred before the client received the reply; the backup server must reconstruct the reply from the crashed server's response stored in the slot's cache space in the persistent session and send it to the client, without re-executing the operation. If the client does not resend the request, the client had already received the result when the crash occurred and the operation is persisted on disk, so no recovery by the backup server is needed;
If the sequenceID in the backup server's cached session is greater than the persistent one and the client resends the request, the backup server must re-execute the request. If the client does not resend the request, the client has already received the reply; the backup server then needs to copy the corresponding slot information from its cached session into the persistent session and finally write it to disk.
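The four recovery outcomes above can be condensed into a small decision function. The sketch below is illustrative only: `Action` and `recover_slot` are hypothetical names, and the handling of equal sequenceIDs is an assumption, since the text does not describe that case.

```python
from enum import Enum


class Action(Enum):
    REPLAY_CACHED_REPLY = "rebuild the reply from the persistent slot and resend it"
    NOTHING = "no recovery needed"
    RE_EXECUTE = "re-execute the request"
    PERSIST_CACHED_SLOT = "copy the backup's cached slot into the persistent session"


def recover_slot(persistent_seq, cached_seq, client_resent):
    """Pick the recovery action for one slot after the backup server takes over."""
    if persistent_seq > cached_seq:
        # The operation reached disk; the persistent slot is also copied
        # into the backup's cached session in this case.
        return Action.REPLAY_CACHED_REPLY if client_resent else Action.NOTHING
    if cached_seq > persistent_seq:
        # The backup's cached sequenceID is ahead of what is on disk.
        return Action.RE_EXECUTE if client_resent else Action.PERSIST_CACHED_SLOT
    # Equal sequenceIDs: not covered by the text; assumed to need no action.
    return Action.NOTHING
```

For example, `recover_slot(5, 4, True)` selects `Action.REPLAY_CACHED_REPLY`, so a non-idempotent operation is answered from the stored reply rather than executed a second time.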
The present invention also proposes a system for exactly-once execution semantics during metadata server failure takeover, comprising the following modules:
A persistent write module, configured so that, when a client accesses the metadata server normally, the metadata server allocates in a session structure a slotID, a sequenceID, and a cache space for storing the response result for each operation, and persistently writes the session structure to the metadata disk;
A reply reconstruction module, configured so that, after the metadata server crashes, the backup server reads the information of the session structure from disk, searches for non-idempotent operations that have been persisted to disk but whose reply result has not been sent to the client, reconstructs the reply result according to the slot table in the session structure, and sends it to the client.
The persistent write module is further configured to: reserve a region on disk dedicated to storing the session structure;
When executing an operation, the metadata server passes the session structure along with the operation to the vfs layer and the local file system layer;
When the metadata operation reaches the local file system layer, after the metadata server obtains the information of the session structure, the session structure and the operation result are written together to the disk journal and flushed to disk together as one atomic operation, so that the persistence of the session structure is consistent with that of the operation result.
The reply reconstruction module is further configured to obtain the session structure persisted to the disk;
The sequenceID of each slot in the session structure persisted to the disk is compared with the sequenceID of the corresponding slot in the session structure in the backup server's cache. If the sequenceID of a slot in the session structure persisted to the disk is greater than the sequenceID of the corresponding slot in the session structure in the backup server's cache, the operation has been persisted to disk. If the sequenceID of a slot in the session structure persisted to the disk is less than the sequenceID of the corresponding slot in the cached session, the operation has not finished executing and has not yet been persisted to the disk. If the session structure has not been copied to the backup server's cache, the sequenceID of the corresponding slot in the session structure in the backup server's cache is updated to the sequenceID of the slot in the session structure persisted to the disk. Where the operation has been persisted to disk, if the client had not received the reply result when the crash occurred, the backup server only needs to reconstruct the reply result and send it to the client. Where the operation has not yet been persisted to the disk, if the client had received the reply result before the crash occurred, the slot's structure information is then persisted into the session structure on the disk; if the client had not received the reply result, recovery is performed by the client retransmitting the request.
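The modules above depend on a slot table that pairs each slot's sequenceID with a cached reply. The sketch below shows how such a table lets a retransmitted request be answered without re-execution. It assumes NFSv4.1-style slot semantics (same sequenceID means retransmission, sequenceID plus one means a new request); `SlotTable` and `handle` are hypothetical names, not taken from the patent.

```python
class SlotTable:
    """Hypothetical slot table: each slot holds a sequenceID and the last reply."""

    def __init__(self, num_slots):
        self.slots = [{"sequence_id": 0, "reply": None} for _ in range(num_slots)]

    def handle(self, slot_id, sequence_id, operation):
        slot = self.slots[slot_id]
        if sequence_id == slot["sequence_id"]:
            # Retransmission: answer from the cached reply, do not execute again.
            return slot["reply"]
        if sequence_id == slot["sequence_id"] + 1:
            # New request on this slot: execute once and cache the reply.
            reply = operation()
            slot["sequence_id"] = sequence_id
            slot["reply"] = reply
            return reply
        # Any other sequenceID is treated as misordered (an assumption of this sketch).
        raise ValueError("misordered sequenceID for slot %d" % slot_id)
```

If the table is rebuilt on the backup server from the persisted session structure, a resent non-idempotent request such as a directory creation takes the first branch and receives the stored reply.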

Claims (10)

1. A method for exactly-once execution semantics during metadata server failure takeover, characterized by comprising:
Step 1: when a client accesses the metadata server normally, the metadata server allocates in a session structure a slotID, a sequenceID, and a cache space for storing the response result for each operation; the session structure is persistently written to the metadata disk, and the slotID and the sequenceID are copied to a backup server, which, upon receiving them, places the slotID and the sequenceID in the backup server's cache;
Step 2: after the metadata server crashes, the backup server reads the information of the session structure from disk, searches for non-idempotent operations that have been persisted to disk but whose reply result has not been sent to the client, reconstructs the reply result according to the slot table in the session structure, and sends it to the client.
2. The method for exactly-once execution semantics during metadata server failure takeover as claimed in claim 1, characterized in that step 1 comprises: reserving a region on disk dedicated to storing the session structure;
When executing an operation, the metadata server passes the session structure along with the operation to the vfs layer and the local file system layer;
When the metadata operation reaches the local file system layer, after the local file system layer obtains the information of the session structure, the session structure and the operation result are written together to the disk journal and synchronized to disk together as one atomic operation, so that the persistence of the session structure is consistent with that of the operation result.
3. The method for exactly-once execution semantics during metadata server failure takeover as claimed in claim 1, characterized in that step 2 comprises: obtaining the session structure persisted to the disk;
Comparing the sequenceID of each slot in the session structure persisted to the disk with the sequenceID of the corresponding slot in the session structure in the backup server's cache; if the sequenceID of a slot in the session structure persisted to the disk is greater than the sequenceID of the corresponding slot in the session structure in the backup server's cache, the operation has been persisted to disk; if the sequenceID of a slot in the session structure persisted to the disk is less than the sequenceID of the corresponding slot in the cached session, the operation has not finished executing and has not yet been persisted to the disk.
4. The method for exactly-once execution semantics during metadata server failure takeover as claimed in claim 3, characterized by further comprising: if the session structure has not been copied to the backup server's cache, updating the sequenceID of the corresponding slot in the session structure in the backup server's cache to the sequenceID of the slot in the session structure persisted to the disk.
5. The method for exactly-once execution semantics during metadata server failure takeover as claimed in claim 3, characterized by further comprising: where the operation has been persisted to disk, if the client had not received the reply result when the crash occurred, the backup server only needs to reconstruct the reply result and send it to the client;
Where the operation has not yet been persisted to the disk, if the client had received the reply result before the crash occurred, the slot's structure information is then persisted into the session structure on the disk; if the client had not received the reply result, recovery is performed by the client retransmitting the request.
6. A system for exactly-once execution semantics during metadata server failure takeover, characterized by comprising:
A persistent write module, configured so that, when a client accesses the metadata server normally, the metadata server allocates in a session structure a slotID, a sequenceID, and a cache space for storing the response result for each operation, and persistently writes the session structure to the metadata disk;
A reply reconstruction module, configured so that, after the metadata server crashes, the backup server reads the information of the session structure from disk, searches for non-idempotent operations that have been persisted to disk but whose reply result has not been sent to the client, reconstructs the reply result according to the slot table in the session structure, and sends it to the client.
7. The system for exactly-once execution semantics during metadata server failure takeover as claimed in claim 6, characterized in that the persistent write module is further configured to: reserve a region on disk dedicated to storing the session structure;
When executing an operation, the metadata server passes the session structure along with the operation to the vfs layer and the local file system layer;
When the metadata operation reaches the local file system layer, after the local file system layer obtains the information of the session structure, the session structure and the operation result are written together to the disk journal and synchronized to disk together as one atomic operation, so that the persistence of the session structure is consistent with that of the operation result.
8. The system for exactly-once execution semantics during metadata server failure takeover as claimed in claim 6, characterized in that the reply reconstruction module is further configured to obtain the session structure persisted to the disk;
Comparing the sequenceID of each slot in the session structure persisted to the disk with the sequenceID of the corresponding slot in the session structure in the backup server's cache; if the sequenceID of a slot in the session structure persisted to the disk is greater than the sequenceID of the corresponding slot in the session structure in the backup server's cache, the operation has been persisted to disk; if the sequenceID of a slot in the session structure persisted to the disk is less than the sequenceID of the corresponding slot in the cached session, the operation has not finished executing and has not yet been persisted to the disk.
9. The system for exactly-once execution semantics during metadata server failure takeover as claimed in claim 8, characterized by further comprising: if the session structure has not been copied to the backup server's cache, updating the sequenceID of the corresponding slot in the session structure in the backup server's cache to the sequenceID of the slot in the session structure persisted to the disk.
10. The system for exactly-once execution semantics during metadata server failure takeover as claimed in claim 8, characterized by further comprising: where the operation has been persisted to disk, if the client had not received the reply result when the crash occurred, the backup server only needs to reconstruct the reply result and send it to the client;
Where the operation has not yet been persisted to the disk, if the client had received the reply result before the crash occurred, the slot's structure information is then persisted into the session structure on the disk; if the client had not received the reply result, recovery is performed by the client retransmitting the request.
CN201510346978.8A 2015-06-19 2015-06-19 Method and system for exactly-once execution semantics during metadata server failure takeover Expired - Fee Related CN104991739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510346978.8A CN104991739B (en) 2015-06-19 2015-06-19 Method and system for exactly-once execution semantics during metadata server failure takeover

Publications (2)

Publication Number Publication Date
CN104991739A true CN104991739A (en) 2015-10-21
CN104991739B CN104991739B (en) 2018-05-01

Family

ID=54303555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510346978.8A Expired - Fee Related CN104991739B (en) 2015-06-19 2015-06-19 Method and system for exactly-once execution semantics during metadata server failure takeover

Country Status (1)

Country Link
CN (1) CN104991739B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900337A (en) * 2018-06-29 2018-11-27 郑州云海信息技术有限公司 A kind of fault recovery method of Metadata Service, server, client and system
CN109165117A (en) * 2018-06-29 2019-01-08 华为技术有限公司 The method and system of data processing
CN110874298A (en) * 2019-11-13 2020-03-10 北京齐尔布莱特科技有限公司 Request data storage method and terminal equipment
CN111130896A (en) * 2019-12-29 2020-05-08 北京浪潮数据技术有限公司 NFS fault switching method and system and dual-control storage system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1533095A (en) * 2003-03-19 2004-09-29 株式会社日立制作所 Agent responding device,
US20060010442A1 (en) * 2004-07-06 2006-01-12 Oracle International Corporation System and method for managing security meta-data in a reverse proxy
CN101364930A (en) * 2008-09-24 2009-02-11 深圳市金蝶中间件有限公司 Session control method, apparatus and system
CN102662795A (en) * 2012-03-20 2012-09-12 浪潮电子信息产业股份有限公司 Metadata fault-tolerant recovery method in distributed storage system
CN102685237A (en) * 2012-05-16 2012-09-19 东南大学 Method for requesting session maintaining and dispatching in cluster environment
CN103516549A (en) * 2013-09-27 2014-01-15 浪潮电子信息产业股份有限公司 File system metadata log mechanism based on shared object storage
CN104506625A (en) * 2014-12-22 2015-04-08 国云科技股份有限公司 Method for improving reliability of metadata nodes of cloud databases

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900337A (en) * 2018-06-29 2018-11-27 郑州云海信息技术有限公司 A kind of fault recovery method of Metadata Service, server, client and system
CN109165117A (en) * 2018-06-29 2019-01-08 华为技术有限公司 The method and system of data processing
CN108900337B (en) * 2018-06-29 2021-07-16 郑州云海信息技术有限公司 Fault recovery method, server, client and system for metadata service
CN109165117B (en) * 2018-06-29 2022-05-31 华为技术有限公司 Data processing method and system
CN110874298A (en) * 2019-11-13 2020-03-10 北京齐尔布莱特科技有限公司 Request data storage method and terminal equipment
CN111130896A (en) * 2019-12-29 2020-05-08 北京浪潮数据技术有限公司 NFS fault switching method and system and dual-control storage system

Also Published As

Publication number Publication date
CN104991739B (en) 2018-05-01

Similar Documents

Publication Publication Date Title
US10860547B2 (en) Data mobility, accessibility, and consistency in a data storage system
US11716385B2 (en) Utilizing cloud-based storage systems to support synchronous replication of a dataset
US11755415B2 (en) Variable data replication for storage implementing data backup
JP5671615B2 (en) Map Reduce Instant Distributed File System
US9235481B1 (en) Continuous data replication
US20180260125A1 (en) Synchronously replicating datasets and other managed objects to cloud-based storage systems
US7653668B1 (en) Fault tolerant multi-stage data replication with relaxed coherency guarantees
JP5539683B2 (en) Scalable secondary storage system and method
US8977593B1 (en) Virtualized CG
US8738813B1 (en) Method and apparatus for round trip synchronous replication using SCSI reads
CN106547859B (en) Data file storage method and device under multi-tenant data storage system
JP6225262B2 (en) System and method for supporting partition level journaling to synchronize data in a distributed data grid
CN102662795A (en) Metadata fault-tolerant recovery method in distributed storage system
WO2012126232A1 (en) Method, system and serving node for data backup and recovery
US20180101558A1 (en) Log-shipping data replication with early log record fetching
CN102368267A (en) Method for keeping consistency of copies in distributed system
CN110515557B (en) Cluster management method, device and equipment and readable storage medium
CN102750322B (en) Method and system for guaranteeing distributed metadata consistency for cluster file system
US10983709B2 (en) Methods for improving journal performance in storage networks and devices thereof
CN104991739A (en) Method and system for refining primary execution semantics during metadata server failure substitution
CN103294167A (en) Data behavior based low-energy consumption cluster storage replication device and method
CN103544081B (en) The management method of double base data server and device
CN109726211B (en) Distributed time sequence database
WO2022033269A1 (en) Data processing method, device and system
US20230353635A1 (en) Replication Utilizing Cloud-Based Storage Systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180501

CF01 Termination of patent right due to non-payment of annual fee