CN102833273A - Data recovery method for temporary faults and distributed caching system


Info

Publication number
CN102833273A
CN102833273A, CN2011101576931A, CN201110157693A
Authority
CN
China
Prior art keywords
data
server
key
replica
replica server
Prior art date
Legal status
Granted
Application number
CN2011101576931A
Other languages
Chinese (zh)
Other versions
CN102833273B (en)
Inventor
郭斌
陈典强
韩银俊
宫微微
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201110157693.1A
Priority to PCT/CN2012/070849
Publication of CN102833273A
Application granted
Publication of CN102833273B
Current legal status: Active

Classifications

    • G06F11/1658 — Error detection or correction of the data by redundancy in hardware: data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G06F11/2038 — Error detection or correction of the data by redundancy in hardware using active fault-masking, where processing functionality is redundant, with a single idle spare processing component
    • H04L67/1001 — Protocols in which an application is distributed across nodes in the network, for accessing one among a plurality of replicated servers
    • H04L67/142 — Managing session states for stateless protocols; signalling session states; state transitions; keeping-state mechanisms
    • H04L69/40 — Network arrangements, protocols or services for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection


Abstract

The invention discloses a data recovery method for temporary faults. The method comprises the following steps: when a collaboration server initiates a data operation to replica servers and finds that a replica server has failed, it generates a Key change record containing the keys of the data operated on; after the replica server recovers from the fault, the collaboration server initiates a data repair operation to the replica server according to the Key change record; and the replica server performs local data repair according to the data repair operation initiated by the collaboration server. The invention further discloses a distributed caching system for data repair upon temporary faults. The method and system guarantee consistency among the multiple replicas of a data item after a temporary fault is resolved, improve the accuracy of the data stored in the distributed caching system, promote its quality attributes, and optimize the application experience.

Description

Data recovery method for temporary faults and distributed caching system
Technical field
The present invention relates to the technical field of cloud computing, and in particular to a data recovery method for temporary faults and a distributed caching system.
Background art
Cloud computing (Cloud Computing) is the product of the convergence of traditional computer and network technologies such as grid computing (Grid Computing), distributed computing (Distributed Computing), parallel computing (Parallel Computing), utility computing (Utility Computing), network storage (Network Storage Technologies), virtualization (Virtualization), and load balancing (Load Balance). It aims to integrate, through the network, multiple relatively low-cost computing entities into one system with powerful computing capability. Distributed caching is a field within the scope of cloud computing; its role is to provide distributed storage services for massive data and the capability of high-speed read/write access.
A distributed caching system is formed by a number of server nodes and clients connected to one another, where the server nodes are responsible for storing data, and the clients can perform operations such as writing, reading, updating, and deleting data on the server nodes. In general, the written data is not kept on a single server node only; instead, copies of the same data are kept on multiple server nodes, backing each other up. A data item consists of a key (Key) and a value (Value): the Key is the index of the data, the Value is the data content represented by the Key, and Key and Value are logically in one-to-one correspondence.
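As an illustration of this data model, the following minimal Python sketch (all names are hypothetical, and the integer version field merely stands in for whatever version information such a system attaches to each replica) shows how the same Key can map to diverged copies on different nodes:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One replica of a data item: the Key indexes the Value, and the
    version information lets replicas be compared after a fault."""
    key: str        # Key: the logical index of the data
    value: str      # Value: the data content the Key represents
    version: int    # hypothetical monotonically increasing version number

# Each server node keeps its own copy of the same logical data, so after
# a fault the same Key may map to different versions on different nodes.
node_a = {"user:42": Record("user:42", "alice-v3", version=3)}
node_b = {"user:42": Record("user:42", "alice-v2", version=2)}
```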
In a distributed caching system, guaranteeing data consistency is a key issue. After fault recovery, the replicas of a data item kept on the various server nodes of the distributed caching system may become inconsistent. For example, while the data corresponding to a Key is repeatedly written, updated, deleted, and so on, a network failure or any of various hardware or software failures may occur; after the fault is recovered, the Value corresponding to that Key as stored on different server nodes may differ.
In the prior art, if data is to be read by Key immediately after fault recovery, every replica is fetched and compared, the correct Value is selected according to a certain data-version comparison rule, and the stale data is repaired at the same time, so as to keep the replicas of the same data consistent. However, if between the fault recovery and the time the data needs to be read by Key, the server nodes holding the replicas fail repeatedly one after another, then when the data needs to be read by Key, situations may arise in which no data can be read, stale data is read, or there is no way to tell which of the replicas read is newer. This lowers the quality attributes of the distributed caching system and severely harms the experience of applications using it.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a data recovery method for temporary faults and a distributed caching system, capable of keeping the replicas of the same data consistent after a server node in the distributed caching system recovers from a fault.
To achieve the above purpose, the technical scheme of the present invention is realized as follows:
The present invention provides a data recovery method for temporary faults, the method comprising:
When a collaboration server initiates a data operation to replica servers and finds that a replica server has failed, it generates a Key change record containing the key (Key) of each data item operated on;
After the replica server recovers from the fault, the collaboration server initiates a data repair operation to the replica server according to the Key change record;
The replica server performs local data repair according to the data repair operation initiated by the collaboration server.
In the above scheme, initiating a data operation to a replica server comprises: initiating a data write operation or an update operation to the replica server.
In the above scheme, generating the Key change record containing each data Key operated on during the fault further comprises:
The collaboration server establishes a save set for the replica server;
During the fault of the replica server, the collaboration server generates the Key change record containing each data Key operated on during the fault, and saves it into the save set of the replica server.
In the above scheme, the collaboration server initiating the data repair operation to the replica server according to the saved Key change record comprises:
The collaboration server obtains all data replicas corresponding to each Key in the Key change record, and identifies the most recently operated data replica of each Key in the Key change record;
The identified most recently operated data replicas are used to initiate the data repair operation to the replica server.
In the above scheme, identifying the most recently operated data replica of each Key in the Key change record is:
Comparing the versions of the data replicas of the same Key among all the obtained data replicas, to obtain the most recently operated data replica of each Key.
In the above scheme, the collaboration server obtaining all data replicas corresponding to each Key in the Key change record is:
The collaboration server reads the data replica of each Key from every replica server corresponding to that Key, and obtains the data replica of each Key from itself.
In the above scheme, the replica server performing local data repair according to the data repair operation initiated by the collaboration server comprises:
The replica server updates its locally stored data replicas according to the most recently operated data replica of each Key in the Key change record.
In the above scheme, after the replica server performs local data repair according to the data repair operation initiated by the collaboration server, the method further comprises:
The replica server returns a repair result to the collaboration server after updating the locally stored data replicas;
When the repair result is failure, the collaboration server continues to initiate the data update operation to the replica server.
The present invention also provides a distributed caching system for data repair upon temporary faults, the system comprising: a collaboration server and one or more replica servers, wherein
The collaboration server is used to, when initiating a data operation to the one or more replica servers and finding that a replica server has failed, generate a Key change record containing each data Key operated on; and, after the replica server recovers from the fault, to initiate a data repair operation to the replica server according to the Key change record;
The one or more replica servers are used to, after fault recovery, perform local data repair according to the data repair operation initiated by the collaboration server.
In the above scheme, the collaboration server is also used to establish a save set for each replica server; and, during the fault of each replica server, to generate the Key change record containing each data Key operated on during the fault and save it into the save set of that replica server.
In the above scheme, the collaboration server is also used to obtain all data replicas corresponding to each Key in the Key change record, identify the most recently operated data replica of each Key in the Key change record, and initiate the data repair operation to the replica server using the identified most recently operated data replicas.
In the above scheme, the replica server is also used to update the locally stored data replicas according to the most recently operated data replicas with which the collaboration server initiates the data repair operation.
In the above scheme, the replica server is also used to return a repair result to the collaboration server after updating the locally stored data replicas; the collaboration server is also used to, when the repair result fed back by the replica server is failure, continue to initiate the data update operation to the replica server.
With the data recovery method for temporary faults and the distributed caching system provided by the present invention, the collaboration server generates a Key change record when it finds that a replica server has failed; after the replica server recovers from the fault, the collaboration server initiates a data repair operation to the replica server according to the Key change record, so that the replica server can perform local data repair in time. This guarantees that the replicas of a data item remain consistent after recovery from a temporary fault, improves the accuracy of the data stored in the distributed caching system, promotes the quality attributes of the distributed caching system, and optimizes the application experience.
Brief description of the drawings
Fig. 1 is a flowchart of the implementation of a data recovery method for temporary faults according to the present invention;
Fig. 2 is a schematic diagram of the composition of the distributed caching system in a specific embodiment of the present invention;
Fig. 3 is a flowchart of the implementation of data repair upon a temporary fault of the distributed caching system in a specific embodiment of the present invention.
Detailed description of the embodiments
The basic idea of the present invention is as follows: when a data operation such as a write or an update is performed and the collaboration server in the distributed caching system finds that a replica server has failed, a change record of the data is generated and saved; after the replica server recovers from the fault, the collaboration server performs data repair on the replica server according to the change record of the data, so that the replica of the data on the replica server becomes consistent with the replicas of the data on the other replica servers. In this way, consistency among the replicas of the data after recovery from a temporary fault is guaranteed.
A data recovery method for temporary faults according to the present invention, applied to a distributed caching system, can quickly restore consistency among data replicas after recovery from a temporary fault. With reference to Fig. 1, the method mainly comprises the following steps:
Step 101: when the collaboration server initiates a data operation to the replica servers and finds that a replica server has failed, it generates a Key change record containing each data Key operated on;
Specifically, after receiving a Key-Value data write request or data update request initiated by a client, the collaboration server needs to initiate a data write operation or update operation to each replica server; upon finding that a replica server has failed, it generates the Key change record.
The collaboration server is a normally operating server node in the distributed caching system; it is used to receive the data operations initiated by clients and to initiate the corresponding data operations to each replica server.
A replica server is any server node other than the collaboration server, among all the server nodes of the distributed caching system, that stores a replica of the data currently being operated on.
In practical applications, the collaboration server can establish a save set for each replica server. During a replica server's fault, the collaboration server generates the Key change record containing each data Key operated on during the fault, that is, the Keys of the data written or updated during the fault, and saves it into the save set of that replica server. In this way, only the Keys of the data need to be kept in the change record, not the Values, so the cost is very small and resources are saved.
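A minimal sketch of this bookkeeping, assuming a hypothetical per-replica-server map of save sets; note that only Keys are recorded, never Values, which is what keeps the record small:

```python
from collections import defaultdict

class ChangeRecords:
    """Per-replica-server save sets holding only the Keys (never the
    Values) of the data operated on while that server was down."""
    def __init__(self):
        self.save_sets = defaultdict(set)

    def note_failed_operation(self, replica_id, key):
        # a write/update could not be delivered to `replica_id`:
        # remember the Key in that server's save set
        self.save_sets[replica_id].add(key)

    def change_record(self, replica_id):
        # the Key change record that drives repair after recovery
        return self.save_sets.pop(replica_id, set())

records = ChangeRecords()
records.note_failed_operation("node3", "user:42")
records.note_failed_operation("node3", "cart:7")
print(records.change_record("node3"))  # {'user:42', 'cart:7'}
```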
Step 102: after the replica server recovers from the fault, the collaboration server initiates a data repair operation to the replica server according to the Key change record;
Specifically, the collaboration server obtains all data replicas corresponding to each Key in the Key change record, identifies the most recently operated data replica of each Key in the Key change record, and uses the identified most recently operated data replicas to initiate the data repair operation to the replica server.
Here, the collaboration server compares the versions of the data replicas of the same Key among all the obtained data replicas, to obtain the most recently operated data replica of each Key.
Here, the collaboration server can read the data replica of each Key from every replica server corresponding to that Key and obtain the data replica of each Key from itself, thereby completing the acquisition of all data replicas corresponding to each Key.
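These steps might be combined as in the sketch below, which gathers every node's copy of a recorded Key and selects the most recently operated one by version comparison (the dict-based stores and the integer version ordering are illustrative assumptions):

```python
def newest_replica(key, stores):
    """Gather the copy of `key` held by every node (the collaboration
    server's own store plus each replica server's store) and keep the
    copy written by the last operation, judged by version comparison."""
    candidates = [store[key] for store in stores if key in store]
    # version comparison rule: the larger version number is more recent
    return max(candidates, key=lambda rec: rec["version"], default=None)

# toy stores: the same Key has diverged after a temporary fault
coord    = {"user:42": {"value": "v3", "version": 3}}
replica1 = {"user:42": {"value": "v3", "version": 3}}
replica2 = {"user:42": {"value": "v1", "version": 1}}  # node that was down
print(newest_replica("user:42", [coord, replica1, replica2]))
# -> {'value': 'v3', 'version': 3}
```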
Step 103: the replica server performs local data repair according to the data repair operation initiated by the collaboration server.
Specifically, the replica server updates its locally stored data replicas according to the most recently operated data replica of each Key in the Key change record.
Here, the replica server saves locally the data replicas used by the collaboration server when initiating the data repair operation, namely the Keys written or updated during the fault together with the corresponding Values and version number information, thereby completing the update of its local data replicas.
Here, after step 103, the method further comprises: the replica server returns a repair result to the collaboration server after updating the locally stored data replicas; when the repair result is failure, the collaboration server continues to initiate the data update operation to the replica server; when the repair result is success, the current data repair flow ends.
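The repair-and-retry behaviour described here might look like the following sketch, where `send_repair` stands in for the call that carries the newest replica to the recovered server and returns whether its local update succeeded (a real deployment would likely bound or pace the retries):

```python
def repair_until_success(key, newest, send_repair):
    """Initiate the data repair operation; while the replica reports a
    failure result, keep re-initiating the update until it succeeds."""
    while not send_repair(key, newest):
        pass  # repair result was failure: initiate the update again

# toy transport that fails twice (replica briefly busy), then succeeds
attempts = {"count": 0}
def flaky_send(key, record):
    attempts["count"] += 1
    return attempts["count"] >= 3

repair_until_success("user:42", {"value": "v3", "version": 3}, flaky_send)
print(attempts["count"])  # 3 -> succeeded on the third attempt
```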
Accordingly, the present invention also provides a distributed caching system for data repair upon temporary faults, the system comprising: a collaboration server and one or more replica servers, wherein the collaboration server is used to, when initiating a data operation to the one or more replica servers and finding that a replica server has failed, generate a Key change record containing each data Key operated on; and, after the replica server recovers from the fault, to initiate a data repair operation to the replica server according to the Key change record; and the one or more replica servers are used to, after fault recovery, perform local data repair according to the data repair operation initiated by the collaboration server.
Wherein, the collaboration server is also used to establish a save set for each replica server; and, during the fault of each replica server, to generate the Key change record containing each data Key operated on during the fault and save it into the save set of that replica server.
Specifically, the collaboration server is also used to obtain all data replicas corresponding to each Key in the Key change record, identify the most recently operated data replica of each Key in the Key change record, and initiate the data repair operation to the replica server using the identified most recently operated data replicas.
Wherein, the replica server is also used to update the locally stored data replicas according to the most recently operated data replicas with which the collaboration server initiates the data repair operation.
Wherein, the replica server can also be used to return a repair result to the collaboration server after updating the locally stored data replicas; the collaboration server can also be used to, when the repair result fed back by the replica server is failure, continue to initiate the data update operation to the replica server and perform the data repair again, until the repair result is success.
Embodiment one
In this embodiment, the distributed caching system composed of server nodes and clients is shown in Fig. 2. The distributed caching system comprises three server nodes (a first server node, a second server node, and a third server node) and two clients (a first client and a second client), where each client is connected to each server node, and the server nodes are connected to one another.
After a client initiates a data update operation, the concrete implementation process of data repair for a temporary fault during the data update is shown in Fig. 3; the concrete steps are as follows:
Step 301: the first client initiates a data update operation, selects a server node as the collaboration server according to the Key of the data, and sends a data update request for a Key-Value to the collaboration server;
Specifically, for the Key of a particular data item, the server cluster of the distributed caching system can be regarded, according to certain priorities, as a cluster of one collaboration server and several replica servers; different Keys may have different collaboration servers and replica servers. In addition, the choice of the collaboration server also needs to take the network conditions into account, where the network conditions include whether the operating state of each server node is normal, and so on.
In this embodiment, according to the Key of the data to be updated and the current network conditions, the first server node is selected as the collaboration server.
Step 302: the collaboration server receives the data update request, stores the Key and Value of the data sent by the first client along with the data update request, and updates its local data.
Here, if the collaboration server fails to update its local data, it returns an update-failure response to the first client; the flow may return to step 301 to be performed again, or the current flow may end.
Step 303: the collaboration server identifies the replica servers corresponding to the Key of the data according to a certain rule, and initiates a data update operation to each identified replica server;
Here, the collaboration server can identify the replica servers according to a consistent-hashing rule or a field-based hashing rule.
For example, the hash value corresponding to the Key of the data can be obtained through a hash algorithm, and the other server nodes that store the data replicas corresponding to the Key can be found from the resulting hash value; the server nodes found in this way are the replica servers corresponding to the Key of the data.
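As one concrete reading of such a rule, the sketch below hashes the Key onto a ring of server nodes and takes the next N nodes clockwise as the replica servers; the ring layout, the MD5 hash, and the replica count of two are illustrative assumptions, not a scheme mandated by the patent:

```python
import hashlib
from bisect import bisect

def key_hash(name):
    # stable hash of a Key or node name onto the ring
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # place each node on the ring at the hash of its name
        self.ring = sorted((key_hash(n), n) for n in nodes)

    def replicas_for(self, key, n=2):
        """Walk clockwise from the Key's position and return the next
        `n` distinct nodes: the replica servers for this Key."""
        start = bisect(self.ring, (key_hash(key),))
        picked = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in picked:
                picked.append(node)
            if len(picked) == n:
                break
        return picked

ring = HashRing(["node1", "node2", "node3"])
print(ring.replicas_for("user:42"))  # e.g. ['node2', 'node3']
```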
In this embodiment, the collaboration server identifies the second server node and the third server node as the replica servers corresponding to the Key, sends data update requests to the second and third server nodes, and initiates the data update operation.
Step 304: after initiating the data update operation, the collaboration server finds that a server node among the replica servers corresponding to the Key has failed, generates a change record of the Key, and stores it temporarily in its local storage;
Specifically, if a server node has failed, it can neither receive nor send information. When the collaboration server, while initiating the data update operation to each replica server, finds that it cannot initiate the operation to a replica server, i.e., that the data update request cannot be sent to that replica server, it considers that replica server to have failed.
In this embodiment, the collaboration server finds that the third server node, acting as a replica server, has failed; at this point, it generates the change record of the Key and stores it temporarily in its local storage.
Here, the change record of the Key contains all the Keys on which the current update operation has been performed.
Step 305: the collaboration server receives the responses returned by each normally operating replica server, and returns to the first client an update operation result that includes the responses returned by the replica servers and the collaboration server's own local update result;
Here, after each normally operating replica server receives the data update request initiated by the collaboration server, it stores the Key and Value of the data in the data update request and updates its local data; if the update succeeds, it returns an update-success response to the collaboration server, and if the update fails, it returns an update-failure response to the collaboration server.
In practical applications, an update-failure result can occur in situations such as insufficient storage capacity.
If all replica servers return update-failure responses, the collaboration server considers the current update operation to have failed; the flow may return to step 303 or step 301 to be performed again, or the current flow may end. Otherwise, the collaboration server considers the current update operation successful, and the flow can continue.
Here, if the collaboration server succeeds in updating its local data, it returns a local update result indicating success to the first client; if the collaboration server fails to update its local data, it returns a local update result indicating failure to the first client.
The local update result is the result of the data update performed by the collaboration server itself.
Step 306: the replica server recovers from the fault and begins to provide service externally;
Step 307: the collaboration server finds that the replica server has recovered, and loads the change record generated in step 304, preparing to perform data repair;
In practical applications, after recovering from the fault, the replica server re-establishes its connection with the collaboration server, and after connecting it can notify each server node in the distributed caching system (including the collaboration server) that it has begun providing service externally. In this way, upon receiving the notification from the recovered replica server, the collaboration server knows that the replica server has recovered.
Step 308: according to the change record generated in step 304, the collaboration server reads, from its local storage and from all replica servers, the Key of the updated data together with the Value and the corresponding version number information, obtaining multiple replicas of the data;
Specifically, the collaboration server initiates a data read operation to each replica server (including the replica server recovered from the fault) and performs a local data read; each replica server returns a read result containing its data replica to the collaboration server, which thereby obtains the replicas of the data kept on every server node (including the collaboration server and all replica servers).
Step 309: the collaboration server compares the versions of the replicas obtained in step 308, and identifies the most recently updated replica;
Specifically, the collaboration server compares the version number information of the replicas of the data and identifies the most recently updated replica.
Step 310: the collaboration server performs data repair on the replica server that recovered from the temporary fault in step 306, using the replica of the most recent update operation determined in step 309;
Specifically, the collaboration server uses the replica of the most recent update operation determined in step 309 to initiate data repair to the replica server recovered from the temporary fault (the third server node in this embodiment).
In practical applications, the collaboration server sends a data repair request to the replica server recovered from the temporary fault, and the data repair request contains the replica of the most recent update operation on the data.
Step 311: the replica server recovered from the temporary fault accepts the data repair, performs a local data update, and returns a repair result to the collaboration server; if the repair succeeds, the current flow ends; if the repair fails, the flow returns to step 307 and the data repair is repeated until it succeeds.
Specifically, the replica server recovered from the temporary fault receives the data repair request sent by the collaboration server, extracts from the data repair request the replica of the most recent update operation on the data, and saves the Key and Value of the data in that replica, completing the local data update.
Here, if the replica server recovered from the temporary fault succeeds in updating its local data, the repair succeeds: it returns a repair result indicating success to the collaboration server, and the current flow ends. If the replica server recovered from the temporary fault fails to update its local data, the repair fails: it returns a repair result indicating failure to the collaboration server, and the flow returns to step 307 to repeat the data repair until the data repair succeeds. In this way, after a client initiates a data update operation, a server node that suffered a temporary fault can have its data repaired in time after recovery, guaranteeing the consistency of the replicas of the data.
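On the recovered replica server's side, handling the repair request of step 311 reduces to extracting the carried replica and overwriting the local copy, along the lines of this sketch (method names and the toy capacity check are illustrative assumptions):

```python
class RecoveredReplica:
    """Replica-server side of step 311: apply the repair request by
    saving the carried newest replica over the local copy."""
    def __init__(self, capacity=100):
        self.store = {}           # local Key -> {"value", "version"}
        self.capacity = capacity  # toy stand-in for storage limits

    def handle_repair(self, key, record):
        # e.g. insufficient capacity would make the local update fail
        if key not in self.store and len(self.store) >= self.capacity:
            return False          # repair result: failure
        # save the Key, Value and version info carried by the request,
        # completing the local data update
        self.store[key] = record
        return True               # repair result: success

replica = RecoveredReplica()
ok = replica.handle_repair("user:42", {"value": "v3", "version": 3})
print(ok)  # True -> the collaboration server ends the repair flow
```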
In practical applications, the replica server recovered from the temporary fault failing again during the repair process, a network failure, or the server being busy and failing to respond for a long time can all cause the repair to fail.
The above are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (13)

1. A data recovery method for temporary faults, characterized in that the method comprises:
when a collaboration server initiates a data operation to replica servers and finds that a replica server has failed, generating a Key change record containing the key (Key) of each data item operated on;
after the replica server recovers from the fault, the collaboration server initiating a data repair operation to the replica server according to the Key change record;
the replica server performing local data repair according to the data repair operation initiated by the collaboration server.
2. The data recovery method for temporary faults according to claim 1, characterized in that initiating a data operation to a replica server comprises: initiating a data write operation or an update operation to the replica server.
3. The data recovery method for temporary faults according to claim 1 or 2, characterized in that generating the Key change record containing each data Key operated on during the fault further comprises:
the collaboration server establishing a save set for the replica server;
during the fault of the replica server, the collaboration server generating the Key change record containing each data Key operated on during the fault, and saving it into the save set of the replica server.
4. The data recovery method for temporary faults according to claim 1 or 3, characterized in that the collaboration server initiating the data repair operation to the replica server according to the saved Key change record comprises:
the collaboration server obtaining all data replicas corresponding to each Key in the Key change record, and identifying the most recently operated data replica of each Key in the Key change record;
using the identified most recently operated data replicas to initiate the data repair operation to the replica server.
5. The data recovery method for temporary faults according to claim 4, characterized in that identifying the most recently operated data replica of each Key in the Key change record is:
comparing the versions of the data replicas of the same Key among all the obtained data replicas, to obtain the most recently operated data replica of each Key.
6. The data recovery method for temporary faults according to claim 4, characterized in that the collaboration server obtaining all data replicas corresponding to each Key in the Key change record is:
the collaboration server reading the data replica of each Key from every replica server corresponding to that Key, and obtaining the data replica of each Key from itself.
7. The data recovery method for temporary faults according to claim 4, characterized in that the replica server performing local data repair according to the data repair operation initiated by the collaboration server comprises:
the replica server updating its locally stored data replicas according to the most recently operated data replica of each Key in the Key change record.
8. The data recovery method for temporary faults according to claim 7, characterized in that, after the replica server performs local data repair according to the data repair operation initiated by the collaboration server, the method further comprises:
the replica server returning a repair result to the collaboration server after updating the locally stored data replicas;
when the repair result is failure, the collaboration server continuing to initiate the data update operation to the replica server.
9. A distributed caching system for data repair upon temporary faults, characterized in that the system comprises: a collaboration server and one or more replica servers, wherein
the collaboration server is configured to, when initiating a data operation to the one or more replica servers and finding that a replica server has failed, generate a Key change record containing each data Key operated on; and, after the replica server recovers from the fault, to initiate a data repair operation to the replica server according to the Key change record;
the one or more replica servers are configured to, after fault recovery, perform local data repair according to the data repair operation initiated by the collaboration server.
10. The distributed caching system according to claim 9, characterized in that
the collaboration server is further configured to establish a save set for each replica server; and, during the fault of each replica server, to generate the Key change record containing each data Key operated on during the fault and save it into the save set of that replica server.
11. The distributed caching system according to claim 9, characterized in that the collaboration server is further configured to obtain all data replicas corresponding to each Key in the Key change record, identify the most recently operated data replica of each Key in the Key change record, and initiate the data repair operation to the replica server using the identified most recently operated data replicas.
12. The distributed caching system according to claim 11, characterized in that the replica server is further configured to update the locally stored data replicas according to the most recently operated data replicas with which the collaboration server initiates the data repair operation.
13. The distributed caching system according to claim 12, characterized in that
the replica server is further configured to return a repair result to the collaboration server after updating the locally stored data replicas;
the collaboration server is further configured to, when the repair result fed back by the replica server is failure, continue to initiate the data update operation to the replica server.
CN201110157693.1A 2011-06-13 2011-06-13 Data recovery method and distributed cache system for temporary faults Active CN102833273B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201110157693.1A CN102833273B (en) 2011-06-13 2011-06-13 Data recovery method and distributed cache system for temporary faults
PCT/CN2012/070849 WO2012171345A1 (en) 2011-06-13 2012-02-02 Method and distributed cache system for data recovery in temporary fault

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110157693.1A CN102833273B (en) 2011-06-13 2011-06-13 Data recovery method and distributed cache system for temporary faults

Publications (2)

Publication Number Publication Date
CN102833273A true CN102833273A (en) 2012-12-19
CN102833273B CN102833273B (en) 2017-11-03

Family

ID=47336243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110157693.1A Active CN102833273B (en) 2011-06-13 2011-06-13 Data recovery method and distributed cache system for temporary faults

Country Status (2)

Country Link
CN (1) CN102833273B (en)
WO (1) WO2012171345A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104238963A (en) * 2014-09-30 2014-12-24 华为技术有限公司 Data storage method, device and system
CN104778179A (en) * 2014-01-14 2015-07-15 阿里巴巴集团控股有限公司 Data migration test method and system
CN105589887A (en) * 2014-10-24 2016-05-18 中兴通讯股份有限公司 Data processing method for distributed file system and distributed file system
WO2016206568A1 (en) * 2015-06-26 2016-12-29 阿里巴巴集团控股有限公司 Data update method, device, and related system
CN107153671A (en) * 2016-03-02 2017-09-12 阿里巴巴集团控股有限公司 A kind of method and apparatus for realizing the read-write of multifile copy in a distributed system
CN108055159A (en) * 2017-12-21 2018-05-18 郑州云海信息技术有限公司 A kind of clustered node operation synchronous method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105338026B (en) * 2014-07-24 2018-10-09 阿里巴巴集团控股有限公司 The acquisition methods of data resource, device and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101567805B (en) * 2009-05-22 2011-12-28 清华大学 Method for recovering failed parallel file system
CN101697168B (en) * 2009-10-22 2011-10-19 中国科学技术大学 Method and system for dynamically managing metadata of distributed file system
CN101964820B (en) * 2010-10-08 2014-04-09 中兴通讯股份有限公司 Method and system for keeping data consistency
CN102024016B (en) * 2010-11-04 2013-03-13 曙光信息产业股份有限公司 Rapid data restoration method for distributed file system (DFS)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Hua et al., "Research on the Recovery Mechanism in Distributed File Systems", Microcomputer Information *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778179A (en) * 2014-01-14 2015-07-15 阿里巴巴集团控股有限公司 Data migration test method and system
CN104778179B (en) * 2014-01-14 2019-05-28 阿里巴巴集团控股有限公司 A kind of Data Migration test method and system
CN104238963A (en) * 2014-09-30 2014-12-24 华为技术有限公司 Data storage method, device and system
CN104238963B (en) * 2014-09-30 2017-08-11 华为技术有限公司 A kind of date storage method, storage device and storage system
CN105589887A (en) * 2014-10-24 2016-05-18 中兴通讯股份有限公司 Data processing method for distributed file system and distributed file system
WO2016206568A1 (en) * 2015-06-26 2016-12-29 阿里巴巴集团控股有限公司 Data update method, device, and related system
CN107153671A (en) * 2016-03-02 2017-09-12 阿里巴巴集团控股有限公司 A kind of method and apparatus for realizing the read-write of multifile copy in a distributed system
CN107153671B (en) * 2016-03-02 2020-11-24 阿里巴巴集团控股有限公司 Method and equipment for realizing multi-file copy reading and writing in distributed system
CN108055159A (en) * 2017-12-21 2018-05-18 郑州云海信息技术有限公司 A kind of clustered node operation synchronous method and device

Also Published As

Publication number Publication date
CN102833273B (en) 2017-11-03
WO2012171345A1 (en) 2012-12-20


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant