CN103095767A - Distributed cache system and data reconstruction method based on distributed cache system - Google Patents

Distributed cache system and data reconstruction method based on distributed cache system

Info

Publication number
CN103095767A
CN103095767A, CN2011103435923A, CN201110343592A
Authority
CN
China
Prior art keywords
data
malfunctioning
service node
node
data message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011103435923A
Other languages
Chinese (zh)
Other versions
CN103095767B (en)
Inventor
李豪伟
陈典强
郭斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201110343592.3A
Publication of CN103095767A
Application granted
Publication of CN103095767B
Legal status: Active
Anticipated expiration

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a distributed cache system and a data reconstruction method based on the distributed cache system. In the method, after one or more failed nodes among the service nodes have their faults eliminated and are restarted, the one or more failed nodes receive first data information from the main service node and perform the data reconstruction operation corresponding to the first data information; the one or more failed nodes then receive second data information recorded by the main service node while the first data information was being sent, and perform the data reconstruction operation corresponding to the second data information. The distributed cache system and the data reconstruction method based on the distributed cache system keep the data of the failed nodes consistent with that of the other normally operating nodes and greatly improve the availability of the distributed cache system.

Description

Distributed cache system and data reconstruction method based on distributed cache system
Technical field
The present invention relates to the field of communications, and in particular to a distributed cache system and a data reconstruction method based on a distributed cache system.
Background technology
Cloud computing (Cloud Computing) is the product of the convergence of traditional computing technologies such as grid computing (Grid Computing), distributed computing (Distributed Computing), parallel computing (Parallel Computing), utility computing (Utility Computing), network storage (Network Storage Technologies), virtualization (Virtualization) and load balancing (Load Balance) with the development of network technology. It aims to integrate, over a network, a number of relatively low-cost computing entities into a single system with powerful computing capability. Distributed caching is one area within cloud computing; its role is to provide distributed storage services and high-speed read/write access to massive amounts of data. A distributed cache system is composed of a number of server nodes and clients connected to one another. In general, written data is not kept on a single server node only; instead, copies of the same data are kept on multiple nodes, which back each other up. A data record consists of two parts, a key (Key) and a value (Value), where the Key acts as the index of the data and the Value is the data content that the Key stands for; logically, Key and Value are in a one-to-one relationship. The server nodes are responsible for storing and managing data in memory and on disk, and multiple copies of the data are stored on multiple server nodes, so that after some server nodes go down the system as a whole can still use the remaining copies to continue providing normal service to applications. Clients can write, read, update and delete data on the server nodes.
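As an illustration only (this sketch is not part of the patent; the Python class, the replica count of three and all identifiers are assumptions), the replicated Key/Value storage described above can be pictured as follows:

    import hashlib

    class ReplicatedKVStore:
        """Toy model of a cache cluster keeping several copies of each Key/Value pair."""

        def __init__(self, nodes, copies=3):
            self.nodes = nodes                        # node identifiers, e.g. ["node-A", "node-B", ...]
            self.copies = copies                      # how many nodes hold a copy of each record
            self.storage = {n: {} for n in nodes}     # per-node in-memory Key -> Value map

        def replica_nodes(self, key):
            # Order the nodes by a hash-derived, per-key priority and take the first few.
            ranked = sorted(self.nodes,
                            key=lambda n: hashlib.md5((key + n).encode()).hexdigest())
            return ranked[:self.copies]

        def write(self, key, value):
            # The same record is written to every replica so the copies back each other up.
            for node in self.replica_nodes(key):
                self.storage[node][key] = value

        def read(self, key):
            # Any replica that still holds the record can serve the read.
            for node in self.replica_nodes(key):
                if key in self.storage[node]:
                    return self.storage[node][key]
            return None

    store = ReplicatedKVStore(["node-A", "node-B", "node-C", "node-D"])
    store.write("user:1001", {"name": "alice"})
    print(store.read("user:1001"))

Writing the same record to several nodes is what later allows a failed node to rebuild its data from the surviving copies.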
In a distributed cache system, when a server node fails and goes down, data may be lost, and the data newly generated while the node is down cannot be stored on it. As a result, the copies of the data stored on different server nodes become inconsistent, and applications may access wrong data. This is a rather difficult technical problem.
Summary of the invention
In the related art, after some server nodes of a distributed cache system fail and go down, data may be lost, and after such a failed server node has its fault eliminated and rejoins the system, the data it holds is inconsistent with the data saved by the other server nodes. To address at least this problem, the present invention provides a distributed cache system and a data reconstruction method based on a distributed cache system.
According to one aspect of the present invention, a data reconstruction method based on a distributed cache system is provided.
The data reconstruction method based on a distributed cache system according to the present invention comprises: after one or more failed nodes among the slave service nodes have their faults eliminated and are restarted, the one or more failed nodes receive first data information from the main service node and perform the data reconstruction operation corresponding to the first data information, wherein the first data information comprises: the data requested by the one or more failed nodes and/or the data change information produced between the failure of the one or more failed nodes and their restart; while the first data information is being sent, the one or more failed nodes receive second data information recorded by the main service node and perform the data reconstruction operation corresponding to the second data information, wherein the second data information comprises: the data obtained and/or the data change information recorded by the main service node from the restart of the one or more failed nodes onward.
In the above method, before the one or more failed nodes receive the first data information from the main service node, the method further comprises: the main service node records the data change information produced between the failure of the one or more failed nodes and their restart; the main service node receives, from the one or more failed nodes, a request to obtain the first data information.
In the above method, after the main service node receives the request from the one or more failed nodes to obtain the first data information, the method further comprises: the main service node records the data change information produced from the restart of the one or more failed nodes onward.
In the above method, the one or more failed nodes performing the data reconstruction operation corresponding to the first data information comprises: the one or more failed nodes save the data in the received first data information, and/or perform write and/or delete operations according to the data change information in the first data information.
In the above method, the one or more failed nodes performing the data reconstruction operation corresponding to the second data information comprises: the one or more failed nodes save the data in the received second data information, and/or perform write and/or delete operations according to the data change information in the second data information.
In the above method, after the one or more failed nodes receive the second data information recorded by the main service node, the method further comprises: the main service node sends to the one or more failed nodes a message indicating that the second data information has been sent completely, and records the current time as a first moment; the main service node receives from the one or more failed nodes a response message indicating that normal service has been resumed, and records the current time as a second moment; the main service node determines whether, during the period from the first moment to the second moment, there are still data of the one or more failed nodes and/or recorded data change information that have not been sent; if so, it continues to send the data and/or the recorded data change information to the one or more failed nodes; if not, it directly sends a data-reconstruction-complete message to the one or more failed nodes.
In the above method, the data requested by the one or more failed nodes comprise at least one of the following: in-memory data and disk data.
In the above method, the data change information produced between the failure of the one or more failed nodes and their restart comprises: disk data modification information.
According to another aspect of the present invention, a distributed cache system is provided.
The distributed cache system according to the present invention comprises: a main service node and at least one slave service node. The slave service node comprises: a first receiving module, configured to receive the first data information from the main service node after one or more failed nodes among the slave service nodes have their faults eliminated and are restarted, wherein the first data information comprises: the data requested by the one or more failed nodes and/or the data change information produced between the failure of the one or more failed nodes and their restart; a first reconstruction module, configured to perform the data reconstruction operation corresponding to the first data information; a second receiving module, configured to receive, while the first data information is being sent, the second data information recorded by the main service node, wherein the second data information comprises: the data obtained and/or the data change information recorded by the main service node from the restart of the one or more failed nodes onward; and a second reconstruction module, configured to perform the data reconstruction operation corresponding to the second data information.
In the above system, the main service node comprises: a first recording module, configured to record the data change information produced between the failure of the one or more failed nodes and their restart; and a third receiving module, configured to receive, from the one or more failed nodes, the request to obtain the first data information.
In the above system, the main service node further comprises: a second recording module, configured to record the data change information produced from the restart of the one or more failed nodes onward.
In the above system, the first reconstruction module comprises: a first storage unit, configured to save the data in the received first data information; and a first execution unit, configured to perform write and/or delete operations according to the data change information in the first data information.
In the above system, the second reconstruction module comprises: a second storage unit, configured to save the data in the received second data information; and a second execution unit, configured to perform write and/or delete operations according to the data change information in the second data information.
In the above system, the main service node further comprises: a first sending module, configured to send to the one or more failed nodes the message indicating that the second data information has been sent completely; a third recording module, configured to record the current time as a first moment; a fourth receiving module, configured to receive from the one or more failed nodes the response message indicating that normal service has been resumed; a fourth recording module, configured to record the current time as a second moment; a judging module, configured to determine whether, during the period from the first moment to the second moment, there are still data of the one or more failed nodes and/or recorded data change information that have not been sent; and a second sending module, configured to, when the judging module outputs yes, continue to send the data obtained and/or the data change information recorded during the period from the first moment to the second moment to the one or more failed nodes, and, when the judging module outputs no, directly send the data-reconstruction-complete message to the one or more failed nodes.
By means of the present invention, the problem that some server nodes of a distributed cache system lose data after failing and going down, and that their data after the fault is eliminated and they rejoin the system is inconsistent with the data saved by the other server nodes, is solved; the data of the failed nodes is thereby kept consistent with that of the other normally operating nodes, and the availability of the distributed cache system is greatly improved.
Description of drawings
The accompanying drawings described herein are provided for a further understanding of the present invention and constitute a part of the application; the illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a schematic diagram of a distributed cache system composed of server nodes and clients according to an embodiment of the present invention;
Fig. 2 is a flow chart of a data reconstruction method based on a distributed cache system according to an embodiment of the present invention;
Fig. 3 is a flow chart of a data reconstruction method based on a distributed cache system according to a preferred embodiment of the present invention;
Fig. 4 is a structural block diagram of a distributed cache system according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a distributed cache system according to a preferred embodiment of the present invention.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, provided there is no conflict, the embodiments in the application and the features in the embodiments may be combined with each other.
Fig. 1 is a schematic diagram of a distributed cache system composed of server nodes and clients according to an embodiment of the present invention. As shown in Fig. 1, a plurality of server nodes in the distributed cache system store the data and copies of the data; the client connects to a plurality of server nodes of the cluster in the distributed cache system, and the server nodes of the cluster are connected to one another and operate normally. Logically, for the Key of a given piece of data, a few server nodes of the server cluster can be regarded, according to a certain priority, as one coordinator server (which may also be called the main service node) and several replica servers (which may also be called slave service nodes); different Keys may have different coordinator servers and replica servers. The coordinator is responsible for processing requests from clients and for writing the data to the other replica servers.
It should be noted that the choice of coordinator server also depends on the network conditions at the time.
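By way of illustration only (the CRC32-based priority and all names below are assumptions, not the patent's mechanism for choosing the coordinator), a per-Key coordinator/replica assignment could be sketched as:

    import zlib

    def ranked_servers(key, servers):
        """Order the cluster's servers by a per-key priority (here a CRC32-based score)."""
        return sorted(servers, key=lambda s: zlib.crc32((key + s).encode()))

    def coordinator_and_replicas(key, servers, copies=3):
        """The highest-priority server acts as coordinator (main service node);
        the next ones hold the additional copies (slave service nodes)."""
        ranked = ranked_servers(key, servers)
        return ranked[0], ranked[1:copies]

    servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    coord, replicas = coordinator_and_replicas("session:42", servers)
    # The coordinator handles the client request and forwards the write to the replicas.
    print("coordinator:", coord, "replicas:", replicas)

Because the ordering depends on the Key, different Keys can end up with different coordinator servers and replica servers, as described above.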
Fig. 2 is a flow chart of a data reconstruction method based on a distributed cache system according to an embodiment of the present invention. As shown in Fig. 2, the method mainly comprises the following processing:
Step S202: after one or more failed nodes among the slave service nodes have their faults eliminated and are restarted, the one or more failed nodes receive first data information from the main service node and perform the data reconstruction operation corresponding to the first data information, wherein the first data information comprises: the data requested by the one or more failed nodes and/or the data change information produced between the failure of the one or more failed nodes and their restart;
Step S204: while the first data information is being sent, the one or more failed nodes receive second data information recorded by the main service node and perform the data reconstruction operation corresponding to the second data information, wherein the second data information comprises: the data obtained and/or the data change information recorded by the main service node from the restart of the one or more failed nodes onward.
In the related art, after some server nodes of a distributed cache system fail and go down, data may be lost, and after such a failed server node has its fault eliminated and rejoins the system, the data it holds is inconsistent with the data saved by the other server nodes. The method shown in Fig. 2 solves this problem: the data of the failed nodes is kept consistent with that of the other normally operating nodes, and the availability of the distributed cache system is greatly improved.
Preferably, before the one or more failed nodes receive the first data information from the main service node, the following processing may also be performed:
(1) the main service node records the data change information produced between the failure of the one or more failed nodes and their restart;
In a preferred implementation, for any record stored on a failed node, at least one further copy of the data is also stored on other service nodes. After the failed node fails, every data change request (for example, a data write or a data delete) that the main service node sends to the failed node fails to execute; the main service node records the information about these failed changes so that the failed node can later recover its data information;
(2) the main service node receives, from the one or more failed nodes, a request to obtain the first data information.
In a preferred implementation, the data requested by the one or more failed nodes may include, but are not limited to, at least one of the following: in-memory data and disk data.
In a preferred implementation, the data change information produced between the failure of the one or more failed nodes and their restart may include, but is not limited to: disk data modification information.
For example, if the failed node requests reconstruction of its in-memory data, it sends the main service node a request to obtain the complete in-memory copy data; if the failed node requests reconstruction of its disk data, it first determines whether all of its data has been lost because the disk failed and was replaced: if so, it requests from the main service node all of the copy data stored on that disk; if not, it requests from the main service node only the copy data modification information produced after the failed node failed.
It should be noted that only when the disk itself fails and is replaced with a new disk is all of the disk data lost; in that case all of the disk data has to be obtained from other nodes in order to perform data reconstruction. If the disk itself is not faulty and the data files still exist, for example when only the service process terminated abnormally, the failed node only needs to obtain from other nodes the data change information those nodes received after the failed node failed, and to perform those data change operations; this keeps its copy data consistent with the other service nodes and greatly increases the speed of data reconstruction on the failed node.
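As a hedged illustration of the decision just described (the function name, the message formats and the directory-based disk-replacement check are assumptions, not part of the patent), a restarted node's request could be built as follows:

    import os

    def build_reconstruction_request(rebuild_memory, rebuild_disk, data_dir, failure_time):
        """Decide what a restarted failed node should ask the coordinator for."""
        requests = []
        if rebuild_memory:
            # Memory contents are always gone after a restart: ask for the full in-memory copy.
            requests.append({"type": "full_memory_copy"})
        if rebuild_disk:
            disk_replaced = not os.path.isdir(data_dir) or not os.listdir(data_dir)
            if disk_replaced:
                # The disk was replaced and all files were lost: ask for the complete disk copy data.
                requests.append({"type": "full_disk_copy"})
            else:
                # The data files survived (e.g. the process merely crashed): ask only for
                # the changes the other replicas received after the failure time.
                requests.append({"type": "changes_since", "since": failure_time})
        return requests

    print(build_reconstruction_request(True, True, "/var/cache/data", failure_time=1700000000))

Requesting only the changes produced since the failure time is what makes reconstruction fast when the data files survive.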
Preferably, after the main service node receives the request from the one or more failed nodes to obtain the first data information, the following processing may also be performed: the main service node records the data change information produced from the restart of the one or more failed nodes onward.
It should be noted that receiving and storing copy data from the other service nodes takes the failed node some time, and during this period the distributed cache system may still receive write and/or delete requests from clients. According to the data change information that the main service node records in a log, the failed node re-performs the operations that were sent to the main service node during the reconstruction, which guarantees that all of this data change information is applied on the failed node and therefore that the copy data on the failed node after reconstruction is exactly the same as that on the other service nodes.
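To make the two recording phases concrete, the following minimal sketch (hypothetical names; the patent does not prescribe this implementation) shows a coordinator that keeps one change log while a replica is down, opens a second journal when that replica asks for the first data information, and a replay helper that the restarted node uses to re-perform the journaled operations:

    class Coordinator:
        def __init__(self):
            self.pending_changes = {}   # replica id -> changes that failed while it was down
            self.rebuild_journal = {}   # replica id -> changes recorded during its rebuild

        def replica_write_failed(self, replica, op):
            # Phase 1: the replica is down, so remember every change it missed.
            self.pending_changes.setdefault(replica, []).append(op)

        def start_rebuild(self, replica):
            # Phase 2: the replica asked for the first data information;
            # from now on also journal new client changes for it.
            self.rebuild_journal[replica] = []
            return self.pending_changes.pop(replica, [])   # part of the first data information

        def client_change(self, replica, op):
            if replica in self.rebuild_journal:
                self.rebuild_journal[replica].append(op)   # becomes second data information

    def replay(node_storage, ops):
        """The restarted node re-performs the journaled operations in order."""
        for op in ops:
            if op["kind"] == "write":
                node_storage[op["key"]] = op["value"]
            elif op["kind"] == "delete":
                node_storage.pop(op["key"], None)

    coord = Coordinator()
    coord.replica_write_failed("node-B", {"kind": "write", "key": "k1", "value": "v1"})
    missed = coord.start_rebuild("node-B")
    coord.client_change("node-B", {"kind": "delete", "key": "k0"})
    storage = {}
    replay(storage, missed + coord.rebuild_journal["node-B"])
    print(storage)   # {'k1': 'v1'}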
Preferably, the one or more failed nodes performing the data reconstruction operation corresponding to the first data information may further comprise the following processing: the one or more failed nodes save the data in the received first data information, and/or perform write and/or delete operations according to the data change information in the first data information.
Preferably, the one or more failed nodes performing the data reconstruction operation corresponding to the second data information may further comprise the following processing: the one or more failed nodes save the data in the received second data information, and/or perform write and/or delete operations according to the data change information in the second data information.
Preferably, after the one or more failed nodes receive the second data information recorded by the main service node, the following processing may also be performed:
(1) the main service node sends to the one or more failed nodes a message indicating that the second data information has been sent completely, and records the current time as a first moment (T1);
(2) the main service node receives from the one or more failed nodes a response message indicating that normal service has been resumed, and records the current time as a second moment (T2);
(3) the main service node determines whether, during the period from T1 to T2, there are still data of the one or more failed nodes and/or recorded data change information that have not been sent; if so, it continues to send the data and/or the recorded data change information to the one or more failed nodes; if not, it directly sends a data-reconstruction-complete message to the one or more failed nodes.
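The sketch below illustrates this T1/T2 check on the coordinator side; the injected callables (send, wait_for_resume, changes_logged_between) are assumptions used only to keep the example self-contained, not APIs defined by the patent:

    import time

    def finish_reconstruction(send, wait_for_resume, changes_logged_between, failed_node):
        """Coordinator-side end of the rebuild: send/receive are injected callables."""
        send(failed_node, "second_data_sent")
        t1 = time.time()                                   # first moment (T1)

        wait_for_resume(failed_node)                       # failed node reports normal service
        t2 = time.time()                                   # second moment (T2)

        backlog = changes_logged_between(t1, t2)
        if backlog:
            # Changes logged between T1 and T2 have not reached the failed node yet.
            send(failed_node, {"changes": backlog})
        send(failed_node, "reconstruction_complete")

    # Trivial usage with stand-in callables:
    finish_reconstruction(send=lambda node, msg: print("->", node, msg),
                          wait_for_resume=lambda node: None,
                          changes_logged_between=lambda a, b: [],
                          failed_node="node-B")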
The above preferred implementation is described further below in conjunction with Fig. 3.
Fig. 3 is a flow chart of a data reconstruction method in a distributed cache system according to a preferred embodiment of the present invention. As shown in Fig. 3, the method can comprise the following processing steps:
Step S302: a failed node fails and leaves the distributed cache system;
Step S304: after one or more slave service nodes fail, the server node in the distributed cache system that stores the same data as the failed node and plays the coordinator role (that is, the above-mentioned main service node) begins to record the data change information that the failed node failed to process, including deleted-record information and newly written data information, so that the failed node can later recover the data changes made during the failure period according to this information;
It should be noted that if the main service node itself fails, another slave service node takes over the coordinator role, becomes the new main service node, and receives the data change information coming from clients.
Step S306: after the failed node has its fault eliminated and is restarted, it does not provide service at first, and the current state of the node is set to performing data reconstruction;
Step S308: the failed node sends a message to the coordinator server node that stores copies of its data, requesting the copy data and the data change information produced after the failure;
Step S310: after receiving the request of the failed node, the coordinator server node starts log recording and begins to record the write and/or delete operations from clients that it subsequently processes;
Step S312: the coordinator server node responds to the data request of the failed node: if the failed node wants to obtain the complete in-memory data, it traverses the data stored in memory one by one and sends it to the failed node until the traversal is complete; if the failed node wants to obtain all of the disk copy data, it reads the data files one by one and sends the data read to the failed node until the traversal is complete; if the failed node wants to obtain only the copy data modification information produced after a certain failure time, it reads, according to that failure time, the data change information produced after the failure one by one and sends the data read to the failed node;
Step S314: the failed node receives all the data and stores and processes it;
Step S316: the coordinator server node reads the log information;
Step S318: the coordinator server node sends the log information recorded during the data reconstruction to the failed node;
Step S320: the failed node receives the log information and performs the write and/or delete operations;
Step S322: after sending the current log information, the coordinator server node records the current time as T1;
Step S324: the coordinator server node sends to the failed node a notification message that the log information has been sent completely;
Step S326: after receiving the message that the log information has been sent completely, the failed node begins to provide normal service and processes requests such as reading, writing and/or deleting data;
Step S328: the failed node sends to the coordinator server node a notification message that it has begun normal operation;
Step S330: after receiving the message that the failed node has begun normal operation, the coordinator server node records the current time as T2;
Step S332: the coordinator server node determines whether new log information was produced between T1 and T2; if so, execution continues with step S334; if not, it proceeds to step S338;
Step S334: the coordinator server node sends the log information to the failed node until the log information has been sent completely;
Step S336: the failed node continues to receive and process the log information from the coordinator server node until it receives the message that the reconstruction data has been sent completely;
Step S338: the coordinator server node sends the message that the reconstruction data has been sent completely;
Step S340: the failed node clears the reconstruction-state flag and the data reconstruction is complete.
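Purely as an illustrative summary of the flow of Fig. 3 (the classes, the in-memory dictionaries and the omission of delete operations and networking are simplifying assumptions, not the patent's implementation), the coordinator and failed-node sides can be simulated as follows:

    class CoordinatorNode:
        """Toy coordinator roughly following steps S304-S338 (names are illustrative only)."""

        def __init__(self, data):
            self.data = dict(data)        # copy data this node shares with the failed node
            self.downtime_changes = []    # recorded while the failed node is down (S304)
            self.rebuild_log = None       # recorded during the rebuild (S310)

        def apply_client_write(self, key, value):
            self.data[key] = value
            op = ("write", key, value)
            if self.rebuild_log is not None:
                self.rebuild_log.append(op)      # S310: journal while rebuilding
            else:
                self.downtime_changes.append(op) # S304: journal while the node is down

        def handle_rebuild_request(self):
            # S310 + S312: start the rebuild journal and return the requested copy data
            # together with the changes missed during the downtime.
            self.rebuild_log = []
            return dict(self.data), list(self.downtime_changes)

        def drain_rebuild_log(self):
            # S316-S318 / S332-S334: hand over whatever has accumulated in the journal.
            log, self.rebuild_log = self.rebuild_log, []
            return log

    # Failed-node side (S306-S340), greatly simplified:
    coord = CoordinatorNode({"k1": "v1"})
    coord.apply_client_write("k2", "v2")                 # arrives while the node is down
    copy_data, missed = coord.handle_rebuild_request()   # S308 / S312
    rebuilt = dict(copy_data)                            # S314: store the received data
    coord.apply_client_write("k3", "v3")                 # arrives during the rebuild
    for kind, key, value in missed + coord.drain_rebuild_log():   # S318-S320, S334-S336
        rebuilt[key] = value
    print(rebuilt == coord.data)                         # True: the copies are consistent again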
Fig. 4 is a structural block diagram of a distributed cache system according to an embodiment of the present invention. As shown in Fig. 4, the distributed cache system can comprise: a main service node 10 and at least one slave service node 20. The slave service node 20 comprises: a first receiving module 200, configured to receive the first data information from the main service node after one or more failed nodes among the slave service nodes have their faults eliminated and are restarted, wherein the first data information comprises: the data requested by the one or more failed nodes and/or the data change information produced between the failure of the one or more failed nodes and their restart; a first reconstruction module 202, configured to perform the data reconstruction operation corresponding to the first data information; a second receiving module 204, configured to receive, while the first data information is being sent, the second data information recorded by the main service node, wherein the second data information comprises: the data obtained and/or the data change information recorded by the main service node from the restart of the one or more failed nodes onward; and a second reconstruction module 206, configured to perform the data reconstruction operation corresponding to the second data information.
With the distributed cache system shown in Fig. 4, the problem that some server nodes of a distributed cache system lose data after failing and going down, and that their data after the fault is eliminated and they rejoin the system is inconsistent with the data saved by the other server nodes, is solved; the data of the failed nodes is kept consistent with that of the other normally operating nodes, and the availability of the distributed cache system is greatly improved.
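Purely as an illustration of how the slave service node modules of Fig. 4 might map onto code (all names are hypothetical; the patent does not prescribe an implementation language), consider:

    from dataclasses import dataclass, field

    @dataclass
    class SlaveServiceNode:
        """Slave service node 20: two receiving modules paired with two reconstruction modules."""
        storage: dict = field(default_factory=dict)

        def on_first_data_information(self, data, changes):      # first receiving module 200
            self.reconstruct(data, changes)                       # first reconstruction module 202

        def on_second_data_information(self, data, changes):     # second receiving module 204
            self.reconstruct(data, changes)                       # second reconstruction module 206

        def reconstruct(self, data, changes):
            # Storage unit: save the received data; execution unit: apply the change information.
            self.storage.update(data or {})
            for op in changes or []:
                if op[0] == "write":
                    self.storage[op[1]] = op[2]
                elif op[0] == "delete":
                    self.storage.pop(op[1], None)

    node = SlaveServiceNode()
    node.on_first_data_information({"k1": "v1"}, [("write", "k2", "v2")])
    node.on_second_data_information({}, [("delete", "k1", None)])
    print(node.storage)   # {'k2': 'v2'}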
Preferably, as shown in Fig. 5, the main service node 10 can comprise: a first recording module 100, configured to record the data change information produced between the failure of the one or more failed nodes and their restart; and a third receiving module 102, configured to receive, from the one or more failed nodes, the request to obtain the first data information.
Preferably, as shown in Fig. 5, the main service node 10 can further comprise: a second recording module 104, configured to record the data change information produced from the restart of the one or more failed nodes onward.
Preferably, as shown in Fig. 5, the first reconstruction module 202 of the slave service node 20 may further comprise: a first storage unit (not shown), configured to save the data in the received first data information; and a first execution unit (not shown), configured to perform write and/or delete operations according to the data change information in the first data information.
Preferably, as shown in Fig. 5, the second reconstruction module 206 of the slave service node 20 may further comprise: a second storage unit (not shown), configured to save the data in the received second data information; and a second execution unit (not shown), configured to perform write and/or delete operations according to the data change information in the second data information.
Preferably, as shown in Fig. 5, the main service node 10 can further comprise: a first sending module 106, configured to send to the one or more failed nodes the message indicating that the second data information has been sent completely; a third recording module 108, configured to record the current time as the first moment; a fourth receiving module 110, configured to receive from the one or more failed nodes the response message indicating that normal service has been resumed; a fourth recording module 112, configured to record the current time as the second moment; a judging module 114, configured to determine whether, during the period from the first moment to the second moment, there are still data of the one or more failed nodes and/or recorded data change information that have not been sent; and a second sending module 116, configured to, when the judging module outputs yes, continue to send the data obtained and/or the data change information recorded during the period from the first moment to the second moment to the one or more failed nodes, and, when the judging module outputs no, directly send the data-reconstruction-complete message to the one or more failed nodes.
Preferably, the data requested by the one or more failed nodes may include, but are not limited to, at least one of the following: in-memory data and disk data.
Preferably, the data change information produced between the failure of the one or more failed nodes and their restart may include, but is not limited to: disk data modification information.
As can be seen from the above description, the present invention achieves the following technical effect: after some server nodes of the distributed cache system fail, go down and lose data, and after the failed server nodes have their faults eliminated and rejoin the system, their data is consistent with the data saved by the other server nodes, which greatly improves the availability of the distributed cache system.
Obviously, those skilled in the art should understand that each of the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by a plurality of computing devices. Optionally, they can be implemented with program code executable by computing devices, so that they can be stored in a storage device and executed by a computing device, and in some cases the steps shown or described can be performed in an order different from the one herein; alternatively, they can be made into individual integrated circuit modules, or a plurality of the modules or steps among them can be made into a single integrated circuit module. In this way, the present invention is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims (14)

1. A data reconstruction method based on a distributed cache system, the distributed cache system comprising a main service node and at least one slave service node, characterized in that the method comprises:
after one or more failed nodes among the slave service nodes have their faults eliminated and are restarted, the one or more failed nodes receiving first data information from the main service node and performing the data reconstruction operation corresponding to the first data information, wherein the first data information comprises: the data requested by the one or more failed nodes and/or the data change information produced between the failure of the one or more failed nodes and their restart;
while the first data information is being sent, the one or more failed nodes receiving second data information recorded by the main service node and performing the data reconstruction operation corresponding to the second data information, wherein the second data information comprises: the data obtained and/or the data change information recorded by the main service node from the restart of the one or more failed nodes onward.
2. The method according to claim 1, characterized in that, before the one or more failed nodes receive the first data information from the main service node, the method further comprises:
the main service node recording the data change information produced between the failure of the one or more failed nodes and their restart;
the main service node receiving, from the one or more failed nodes, a request to obtain the first data information.
3. The method according to claim 2, characterized in that, after the main service node receives the request from the one or more failed nodes to obtain the first data information, the method further comprises:
the main service node recording the data change information produced from the restart of the one or more failed nodes onward.
4. The method according to claim 1, characterized in that the one or more failed nodes performing the data reconstruction operation corresponding to the first data information comprises:
the one or more failed nodes saving the data in the received first data information, and/or performing write and/or delete operations according to the data change information in the first data information.
5. The method according to claim 1, characterized in that the one or more failed nodes performing the data reconstruction operation corresponding to the second data information comprises:
the one or more failed nodes saving the data in the received second data information, and/or performing write and/or delete operations according to the data change information in the second data information.
6. The method according to claim 1, characterized in that, after the one or more failed nodes receive the second data information recorded by the main service node, the method further comprises:
the main service node sending to the one or more failed nodes a message indicating that the second data information has been sent completely, and recording the current time as a first moment;
the main service node receiving from the one or more failed nodes a response message indicating that normal service has been resumed, and recording the current time as a second moment;
the main service node determining whether, during the period from the first moment to the second moment, there are still data of the one or more failed nodes and/or recorded data change information that have not been sent; if so, continuing to send the data and/or the recorded data change information to the one or more failed nodes; if not, directly sending a data-reconstruction-complete message to the one or more failed nodes.
7. The method according to any one of claims 1 to 6, characterized in that the data requested by the one or more failed nodes comprise at least one of the following: in-memory data and disk data.
8. The method according to any one of claims 1 to 6, characterized in that the data change information produced between the failure of the one or more failed nodes and their restart comprises: disk data modification information.
9. A distributed cache system, characterized in that the distributed cache system comprises: a main service node and at least one slave service node;
the slave service node comprises:
a first receiving module, configured to receive the first data information from the main service node after one or more failed nodes among the slave service nodes have their faults eliminated and are restarted, wherein the first data information comprises: the data requested by the one or more failed nodes and/or the data change information produced between the failure of the one or more failed nodes and their restart;
a first reconstruction module, configured to perform the data reconstruction operation corresponding to the first data information;
a second receiving module, configured to receive, while the first data information is being sent, the second data information recorded by the main service node, wherein the second data information comprises: the data obtained and/or the data change information recorded by the main service node from the restart of the one or more failed nodes onward;
a second reconstruction module, configured to perform the data reconstruction operation corresponding to the second data information.
10. The system according to claim 9, characterized in that the main service node comprises:
a first recording module, configured to record the data change information produced between the failure of the one or more failed nodes and their restart;
a third receiving module, configured to receive, from the one or more failed nodes, the request to obtain the first data information.
11. The system according to claim 10, characterized in that the main service node further comprises:
a second recording module, configured to record the data change information produced from the restart of the one or more failed nodes onward.
12. The system according to claim 9, characterized in that the first reconstruction module comprises:
a first storage unit, configured to save the data in the received first data information;
a first execution unit, configured to perform write and/or delete operations according to the data change information in the first data information.
13. The system according to claim 9, characterized in that the second reconstruction module comprises:
a second storage unit, configured to save the data in the received second data information;
a second execution unit, configured to perform write and/or delete operations according to the data change information in the second data information.
14. The system according to claim 9, characterized in that the main service node further comprises:
a first sending module, configured to send to the one or more failed nodes the message indicating that the second data information has been sent completely;
a third recording module, configured to record the current time as a first moment;
a fourth receiving module, configured to receive from the one or more failed nodes the response message indicating that normal service has been resumed;
a fourth recording module, configured to record the current time as a second moment;
a judging module, configured to determine whether, during the period from the first moment to the second moment, there are still data of the one or more failed nodes and/or recorded data change information that have not been sent;
a second sending module, configured to, when the judging module outputs yes, continue to send the data obtained and/or the data change information recorded during the period from the first moment to the second moment to the one or more failed nodes, and, when the judging module outputs no, directly send the data-reconstruction-complete message to the one or more failed nodes.
CN201110343592.3A 2011-11-03 2011-11-03 Distributed cache system and data reconstruction method based on distributed cache system Active CN103095767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110343592.3A CN103095767B (en) 2011-11-03 2011-11-03 Distributed cache system and data reconstruction method based on distributed cache system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110343592.3A CN103095767B (en) 2011-11-03 2011-11-03 Distributed cache system and data reconstruction method based on distributed cache system

Publications (2)

Publication Number Publication Date
CN103095767A true CN103095767A (en) 2013-05-08
CN103095767B CN103095767B (en) 2019-04-23

Family

ID=48207895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110343592.3A Active CN103095767B (en) 2011-11-03 2011-11-03 Distributed cache system and data reconstruction method based on distributed cache system

Country Status (1)

Country Link
CN (1) CN103095767B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1617139A (en) * 2003-11-15 2005-05-18 鸿富锦精密工业(深圳)有限公司 Electronic sending file synchronous system and method
CN102025758A (en) * 2009-09-18 2011-04-20 成都市华为赛门铁克科技有限公司 Method, device and system fore recovering data copy in distributed system
CN102024044A (en) * 2010-12-08 2011-04-20 华为技术有限公司 Distributed file system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106030501A (en) * 2014-09-30 2016-10-12 株式会社日立制作所 Distributed storage system
CN106030501B (en) * 2014-09-30 2020-02-07 株式会社日立制作所 System, method and distributed storage system

Also Published As

Publication number Publication date
CN103095767B (en) 2019-04-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant