CN102098344A - Method and device for synchronizing editions during cache management and cache management system - Google Patents

Publication number
CN102098344A
Authority
CN
China
Prior art keywords
version
request
write request
node
synchronous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011100419204A
Other languages
Chinese (zh)
Other versions
CN102098344B (en)
Inventor
司成祥
许鲁
孟晓烜
刘振军
韩晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN 201110041920
Publication of CN102098344A
Application granted
Publication of CN102098344B
Expired - Fee Related
Anticipated expiration

Abstract

The invention provides a cache management system for network storage that achieves high reliability for write-cache data by backing up the data a local node writes to its cache into the cache of an idle node in the cluster. With a version synchronization mechanism of "global version number + local version number + request sequence number", global version numbers are adjusted synchronously between nodes, while local version numbers and request sequence numbers are adjusted asynchronously within each node, which lowers the overhead of data synchronization between nodes. Moreover, the primary node and the backup node can each independently write cached data to the back-end storage system by comparing version information locally, so cache resources can be released flexibly and efficiently and the system achieves high performance.

Description

Version synchronization method and device in cache management, and cache management system using them
Technical field
The invention belongs to the field of information technology and relates in particular to the reliability of shared cache management systems.
Background art
Data caching is widely used at every level of storage systems as an important means of I/O performance optimization. It keeps data that is likely to be accessed soon in main memory, exploits the data access locality that is common in application I/O workloads to accelerate the storage system's I/O performance, and shields applications from the performance impact of slow disk devices. In the network storage application model, the data cache resources on the back-end processing nodes along the I/O path have a significant effect on the performance of upper-layer applications.
However, the unreliability of main memory creates a conflict between storage performance and reliability. In write-back mode, the system replies as soon as the application data of a received write request has been written to main memory, and defers the disk write to the background for as long as possible. This maximizes performance but gives the poorest reliability: because main memory is volatile, a system crash caused by a software or hardware fault, a power failure, or a similar anomaly leads to data loss and inaccessible data. To guarantee cache reliability, the traditional approach is write-through mode, which writes data to disk synchronously. This guarantees data reliability, but because of the mechanical characteristics of disks, write performance is about an order of magnitude lower than with asynchronous writes.
The same conflict between storage performance and reliability exists in the network storage application model. Network storage applications usually resolve it with proprietary hardware such as NVRAM or cache disks, but this requires special hardware support, is expensive, offers poor cost-effectiveness and poor generality, and runs counter to the current trend toward storage systems built on general-purpose hardware and software. As network latency falls and network bandwidth rises, write data can instead be backed up over the network to the memory of an idle node, which then serves as a backup of the write cache. Because the data is backed up on another node, and the probability of two or more nodes failing at the same time is very small, the software or hardware failure of a node can be tolerated. After a failed node restarts, it can read the data back from the backup node, which guarantees data reliability. In addition, when data is stored in network memory, the write reply is returned as soon as the data has been written to network memory, so high read/write performance is also maintained. Although network memory is a good way to address cache reliability, the following performance problems remain: (1) The resource reclamation mechanism is inefficient: because of the reliability requirement, data on the backup node can be released only after the primary node has actually written the data back to back-end storage; if the backup node runs short of resources, it must wait for the primary node's write-back to finish. (2) The data consistency (synchronization) overhead is high: the prior art mostly relies on an "index number + version number" mechanism to solve the consistency problem and maintains a version for every data page, which incurs considerable management overhead and a significant performance impact.
Summary of the invention
Therefore, the objective of the invention is to overcome the above defects of the prior art by providing, for cache management in the network storage application model, a version synchronization mechanism and a resource reclamation mechanism that uses version information. These provide efficient data read/write performance while guaranteeing data reliability, and thereby break the tradeoff between reliability and performance.
According to one aspect of the invention, a version synchronization method for cache management is provided, wherein the cache management backs up the write request data that a local node writes to its cache onto a backup node. The version synchronization method comprises the following steps:
Step 1: determine whether the global synchronization clock has expired; if so, synchronize the system version between nodes so that the system versions of the local node and the backup node are consistent. The system version comprises a global version number, a local version number, and a request sequence number, and the global synchronization clock is a fixed interval.
Step 2: determine whether the local synchronization clock has expired; if so, adjust the system version within the node: increase the local version number of the system version by 1, reset the request sequence number to 0, and leave the global version number unchanged. The local synchronization clock is a fixed short interval.
Step 3: when a write request is received, increase the request sequence number of the system version by 1 and assign the current system version to the begin version of the write request. The begin version of the write request comprises a global version number, a local version number, and a request sequence number.
In the version synchronization method according to an embodiment of the invention, synchronizing the system version between nodes in step 1 comprises the following steps:
Increase the global version number of the system version by 1, and reset the local version number and the request sequence number to 0;
The local node sends a global synchronization request to the backup node; on receiving it, the backup node correspondingly increases the global version number of its current system version by 1 and resets its local version number and request sequence number to 0.
In the version synchronization method according to an embodiment of the invention, the local synchronization clock can be adjusted dynamically according to network latency and the clock error between nodes, and should generally be greater than 2 milliseconds. The global synchronization clock is n times the local synchronization clock, where n > 0. In some embodiments of the invention, the global synchronization clock is set to 30 seconds and the local synchronization clock is set to 10 milliseconds.
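Steps 1 to 3 and the inter-node synchronization can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class name `SystemVersion`, its method names, and the `send_global_sync` callback are hypothetical, and the two clocks are replaced by explicit method calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A version is modelled as (global number, local number, request sequence number).
Version = Tuple[int, int, int]


@dataclass
class SystemVersion:
    """Hypothetical per-node system version."""
    global_no: int = 0
    local_no: int = 0
    seq_no: int = 0

    def current(self) -> Version:
        return (self.global_no, self.local_no, self.seq_no)

    def on_global_timer(
        self, send_global_sync: Optional[Callable[[], None]] = None
    ) -> None:
        # Step 1: global number +1, local number and sequence number reset to 0.
        self.global_no += 1
        self.local_no = 0
        self.seq_no = 0
        # The local node then asks the backup node to do the same (assumed callback).
        if send_global_sync is not None:
            send_global_sync()

    def on_local_timer(self) -> None:
        # Step 2: local number +1, sequence number reset, global number unchanged.
        self.local_no += 1
        self.seq_no = 0

    def on_write_request(self) -> Version:
        # Step 3: sequence number +1; the result is the request's begin version.
        self.seq_no += 1
        return self.current()


primary = SystemVersion()
backup = SystemVersion()

# Global clock expires on the primary, which forwards the request to the backup.
primary.on_global_timer(send_global_sync=backup.on_global_timer)
# Local clock expires on the primary only; no communication is needed.
primary.on_local_timer()
begin_version = primary.on_write_request()
print(begin_version, backup.current())
```

With the example settings above (30 seconds and 10 milliseconds), the local timer fires many times between two global synchronizations, so only the rare global step involves network traffic.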
The version synchronization method according to an embodiment of the invention further comprises the following steps:
Step 4: after the data of a write request has been saved to back-end storage, assign the current system version to the end version of the write request and store it in a completion list. The end version of the write request comprises a global version number, a local version number, and a request sequence number, and its initial value is 0. The completion list contains the begin version and end version information of the write requests that have been saved to back-end storage.
Step 5: the local node sends the completion list to the backup node, and the backup node updates the end versions of the corresponding write requests according to the completion list.
According to another aspect of the invention, a version synchronization device for cache management is provided, wherein the cache management backs up the write request data that a local node writes to its cache onto a backup node. The version synchronization device comprises:
A global synchronization component, which determines whether the global synchronization clock has expired and, if so, synchronizes the system version between nodes so that the system versions of the local node and the backup node are consistent. The system version comprises a global version number, a local version number, and a request sequence number, and the global synchronization clock is a fixed interval;
A local synchronization component, which determines whether the local synchronization clock has expired and, if so, adjusts the system version within the node: it increases the local version number of the system version by 1, resets the request sequence number to 0, and leaves the global version number unchanged. The local synchronization clock is a fixed short interval;
A component for marking the begin version of a write request, which, when a write request is received, increases the request sequence number of the system version by 1 and assigns the current system version to the begin version of the write request. The begin version of the write request comprises a global version number, a local version number, and a request sequence number.
In the version synchronization device according to an embodiment of the invention, the global synchronization component further comprises a component for adjusting the system version, which increases the global version number of the system version by 1 and resets the local version number and the request sequence number to 0; and
A component for synchronizing the system version, by which the local node sends a global synchronization request to the backup node; on receiving it, the backup node correspondingly increases the global version number of its current system version by 1 and resets its local version number and request sequence number to 0.
In the version synchronization device according to an embodiment of the invention, the local synchronization clock can be adjusted dynamically according to network latency and the clock error between nodes, and should generally be greater than 2 milliseconds. The global synchronization clock is n times the local synchronization clock, where n > 0. In some embodiments of the invention, the global synchronization clock is set to 30 seconds and the local synchronization clock is set to 10 milliseconds.
The version synchronization device according to an embodiment of the invention further comprises: a component for marking the end version of a write request, which, after the data of the write request has been saved to back-end storage, assigns the current system version to the end version of the write request and stores it in a completion list. The end version of the write request comprises a global version number, a local version number, and a request sequence number, and its initial value is 0. The completion list contains the begin version and end version information of the write requests that have been saved to back-end storage;
A component for synchronizing the completion list, by which the local node sends the completion list to the backup node, and the backup node updates the end versions of the corresponding write requests according to the completion list.
According to yet another aspect of the invention, a cache management system is provided, comprising a shared cache device, which allocates cache resources for write requests from a data communication device and reclaims cache resources, and a version synchronization device as described above.
In some embodiments, the above cache management system further comprises a write resource reclamation device. The write resource reclamation device receives write-back requests from the shared cache device, where a write-back request asks for the data of a write request to be saved to back-end storage. It also compares the version information of the write request: if the begin version of the write request ≤ the end version of the write request, it deletes the information of this write request and directly notifies the shared cache device that the resources occupied by the write request can be reclaimed; otherwise, it saves the write request data to back-end storage and sets the end version of the request to the current system version.
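The decision made by the write resource reclamation device can be sketched as follows. The function name `handle_write_back` and the `save_to_backend` callback are hypothetical. The text does not define how two versions are ordered; this sketch assumes a lexicographic comparison of (global, local, sequence), which is what Python tuple comparison provides.

```python
from typing import Callable, List, Tuple

Version = Tuple[int, int, int]  # (global, local, sequence)


def handle_write_back(begin: Version, end: Version, current: Version,
                      save_to_backend: Callable[[], None]) -> Tuple[str, Version]:
    """Return the action taken and the request's end version afterwards."""
    if begin <= end:
        # The data is already on back-end storage (the end version has been set,
        # e.g. through a completion list), so the cache page can be reclaimed
        # without another write.
        return ("reclaim", end)
    # End version still 0: write the data back, then stamp the end version.
    save_to_backend()
    return ("written", current)


calls: List[str] = []

# End version still at its initial value 0: the data must be written back.
first = handle_write_back((1, 1, 1), (0, 0, 0), (1, 3, 1),
                          lambda: calls.append("saved"))

# End version already set: reclaim directly, no back-end write.
second = handle_write_back((1, 1, 1), (1, 3, 1), (2, 0, 0),
                           lambda: calls.append("saved"))
print(first, second, calls)
```

The second call illustrates how a node can free a cache page purely from locally held version information, without contacting the other node at reclamation time.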
Compared with the prior art, the beneficial effects of the invention are as follows. Data written to the cache is backed up to the cache of an idle node in the cluster; because the probability of two nodes in the cluster failing at the same time is very small, write-cache data achieves high reliability. At the same time, a version synchronization mechanism of "global version number + local version number + request sequence number" is adopted: the global version number is adjusted synchronously between nodes, while the local version number and the request sequence number are adjusted asynchronously within each node, which reduces the overhead of data synchronization between nodes. Furthermore, the primary node and the backup node can each independently write cached data to the back-end storage system by comparing version information locally, reclaiming cache resources flexibly and efficiently and thereby achieving high system performance.
Description of drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the structure of a version according to an embodiment of the invention;
Fig. 2 is a structural diagram of the cache management system according to an embodiment of the invention;
Fig. 3 is a workflow diagram of the devices of the cache management system according to an embodiment of the invention.
Embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate the invention and do not limit it.
In embodiments of the invention, the data that the local node writes to its cache is backed up to the cache of an idle node in the cluster, which gives write-cache data high reliability. The local node on which a write request operates may also be called the primary node, and the idle node that backs up the data written to the cache may also be called the backup node or mirror node. There may be several backup or mirror nodes, and their number can be configured according to the required level of reliability. For example, in some embodiments, to improve reliability, the primary node saves the cached data to be backed up on two backup nodes at the same time. In some embodiments, to improve space utilization, the primary node randomly selects an idle node in the network for each write request and backs up the data written to the cache onto it.
Embodiments of the invention achieve high system performance through a version synchronization method that combines synchronous and asynchronous adjustment. The key idea of the method is to synchronize the core part (a small fraction) and to make the peripheral part (the majority) asynchronous. An embodiment of the invention provides a version synchronization method for cache management in which the system version is adjusted synchronously between nodes at every fixed interval, the system version is adjusted within each node at every fixed short interval, and a begin version is marked for each write request. The system version and the begin version of a write request use the structure shown in Fig. 1, which comprises a global version number, a local version number, and a request sequence number. Adjusting the system version synchronously between nodes means the following: the global version number of the system version is increased by 1, and the local version number and the request sequence number are reset to 0; the local node sends a global synchronization request to the backup node; on receiving it, the backup node correspondingly increases the global version number of its current system version by 1 and resets its local version number and request sequence number to 0. Adjusting the system version within a node means increasing the local version number of the system version by 1, resetting the request sequence number to 0, and leaving the global version number unchanged.
The version synchronization method of some embodiments of the invention further comprises, after the data of a write request has been saved to back-end storage, assigning the current system version to the end version of the write request and storing it in a completion list. The end version of the write request comprises a global version number, a local version number, and a request sequence number, and its initial value is 0. The completion list contains the begin version and end version information of the write requests that have been saved to back-end storage. The local node sends the completion list to the backup node, and the backup node can update the end versions of the corresponding write requests according to the completion list.
Fig. 1 shows the structure of a version according to an embodiment of the invention. A version consists of a global version number 101, a local version number 102, and a request sequence number 103. In this embodiment, each request entering the system carries two versions (begin version, end version): the begin version is the version at which the request enters the system, and the end version is the version at which the request leaves the system.
The global version number 201 of the system version is the object of synchronous adjustment between nodes. The initialization and adjustment of the global version number are carried out fully synchronously by the two nodes: at every fixed interval, the caching node sends a synchronization request to the backup node and the system version is synchronized, that is, the global version number of the system version is increased by 1 and the local version number and the request sequence number are reset to 0. In essence, the global version number is a notion of time: it increases by 1 at every fixed interval (for example, 30 seconds). On either node, for any write request, when the global version number of the node's current system version ≥ the global version number of the write request + 2, the corresponding write request on the peer node has certainly been written to disk, that is, the corresponding backup cache data has been saved, so the resources this request occupies on the local node can be released. The reasoning is as follows. The system version is synchronized once per fixed interval, for example every 30 seconds, so if the global version number of the current system version is greater than the global version number of the write request, the write request has existed on this node for 30 seconds. The time each write request stays in the memory cache is limited: the operating system's memory cache management guarantees that write data is written back to non-volatile media (such as disk) within a fixed time range (such as 30 seconds), so the resources occupied by this write request can be released. The purpose of adding 2 is to guarantee that the boundary condition holds; adding 1 is in fact also acceptable. Between nodes, the system version is adjusted synchronously once every 30 seconds. Within those 30 seconds no synchronization is needed, and each node counts versions on its own. During this time, none of the request comparison operations on a node requires communication with the peer node, which saves synchronization overhead. In addition, the release of local resources does not depend on a response from the peer node. In some embodiments other values may be chosen for this fixed interval, for example 10 seconds, 20 seconds, or 50 seconds.
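The release condition based on the global version number can be sketched as a single comparison. The function name `can_release_by_global` and the `margin` parameter are hypothetical; the default of 2 follows the rule stated above, and a margin of 1 is the relaxed variant the text says is also acceptable.

```python
def can_release_by_global(current_global: int, request_global: int,
                          margin: int = 2) -> bool:
    """True when the request is old enough, in global epochs, to be released.

    The check uses only numbers held on the local node, so it needs no
    communication with the peer node.
    """
    return current_global >= request_global + margin


# A request stamped in global epoch 1:
print(can_release_by_global(2, 1))            # one epoch later, default margin
print(can_release_by_global(3, 1))            # two epochs later
print(can_release_by_global(2, 1, margin=1))  # relaxed margin of 1
```

With a 30-second global interval, a margin of 2 means the request has been on the node for at least one full interval, whereas a margin of 1 only guarantees that an epoch boundary has passed.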
The global version number of the system version allows the later comparison of write requests on the two nodes to be carried out asynchronously. Each time the global version number of the system version undergoes an absolutely synchronized adjustment (the global version number is increased by 1), the local version number 202 of the system version is reset to 0. After that, the local version number 202 is adjusted fully asynchronously by the two storage nodes, each node computing it on its own.
The local version number 202 of the system version is the object of version adjustment within a node; in essence, the local version number is also a notion of time. On each of the two nodes, the local version number of that node's system version increases by 1 at every fixed short interval (for example, 10 milliseconds). For the same block address, when the local version number of the end version of the write request on the local node ≥ the local version number of the begin version of the write request + 2, the write request on the peer node has certainly been written to disk through the corresponding request on the local node, which also means that the resources this request occupies on the peer node can be released. In this process the peer node needs to obtain the completion list from the local node. Unlike in the prior art, the comparison of local version numbers is completed independently on each node. In the prior art, data on the backup node can be released only after the primary node has actually written the data back to back-end storage, and if the backup node runs short of resources, it must wait for the primary node's write-back to finish. In embodiments of the invention, by contrast, the primary node and the backup node can each independently write cached data to the back-end storage system and release cache resources flexibly and efficiently. This is possible because the absolute time difference between the submission of a write request on the two nodes, and the short-term clock error (timer skew), are bounded: the clock error between different machines within a fixed period, plus the time difference of network transmission, cannot exceed a certain upper limit. The interconnect mechanism can guarantee that the absolute difference between the times at which a write request enters block device stack processing on the two nodes is less than a certain fixed short interval (for example, 10 milliseconds). Between the two nodes, the local version number is reset synchronously at intervals (whenever the global version number is adjusted) and is otherwise advanced according to each node's own local clock. The local version numbers are therefore not absolutely synchronized but relatively synchronized, and this relative synchrony makes the write requests on the two nodes comparable at a later stage.
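The local-version comparison can be sketched in the same way. The function name `peer_copy_releasable` is hypothetical. The text does not say how the rule behaves when the two versions carry different global version numbers (the local version number is reset at every global adjustment), so this sketch assumes the rule is applied only when both versions belong to the same global epoch and returns False otherwise.

```python
from typing import Tuple

Version = Tuple[int, int, int]  # (global, local, sequence)


def peer_copy_releasable(end_on_local: Version, begin_on_peer: Version,
                         margin: int = 2) -> bool:
    """Local-version rule for two requests on the same block address.

    end_on_local:  end version taken from the completion list.
    begin_on_peer: begin version of the copy held on the other node.
    """
    # Assumption: local version numbers are comparable only within one epoch.
    if end_on_local[0] != begin_on_peer[0]:
        return False
    return end_on_local[1] >= begin_on_peer[1] + margin


# End version (1, 3, 1) against begin version (1, 1, 1): local numbers 3 and 1.
print(peer_copy_releasable((1, 3, 1), (1, 1, 1)))
# Only one local tick apart: not yet safe under the rule.
print(peer_copy_releasable((1, 2, 0), (1, 1, 1)))
# Different global epochs: this rule does not decide (assumed to return False).
print(peer_copy_releasable((2, 0, 1), (1, 1, 2)))
```

The margin of 2 local ticks absorbs the bounded submission time difference and timer skew described above, since each tick is chosen to be at least as long as that bound.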
The adjustment of the global version number of the system version is tied to the fixed interval, which may also be called the global synchronization clock; the adjustment of the local version number of the system version is tied to the fixed short interval, which may also be called the local synchronization clock. In some embodiments of the invention, the fixed short interval can be adjusted dynamically according to network latency and the clock error between nodes, and should generally be greater than 2 milliseconds; the fixed interval is n times the fixed short interval, where n > 0. In this embodiment the fixed interval is set to 30 seconds and the fixed short interval is set to 10 milliseconds.
The request sequence number 203 of the system version is reset to 0 whenever the global version number or the local version number is adjusted. Each time a write request is processed, the request sequence number of the system version is automatically increased by 1, and the begin version of the write request is set to a copy of the current system version. This guarantees that, on each storage node, every write request is uniquely identified by (global version number | local version number | request sequence number).
The detailed flow of the version synchronization method of one embodiment of the invention is described below with reference to Table 1. In this embodiment, each write request carries two versions (begin version, end version): the begin version is the initial version of the write request, and the end version is the version assigned to the write request after its data has been saved to back-end storage.
Table 1
Global version number    Local version number    Request sequence number    Request number
1                        0                       0
1                        1                       0
1                        1                       1                          1
1                        1                       2                          2
1                        2                       0
1                        3                       0
1                        3                       1                          3
1                        3                       1                          1
2                        0                       0
2                        0                       1                          4
2                        0                       1                          2
Step 1: after the versions of the primary node and the backup node have been initialized, the current system version is 100 (global version number 1, local version number 0, request sequence number 0);
Step 2: after a fixed short interval (for example, 10 milliseconds), the local node adjusts the local version number, and the current system version becomes 110;
Step 3: when write request 1 (the write request numbered 1) is received, the system version is first updated from 110 to 111. The current system version is assigned to the begin version of write request 1 and its end version is set to 0, giving (111, 0), and the request is passed down to the next layer of the device stack;
Step 4: request 1 is written to the local shared cache. At the same time the write request is replicated synchronously and sent to the backup node by remote memory access over the high-performance network. The backup node processes the request in a manner similar to step 3;
Step 5: when write request 2 (the write request numbered 2) is received, the system version is first updated from 111 to 112. The current system version is assigned to the begin version of request 2 and its end version is set to 0, giving (112, 0), and the request is passed down to the next layer of the device stack;
Step 6: write request 2 is written to the local shared cache. At the same time the write request is replicated synchronously and then sent to the backup node by remote memory access over the high-performance network. The backup node processes the request in a manner similar to step 5;
Step 7: after a fixed short interval (for example, 10 milliseconds), the local node adjusts the local version number, and the current system version becomes 120;
Step 8: after a fixed short interval (for example, 10 milliseconds), the local node adjusts the local version number, and the current system version becomes 130;
Step 9: when request 3 (the write request numbered 3) is received, the system version is first updated from 130 to 131. The current system version is assigned to the begin version of request 3 and its end version is set to 0, giving (131, 0), and the request is passed down to the next layer of the device stack;
Step 10: request 3 is written to the local shared cache. At the same time the write request is replicated synchronously and then sent to the backup node by remote memory access over the high-performance network. The backup node processes the request in a manner similar to step 9;
Step 11: write request 1 is written to back-end storage and its end version is updated: the current system version is assigned to the end version of write request 1, giving (111, 131). The begin version and end version of write request 1 are added to the completion list, which is sent to the backup node. On receiving the completion list, the backup node updates the end version of the corresponding request. When the backup node reclaims resources, it compares the begin version and end version of each request to assist reclamation;
Step 12: after a fixed interval (for example, 30 seconds), the global version number is adjusted and the current system version becomes 200. The primary node sends a global synchronization request to the backup node, and the backup node updates its system version;
Step 13: when write request 4 (the write request numbered 4) is received, the system version is first updated from 200 to 201. The current system version is assigned to the begin version of request 4 and its end version is set to 0, giving (201, 0), and the request is passed down to the next layer of the device stack;
Step 14: write request 4 is written to the local shared cache. At the same time the write request is replicated synchronously and then sent to the backup node by remote memory access over the high-performance network. The backup node processes the request in a manner similar to step 13;
Step 15: write request 2 is written to back-end storage and its end version is updated: the current system version is assigned to the end version of write request 2, giving (112, 201). The begin version and end version of write request 2 are added to the completion list, which is sent to the backup node. On receiving the completion list, the backup node updates the end version of the corresponding request. When the backup node reclaims resources, it compares the begin version and end version of each request to assist reclamation.
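The fifteen steps above can be replayed on a single node to check the version values listed in Table 1. The class `Node` and its method names are hypothetical, the clocks are replaced by explicit calls, and the network transfer to the backup node (steps 4, 6, 10, and 14) is omitted; versions are written as tuples, so 111 in the text corresponds to (1, 1, 1).

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]  # (global, local, sequence)


class Node:
    """Hypothetical single-node replay of the walkthrough."""

    def __init__(self) -> None:
        self.version: Version = (1, 0, 0)   # step 1: initialized to 100
        self.begin: Dict[int, Version] = {}
        self.end: Dict[int, Version] = {}

    def local_tick(self) -> None:
        g, l, _ = self.version
        self.version = (g, l + 1, 0)

    def global_tick(self) -> None:
        g, _, _ = self.version
        self.version = (g + 1, 0, 0)

    def write(self, req_id: int) -> None:
        g, l, s = self.version
        self.version = (g, l, s + 1)
        self.begin[req_id] = self.version
        self.end[req_id] = (0, 0, 0)        # end version starts at 0

    def written_back(self, req_id: int) -> Tuple[Version, Version]:
        self.end[req_id] = self.version
        return (self.begin[req_id], self.end[req_id])  # completion-list entry


node = Node()
node.local_tick()                 # step 2  -> 110
node.write(1)                     # step 3  -> 111
node.write(2)                     # step 5  -> 112
node.local_tick()                 # step 7  -> 120
node.local_tick()                 # step 8  -> 130
node.write(3)                     # step 9  -> 131
entry1 = node.written_back(1)     # step 11 -> (111, 131)
node.global_tick()                # step 12 -> 200
node.write(4)                     # step 13 -> 201
entry2 = node.written_back(2)     # step 15 -> (112, 201)
print(entry1, entry2)
```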
In another aspect, the invention also provides a version synchronization device that implements the above version synchronization method, comprising:
A component for global synchronization, which determines whether the global synchronization clock has expired; if so, the system version is adjusted synchronously between nodes: the global version number of the system version is incremented by 1, and the local version number and request sequence number are reset to 0. The local node sends a global synchronization request to the backup node; on receiving it, the backup node correspondingly increments the global version number of its current system version by 1 and resets its local version number and request sequence number to 0.
A component for local synchronization, which determines whether the local synchronization clock has expired; if so, the local version number of the system version is incremented by 1, the request sequence number is reset to 0, and the global version number remains unchanged.
A component for marking the begin version of a write request, which, when a write request is received, increments the request sequence number of the system version by 1 and assigns the current system version to the begin version of the write request.
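The three components above amount to three update rules on one version triple. A minimal Python sketch follows, assuming the triple is held in a small mutable object. The class and method names are hypothetical, and the direct call on `backup` merely stands in for the backup node handling a global synchronization request.

```python
from dataclasses import dataclass


@dataclass
class SystemVersion:
    global_no: int = 0
    local_no: int = 0
    seq_no: int = 0

    def on_global_sync(self) -> None:
        """Global synchronization clock expired: increment global, reset the rest."""
        self.global_no += 1
        self.local_no = 0
        self.seq_no = 0

    def on_local_sync(self) -> None:
        """Local synchronization clock expired: increment local, reset sequence."""
        self.local_no += 1
        self.seq_no = 0

    def mark_begin(self) -> tuple:
        """A write request arrived: increment sequence, return its begin version."""
        self.seq_no += 1
        return (self.global_no, self.local_no, self.seq_no)


primary, backup = SystemVersion(), SystemVersion()

# Global synchronization: both nodes apply the same adjustment.
primary.on_global_sync()
backup.on_global_sync()  # stand-in for processing the global synchronization request

# Local synchronization and request marking happen inside one node only.
primary.on_local_sync()
first = primary.mark_begin()
second = primary.mark_begin()
print(first, second)  # (1, 1, 1) (1, 1, 2)
```

Only `on_global_sync` requires a message between nodes; the other two rules touch node-local state, which is where the reduced synchronization overhead described in this document comes from.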
In some embodiments of the invention, the above version synchronization device further comprises: a component for marking the end version of a write request, which, after the data of the write request has been saved to back-end storage, assigns the current system version to the end version of the write request and stores it in the completion list; the end version of a write request comprises a global version number, a local version number and a request sequence number, and its initial value is 0; the completion list contains the begin-version and end-version information of a number of write requests that have been saved to back-end storage; and a component for synchronizing the completion list, by which the local node sends the completion list to the backup node and the backup node updates the end versions of the corresponding write requests according to the completion list.
In yet another aspect, the invention also provides a cache management system comprising the above version synchronization device. The cache management system of an embodiment of the invention is described in detail below with reference to the accompanying drawings.
Fig. 2 shows the cache management system according to an embodiment of the invention, comprising a version synchronization device 201, a data communication device 202, a shared cache device 203, a write-cache resource reclamation device 204, a failure handling device 205, a completion list communication device 206 and a synchronization communication device 207. The devices on the cache node and on the backup node have fully symmetric functional structures.
The version synchronization device 201 is the entry point through which external requests enter the system. It is responsible for generating version information: it synchronizes the system version between nodes and marks a begin version on each write request entering the system. Each write request has two versions, a begin version and an end version: the begin version is the initial version with which the write request enters the system, and the end version is the final version after the write request has been saved to back-end storage. When a write request enters the system, the version synchronization device 201 first updates the system version and assigns the current system version to the request's begin version. As a request passes through the device layers of each of the two nodes, the version marked on the write request makes it possible to compare requests between the two storage nodes. When the data of a write request has been saved to back-end storage, the request's end version, i.e. the final version with which the write request leaves the system, is set to the current system version, and the request's version information (its begin version and end version) is stored in the completion list.
The data communication device 202 replicates write request data and backs it up to other nodes over a high-performance network. Specifically, it replicates each write request received from the version synchronization device 201; the source write request is forwarded to the shared cache device of the local node, and the replicated write request is sent to the backup node by remote direct memory access.
The shared cache device 203 mainly serves the read and write caching needs of applications and allocates cache resources for application data; when resources are scarce, it reclaims cache resources. The shared cache device 203 comprises a cache allocator and cache partitions. A cache partition has an independent cache space, can be associated with an application, and holds the data currently accessed by that application in its cache space. The cache allocator manages all idle cache resources as a shared cache pool and is responsible for reclaiming cache resources.
The write-cache resource reclamation device 204 assists the reclamation of write-cache resources by comparing the begin version and end version of a write request. If the begin version of the request ≤ the end version of the request, the write data block has already been written back to back-end storage and can be reclaimed directly; the write request is then immediately reported upward as successful and its request information is deleted from the system. Otherwise, the write request is forwarded to the back-end storage system, the data is saved to back-end storage, and the end version of the request is set to the current system version.
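The comparison rule can be illustrated with a short Python sketch, assuming versions are tuples of (global, local, sequence) compared lexicographically and that an unset end version is (0, 0, 0). The function name, the request dictionary layout and the `write_to_backend` callable are hypothetical.

```python
def handle_write_back(request: dict, system_version: tuple, write_to_backend) -> str:
    """Decide whether a write-back request still needs to reach back-end storage."""
    if request["begin"] <= request["end"]:
        # End version already set: the data is on back-end storage,
        # so the cache block can be reclaimed without another write.
        return "reclaimed"
    # End version still unset: save the data, then stamp the end version.
    write_to_backend(request["data"])
    request["end"] = system_version
    return "written"


backend = []
req = {"begin": (1, 1, 2), "end": (0, 0, 0), "data": b"block"}

print(handle_write_back(req, (2, 0, 1), backend.append))  # written
print(handle_write_back(req, (2, 0, 3), backend.append))  # reclaimed
```

The second call returns without touching the back end, which is the case that lets a node release a cache block as soon as it learns that the data is already persistent.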
The failure handling device 205 handles node and network failures: after a failed node restarts, the failure handling device 205 can read the data back from the backup node. Because the data is backed up on another node, and the probability that two or more nodes fail at the same time is very small, software and hardware failures of a node can be tolerated, which guarantees the reliability of the data.
The completion list communication device 206 sends and receives completion list information between nodes. The completion list is sent to other nodes through the completion list communication device 206, and the receiving node uses the completion list to update the end versions of the corresponding requests.
The synchronization communication device 207 sends and receives version synchronization information between nodes.
In some embodiments, the synchronization communication device 207 and the completion list communication device 206 run at user level, while the version synchronization device runs at kernel level.
In some embodiments, the synchronization communication device 207 and the completion list communication device 206 can be merged into a single communication device that is responsible for sending and receiving messages and distinguishes version synchronization messages from completion list messages by a message type field. In some embodiments, the synchronization communication device 207 and the completion list communication device 206 may be omitted, in which case their functions are performed by the version synchronization device 201.
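A merged communication device of this kind could distinguish the two kinds of message roughly as follows. This is a sketch only: the message layout, the `MsgType` enumeration and the handler callables are all assumptions made for illustration.

```python
from enum import Enum, auto


class MsgType(Enum):
    VERSION_SYNC = auto()      # global synchronization request
    COMPLETION_LIST = auto()   # list of (begin version, end version) pairs


def dispatch(msg: dict, on_version_sync, on_completion_list) -> None:
    """Route one received message to the matching handler by its type field."""
    if msg["type"] is MsgType.VERSION_SYNC:
        on_version_sync()
    elif msg["type"] is MsgType.COMPLETION_LIST:
        on_completion_list(msg["entries"])
    else:
        raise ValueError(f"unknown message type: {msg['type']!r}")


seen = []
dispatch({"type": MsgType.VERSION_SYNC},
         lambda: seen.append("sync"), seen.extend)
dispatch({"type": MsgType.COMPLETION_LIST, "entries": [((1, 1, 1), (1, 3, 1))]},
         lambda: seen.append("sync"), seen.extend)
print(seen)  # ['sync', ((1, 1, 1), (1, 3, 1))]
```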
Fig. 3 shows a specific workflow of the cache management method according to an embodiment of the invention.
Step 301, when an IO thread writes data to the cache system and a write request enters the system, the version synchronization device 201 first updates the system version, assigns the current system version to the begin version of the write request, sets the end version of the write request to 0, and forwards the write request to the data communication device 202. It also checks whether a global synchronization request is pending; if so, it stops forwarding requests and performs global synchronization, that is, it increments the global version number of the current system version by 1 and resets the local version number and request sequence number to 0, and then resumes forwarding requests once global synchronization is complete.
Step 302, on receiving a write request forwarded from the version synchronization device 201, the data communication device 202 forwards the write request to the shared cache device 203; at the same time it replicates the write request synchronously and sends the copy to the backup node by remote memory access over the network. Only after both the source request and the replicated request have completed is the request completed toward the upper layer.
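The completion rule of step 302 (acknowledge only when both the local write and the replicated write have finished) can be sketched in Python with a thread pool. The callables `write_local` and `send_to_backup` are hypothetical placeholders; a real system would use remote memory access rather than a Python function call.

```python
from concurrent.futures import ThreadPoolExecutor


def replicate_write(request: dict, write_local, send_to_backup) -> bool:
    """Issue the local write and the backup copy together; succeed only if both do."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local = pool.submit(write_local, request)
        remote = pool.submit(send_to_backup, dict(request))  # send a copy
        # result() blocks until each side finishes and re-raises any error.
        return bool(local.result() and remote.result())


local_log, remote_log = [], []
ok = replicate_write(
    {"id": 4, "data": b"payload"},
    lambda r: local_log.append(r) or True,
    lambda r: remote_log.append(r) or True,
)
print(ok)  # True
```

If either side reports failure or raises, the upper layer is not told that the request completed.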
Step 303, when the shared cache device 203 receives a write request, it first looks up whether a corresponding cache block exists in the cache space. If one exists, the request is a hit and the data is written directly into that cache block; otherwise a cache block is allocated for the data from the idle resource pool and the data is then written into the newly allocated block. If cache resources run short while a cache block is being allocated for the write request, the system is triggered to reclaim cache resources. If a write data block has to be reclaimed during this process, it must first be written back to back-end storage, so a write-back request for the write data block is forwarded downward.
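The lookup, allocation and reclamation path of step 303 can be sketched as follows. This is a simplified illustration: the `SharedCache` class, its fixed block count and the choice of the oldest inserted block as the reclamation victim are assumptions, since the text does not specify an eviction policy.

```python
class SharedCache:
    def __init__(self, capacity: int, write_back):
        self.free_blocks = capacity   # size of the idle resource pool
        self.blocks = {}              # block address -> data, in insertion order
        self.dirty = set()            # addresses not yet on back-end storage
        self.write_back = write_back  # callable(addr, data), forwards downward

    def write(self, addr, data) -> None:
        if addr not in self.blocks:   # miss: a new block must be allocated
            if self.free_blocks == 0:
                self._reclaim_one()   # resources are short: trigger reclamation
            self.free_blocks -= 1
        self.blocks[addr] = data      # hit, or write into the new block
        self.dirty.add(addr)

    def _reclaim_one(self) -> None:
        victim = next(iter(self.blocks))  # assumed policy: oldest inserted block
        if victim in self.dirty:
            # A write data block must be written back before it is reclaimed.
            self.write_back(victim, self.blocks[victim])
            self.dirty.discard(victim)
        del self.blocks[victim]
        self.free_blocks += 1


written = []
cache = SharedCache(capacity=2, write_back=lambda a, d: written.append((a, d)))
for addr, data in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    cache.write(addr, data)
print(written)  # [('a', 3)]
```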
Step 304, when the write-cache resource reclamation device 204 receives a write-back request from the shared cache device 203, it compares the begin version and end version of the request. If the begin version of the request ≤ the end version of the request, the write data block has already been written back to back-end storage and can be reclaimed directly; the write request is then immediately reported upward as successful and its request information is deleted from the system. Otherwise, the write request is forwarded to the back-end storage system, and after the back-end storage system completes the request, the end version of the request is set to the current system version. The write-cache resource reclamation device 204 thus allows the primary node and the backup node each to write cached write data to the back-end storage system independently, releasing cache resources flexibly and efficiently. In the prior art, by contrast, data on the backup node can be released only after the primary node has actually written the data back to back-end storage; if the backup node runs short of resources, it must wait for the primary node's write-back to complete.
Step 305, the version information (the begin version and end version of each request) of a number of write requests written back to back-end storage in step 304 is assembled into a completion list, which is sent to the backup node through the completion list communication device 206. On receiving the completion list, the backup node updates the end versions of the corresponding requests. When the backup node later runs short of resources and needs to reclaim them, it can use the version information of the requests to assist reclamation as in step 304.
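The backup node's side of step 305 can be sketched as follows. It is assumed here, for illustration, that the backup node indexes its pending write requests by begin version and that versions are (global, local, sequence) tuples; the function names are hypothetical.

```python
UNSET = (0, 0, 0)


def apply_completion_list(pending: dict, completion_list: list) -> None:
    """Copy end versions from the received list onto matching pending requests."""
    for begin, end in completion_list:
        request = pending.get(begin)
        if request is not None:
            request["end"] = end


def reclaimable(pending: dict) -> list:
    """Begin versions of requests whose data is already on back-end storage."""
    return [begin for begin, request in pending.items() if begin <= request["end"]]


# Requests replicated to the backup node, none known to be persistent yet.
pending = {(1, 1, 1): {"end": UNSET}, (1, 1, 2): {"end": UNSET}}

apply_completion_list(pending, [((1, 1, 1), (1, 3, 1))])
print(reclaimable(pending))  # [(1, 1, 1)]
```

Request (1, 1, 2) stays pending: if the backup node needs its cache block, it has to write the data to back-end storage itself, as described in step 304.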
As can be seen, the cache management system according to the embodiment of the invention backs up data written to the cache to idle network caches on cluster nodes. Because the probability that two nodes in the cluster fail at the same time is very small, high reliability of cached write data can be achieved. In addition, with the version synchronization mechanism of "global version number + local version number + request sequence number", the global version number is adjusted synchronously between nodes while the local version number and request sequence number are adjusted asynchronously within each node, which reduces the overhead of data synchronization between nodes. Furthermore, by comparing version information locally, the primary node and the backup node can each write cached write data to the back-end storage system independently and reclaim cache resources flexibly and efficiently, so that high system performance is achieved.
Although the invention has been described by way of preferred embodiments, it is not limited to the embodiments described here and also covers various changes and variations made without departing from the scope of the invention.

Claims (12)

1. A version synchronization method in cache management, wherein the cache management backs up write request data written to the cache by a local node to a backup node, characterized in that the version synchronization method comprises the following steps:
Step 1, determining whether the global synchronization clock has expired and, if so, synchronizing the system version between nodes so that the system versions of the local node and the backup node are consistent, the system version comprising a global version number, a local version number and a request sequence number, wherein the global synchronization clock is a fixed time;
Step 2, determining whether the local synchronization clock has expired and, if so, adjusting the system version within the node by incrementing the local version number of the system version by 1 and resetting the request sequence number to 0 while the global version number remains unchanged, wherein the local synchronization clock is a fixed shorter time;
Step 3, when a write request is received, incrementing the request sequence number of the system version by 1 and assigning the current system version to the begin version of the write request; wherein the begin version of the write request comprises a global version number, a local version number and a request sequence number.
2. The version synchronization method according to claim 1, characterized in that, in step 1, synchronizing the system version between nodes comprises the following steps:
incrementing the global version number of the system version by 1 and resetting the local version number and request sequence number to 0;
sending, by the local node, a global synchronization request to the backup node, whereupon the backup node, on receiving the global synchronization request, correspondingly increments the global version number of its current system version by 1 and resets its local version number and request sequence number to 0.
3. The version synchronization method according to claim 1, characterized in that the local synchronization clock can be adjusted dynamically according to network delay and the clock error between nodes, and should generally be greater than 2 milliseconds; and the global synchronization clock is n times the local synchronization clock, n > 0.
4. The version synchronization method according to claim 1, characterized in that the global synchronization clock is 30 seconds and the local synchronization clock is 10 milliseconds.
5. The version synchronization method according to claim 1, characterized by further comprising the following steps:
Step 4, after the data of a write request has been saved to back-end storage, assigning the current system version to the end version of the write request and storing it in a completion list; the end version of the write request comprises a global version number, a local version number and a request sequence number, and its initial value is 0; the completion list contains the begin-version and end-version information of a number of write requests that have been saved to back-end storage;
Step 5, sending, by the local node, the completion list to the backup node, whereupon the backup node updates the end versions of the corresponding write requests according to the completion list.
6. A version synchronization device in cache management, wherein the cache management backs up write request data written to the cache by a local node to a backup node, characterized in that the version synchronization device comprises:
a component for global synchronization, which determines whether the global synchronization clock has expired and, if so, synchronizes the system version between nodes so that the system versions of the local node and the backup node are consistent, the system version comprising a global version number, a local version number and a request sequence number, wherein the global synchronization clock is a fixed time;
a component for local synchronization, which determines whether the local synchronization clock has expired and, if so, adjusts the system version within the node by incrementing the local version number of the system version by 1 and resetting the request sequence number to 0 while the global version number remains unchanged, wherein the local synchronization clock is a fixed shorter time;
a component for marking the begin version of a write request, which, when a write request is received, increments the request sequence number of the system version by 1 and assigns the current system version to the begin version of the write request; wherein the begin version of the write request comprises a global version number, a local version number and a request sequence number.
7. The version synchronization device according to claim 6, characterized in that the component for global synchronization further comprises a component for adjusting the system version, which increments the global version number of the system version by 1 and resets the local version number and request sequence number to 0; and
a component for synchronizing the system version, by which the local node sends a global synchronization request to the backup node, whereupon the backup node, on receiving the global synchronization request, correspondingly increments the global version number of its current system version by 1 and resets its local version number and request sequence number to 0.
8. The version synchronization device according to claim 6, characterized in that the local synchronization clock can be adjusted dynamically according to network delay and the clock error between nodes, and should generally be greater than 2 milliseconds; and the global synchronization clock is n times the local synchronization clock, n > 0.
9. The version synchronization device according to claim 6, characterized in that the global synchronization clock is 30 seconds and the local synchronization clock is 10 milliseconds.
10. The version synchronization device according to claim 6, characterized by further comprising:
a component for marking the end version of a write request, which, after the data of the write request has been saved to back-end storage, assigns the current system version to the end version of the write request and stores it in a completion list; the end version of the write request comprises a global version number, a local version number and a request sequence number, and its initial value is 0; the completion list contains the begin-version and end-version information of a number of write requests that have been saved to back-end storage;
a component for synchronizing the completion list, by which the local node sends the completion list to the backup node and the backup node updates the end versions of the corresponding write requests according to the completion list.
11. A cache management system comprising a shared cache device for allocating cache resources to write requests and reclaiming cache resources, characterized by further comprising a version synchronization device according to claim 6, 7, 8, 9 or 10.
12. The cache management system according to claim 11, characterized by further comprising a write-resource reclamation device for receiving a write-back request from the shared cache device, the write-back request requesting that the data of a write request be saved to back-end storage, and for comparing the version information of the write request: if the begin version of the write request ≤ the end version of the write request, the write request information is deleted and the shared cache device is directly notified that the resources occupied by the write request can be reclaimed; otherwise, the write request data is saved to back-end storage and the end version of the write request is set to the current system version.
CN 201110041920 2011-02-21 2011-02-21 Method and device for synchronizing editions during cache management and cache management system Expired - Fee Related CN102098344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110041920 CN102098344B (en) 2011-02-21 2011-02-21 Method and device for synchronizing editions during cache management and cache management system

Publications (2)

Publication Number Publication Date
CN102098344A true CN102098344A (en) 2011-06-15
CN102098344B CN102098344B (en) 2012-12-12

Family

ID=44131204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110041920 Expired - Fee Related CN102098344B (en) 2011-02-21 2011-02-21 Method and device for synchronizing editions during cache management and cache management system

Country Status (1)

Country Link
CN (1) CN102098344B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101917481A (en) * 2010-08-19 2010-12-15 周寅 Method for realizing video network map multilevel cache based on spatial roaming position

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014015809A1 (en) * 2012-07-25 2014-01-30 腾讯科技(深圳)有限公司 Method for synchronization of ugc master and backup data and system thereof, and computer storage medium
CN103581231B (en) * 2012-07-25 2019-03-12 腾讯科技(北京)有限公司 UGC master/slave data synchronous method and its system
CN103581231A (en) * 2012-07-25 2014-02-12 腾讯科技(北京)有限公司 UGC primary standby data synchronization method and system thereof
CN103049508B (en) * 2012-12-13 2017-08-11 华为技术有限公司 A kind of data processing method and device
CN103049508A (en) * 2012-12-13 2013-04-17 华为技术有限公司 Method and device for processing data
CN103036722A (en) * 2012-12-13 2013-04-10 方正科技集团股份有限公司 Method for achieving server hot backup in diskless system
CN104426838B (en) * 2013-08-20 2017-11-21 中国移动通信集团北京有限公司 A kind of internet buffer scheduling method and system
CN104426838A (en) * 2013-08-20 2015-03-18 中国移动通信集团北京有限公司 Internet cache scheduling method and system
CN103559319A (en) * 2013-11-21 2014-02-05 华为技术有限公司 Cache synchronization method and equipment for distributed cluster file system
CN110764710A (en) * 2016-01-30 2020-02-07 北京忆恒创源科技有限公司 Data access method and storage system of low-delay and high-IOPS
CN110764710B (en) * 2016-01-30 2023-08-11 北京忆恒创源科技股份有限公司 Low-delay high-IOPS data access method and storage system
CN107329708A (en) * 2017-07-04 2017-11-07 郑州云海信息技术有限公司 A kind of distributed memory system realizes data cached method and system
CN109710183A (en) * 2018-12-17 2019-05-03 杭州宏杉科技股份有限公司 A kind of method of data synchronization and device
CN110636121A (en) * 2019-09-09 2019-12-31 苏宁云计算有限公司 Data acquisition method and system
CN110636121B (en) * 2019-09-09 2022-07-05 苏宁云计算有限公司 Data acquisition method and system
CN111209342A (en) * 2020-01-13 2020-05-29 阿里巴巴集团控股有限公司 Distributed system, data synchronization and node management method, device and storage medium
CN111209342B (en) * 2020-01-13 2023-04-07 阿里巴巴集团控股有限公司 Distributed system, data synchronization and node management method, device and storage medium
CN114567677A (en) * 2022-04-26 2022-05-31 北京时代亿信科技股份有限公司 Data processing method and device and nonvolatile storage medium
CN114567677B (en) * 2022-04-26 2022-07-29 北京时代亿信科技股份有限公司 Data processing method and device and nonvolatile storage medium

Also Published As

Publication number Publication date
CN102098344B (en) 2012-12-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121212