CN102934114B - Checkpoints for a file system - Google Patents

Info

Publication number
CN102934114B
Authority
CN
China
Prior art keywords
checkpoint
data
written
file system
memory storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201180029522.9A
Other languages
Chinese (zh)
Other versions
CN102934114A (en
Inventor
J. M. Cargille
T. J. Miller
W. R. Tipton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Publication of CN102934114A
Application granted
Publication of CN102934114B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1435Saving, restoring, recovering or retrying at system level using file system or storage system metadata
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1461Backup scheduling policy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/128Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/176Support for shared access to files; File sharing support
    • G06F16/1767Concurrency control, e.g. optimistic or pessimistic approaches
    • G06F16/1774Locking methods, e.g. locking methods for file systems allowing shared and concurrent access to files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1865Transactional file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/282Hierarchical databases, e.g. IMS, LDAP data stores or Lotus Notes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/84Using snapshots, i.e. a logical point-in-time copy of the data

Abstract

Aspects of the subject matter described herein relate to checkpoints for a file system. In some aspects, updates to the file system are organized into checkpoint buckets. When a checkpoint is desired, subsequent updates are directed to another checkpoint bucket. After the global tables have been updated for every update in the current checkpoint bucket, a logical copy of the global tables is created. This logical copy is stored as part of the checkpoint data. To assist recovery, a checkpoint manager may wait until all updates of the current checkpoint bucket have been written to storage before writing the final checkpoint data to storage. This final checkpoint data may refer to the logical copy of the global tables and may include a validation code for verifying that the checkpoint data is correct.

Description

Checkpoints for a file system
Background
A power outage or system failure may occur in the middle of writing data to a storage device. When this happens, data may be lost or become inconsistent. For example, if a system fails while an account holder is withdrawing cash from an ATM, the transaction may unfairly favor either the bank or the account holder. As another example, if a system fails during a lengthy computation that involves disk accesses, redoing the computation may take significant time.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is provided only to illustrate one exemplary technology area in which some embodiments described herein may be practiced.
Summary
Briefly, aspects of the subject matter described herein relate to checkpoints for a file system. In some aspects, updates to the file system are organized into checkpoint buckets. When a checkpoint is desired, subsequent updates are directed to another checkpoint bucket. After the global tables have been updated for every update in the current checkpoint bucket, a logical copy of the global tables is created. This logical copy is stored as part of the checkpoint data. To assist recovery, a checkpoint manager may wait until all updates of the current checkpoint bucket have been written to storage before writing the final checkpoint data to storage. This final checkpoint data may refer to the logical copy of the global tables and may include a validation code for verifying that the checkpoint data is correct.
This Summary is provided to briefly identify some aspects of the subject matter that is further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The phrase "subject matter described herein" refers to subject matter described in the Detailed Description unless the context clearly indicates otherwise. The term "aspects" is to be read as "at least one aspect." Identifying aspects of the subject matter described in the Detailed Description is not intended to identify key or essential features of the claimed subject matter.
The aspects described above and other aspects of the subject matter described herein are illustrated by way of example, and not limitation, in the accompanying figures, in which like reference numerals indicate similar elements and in which:
Brief description of the drawings
Fig. 1 is a block diagram representing an exemplary general-purpose computing environment into which aspects of the subject matter described herein may be incorporated;
Fig. 2 is a block diagram representing an exemplary arrangement of components of a system in which aspects of the subject matter described herein may operate;
Fig. 3 is a block diagram that illustrates aspects of the subject matter described herein;
Fig. 4 is a diagram that generally represents updates to a file system in accordance with aspects of the subject matter described herein;
Fig. 5 is a block diagram that illustrates exemplary checkpoint buckets in accordance with aspects of the subject matter described herein; and
Figs. 6-8 are flow diagrams that generally represent exemplary actions that may occur in accordance with aspects of the subject matter described herein.
Detailed description
Definitions
As used herein, the term "includes" and its variants are to be read as open-ended terms that mean "includes, but is not limited to." The term "or" is to be read as "and/or" unless the context clearly dictates otherwise. The term "based on" is to be read as "based at least in part on." The terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment." The term "another embodiment" is to be read as "at least one other embodiment." Other definitions, explicit and implicit, may be included below.
Exemplary operating environment
Fig. 1 illustrates an example of a suitable computing system environment 100 on which aspects of the subject matter described herein may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of aspects of the subject matter described herein. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
Aspects of the subject matter described herein are operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, or configurations that may be suitable for use with aspects of the subject matter described herein include personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microcontroller-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, personal digital assistants (PDAs), gaming devices, printers, appliances (including set-top, media center, or other appliances), automobile-embedded or automobile-attached computing devices, other mobile devices, distributed computing environments that include any of the above systems or devices, and the like.
Aspects of the subject matter described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. Aspects of the subject matter described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
With reference to Fig. 1, an exemplary system for implementing aspects of the subject matter described herein includes a general-purpose computing device in the form of a computer 110. A computer may include any electronic device that is capable of executing an instruction. Components of the computer 110 may include a processing unit 120, a system memory 130, and a system bus 121 that couples various system components, including the system memory, to the processing unit 120. The system bus 121 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, the Peripheral Component Interconnect (PCI) bus (also known as the Mezzanine bus), the Peripheral Component Interconnect Extended (PCI-X) bus, the Advanced Graphics Port (AGP), and PCI Express (PCIe).
The computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 110 and include both volatile and nonvolatile media and both removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.
Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVDs) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 110.
Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read-only memory (ROM) 131 and random-access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within the computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by the processing unit 120. By way of example, and not limitation, Fig. 1 illustrates an operating system 134, application programs 135, other program modules 136, and program data 137.
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, Fig. 1 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disc drive 155 that reads from or writes to a removable, nonvolatile optical disc 156 such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include magnetic tape cassettes, flash memory cards, digital versatile discs, other optical discs, digital video tape, solid-state RAM, solid-state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and the magnetic disk drive 151 and optical disc drive 155 are typically connected to the system bus 121 by a removable memory interface such as interface 150.
The drives and their associated computer storage media, discussed above and illustrated in Fig. 1, provide storage of computer-readable instructions, data structures, program modules, and other data for the computer 110. In Fig. 1, for example, the hard disk drive 141 is illustrated as storing an operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from the operating system 134, application programs 135, other program modules 136, and program data 137. The operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and a pointing device 161, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, touch-sensitive screen, writing tablet, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or universal serial bus (USB).
A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and a printer 196, which may be connected through an output peripheral interface 195.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device, or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in Fig. 1. The logical connections depicted in Fig. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 may include a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, Fig. 1 illustrates remote application programs 185 as residing on the memory device 181. It will be appreciated that the network connections shown are exemplary and that other means of establishing a communications link between the computers may be used.
Checkpointing
As mentioned previously, power outages and system failures may occur while data is being written to a storage device. This may leave the data stored on the storage device in an inconsistent state. To address this and other problems, checkpoints may be written to the storage device.
Fig. 2 is a block diagram representing an exemplary arrangement of components of a system in which aspects of the subject matter described herein may operate. The components illustrated in Fig. 2 are exemplary and are not meant to be all-inclusive of components that may be needed or included. In other embodiments, the components and/or functions described in conjunction with Fig. 2 may be included in other components (shown or not shown) or placed in subcomponents without departing from the spirit or scope of aspects of the subject matter described herein. In some embodiments, the components and/or functions described in conjunction with Fig. 2 may be distributed across multiple devices.
Turning to Fig. 2, the system 205 may include one or more applications 210, an API 215, file system components 220, a store 250, a communications mechanism 255, and other components (not shown). The system 205 may comprise one or more computing devices. Such devices may include, for example, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microcontroller-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, cell phones, personal digital assistants (PDAs), gaming devices, printers, appliances (including set-top, media center, or other appliances), automobile-embedded or automobile-attached computing devices, other mobile devices, distributed computing environments that include any of the above systems or devices, and the like.
Where the system 205 comprises a single device, an exemplary device that may be configured to act as the system 205 comprises the computer 110 of Fig. 1. Where the system 205 comprises multiple devices, each of the multiple devices may comprise a similarly or differently configured computer 110 of Fig. 1.
The file system components 220 may include a recovery manager 225, a checkpoint manager 230, an I/O manager 235, a write plan manager 237, and other components (not shown). As used herein, the term "component" is to be read to include all or a portion of a device, a collection of one or more software modules or portions thereof, some combination of one or more software modules or portions thereof and one or more devices or portions thereof, and the like.
The communications mechanism 255 allows the system 205 to communicate with other entities. For example, the communications mechanism 255 may allow the system 205 to communicate with applications on a remote host. The communications mechanism 255 may be a network interface or adapter 170, a modem 172, or any other mechanism for establishing communications as described in conjunction with Fig. 1.
The store 250 is any storage medium capable of providing access to data. The store may include volatile memory (e.g., a cache) and nonvolatile memory (e.g., persistent storage). The term "data" is to be read broadly to include anything that may be represented by one or more computer storage elements. Logically, data may be represented as a series of 1s and 0s in volatile or nonvolatile memory. In computers that have a non-binary storage medium, data may be represented according to the capabilities of the storage medium. Data may be organized into different types of data structures, including simple data types such as numbers, letters, and the like; hierarchical, linked, or other related data types; data structures that include multiple other data structures or simple data types; and the like. Some examples of data include information, program code, program state, program data, other data, and the like.
The store 250 may comprise hard disk storage, other nonvolatile storage, volatile memory such as RAM, other storage, some combination of the above, and the like, and may be distributed across multiple devices. The store 250 may be external, internal, or include components that are both internal and external to the system 205.
The store 250 may be accessed via a storage controller 240. "Access" as used herein may include reading data, writing data, deleting data, updating data, a combination including two or more of the above, and the like. The storage controller 240 may receive requests to access the store 250 and may fulfill such requests as appropriate. The storage controller 240 may be arranged such that it does not guarantee that data will be written to the store 250 in the order in which it was received. Furthermore, the storage controller 240 may indicate that it has written requested data before it has actually written the data to nonvolatile memory of the store 250.
The one or more applications 210 include any processes that may be involved in creating, deleting, or updating data. Such processes may execute in user mode or kernel mode. The term "process" and its variants as used herein may include one or more traditional processes, threads, components, libraries, objects that perform tasks, and the like. A process may be implemented in hardware, software, or a combination of hardware and software. In an embodiment, a process is any mechanism, however called, capable of or used in performing an action. A process may be distributed over multiple devices or reside on a single device. The one or more applications 210 may make file system requests (e.g., via function/method calls) through the API 215 to the I/O manager 235.
The I/O manager 235 may determine what I/O request or requests to issue to the storage controller 240 (or some other intermediate component). The I/O manager 235 may also return data to the one or more applications 210 as operations associated with the file system requests proceed, complete, or fail. When a file system request involves a transaction, the I/O manager 235 may inform a transaction manager (not shown) so that the transaction manager may properly manage the transaction. In some embodiments, the functions of the transaction manager may be included in the I/O manager 235.
The file system components 220 may use copy on write, write in place, a combination of the above, and the like in writing file system objects or metadata regarding file system objects to the store 250. The term "file" may include directories, file system objects that do not have children (e.g., what is sometimes thought of as a file), other file system objects, and the like.
In copy on write, before data of a file is modified, a copy of the data that is to be modified is made at another location. In write in place, data of a file may be modified in place without copying the original data to another location. A hybrid of copy on write and write in place may include performing copy on write for metadata regarding a file while performing write in place for data included in the file.
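The difference between the two write strategies can be sketched as follows. This is a minimal illustration and not the patented implementation: `BlockStore`, `write_in_place`, and `copy_on_write` are hypothetical names, and the sketch assumes that the modification is applied to the new copy while the original block is left untouched.

```python
class BlockStore:
    """Toy store (hypothetical): maps a block address to its bytes."""

    def __init__(self):
        self.blocks = {}
        self.next_free = 0

    def allocate(self, data: bytes) -> int:
        """Place data at a fresh address and return that address."""
        addr = self.next_free
        self.next_free += 1
        self.blocks[addr] = data
        return addr


def write_in_place(store: BlockStore, addr: int, new_data: bytes) -> int:
    """Overwrite the block at addr; the previous contents are not preserved."""
    store.blocks[addr] = new_data
    return addr


def copy_on_write(store: BlockStore, addr: int, modify) -> int:
    """Copy the block to another location first, then modify the copy.

    Assumption: the original block stays intact and the caller switches to
    the returned address.
    """
    copy_addr = store.allocate(store.blocks[addr])
    store.blocks[copy_addr] = modify(store.blocks[copy_addr])
    return copy_addr


# Hybrid policy from the text: copy on write for metadata, in place for data.
store = BlockStore()
meta_addr = store.allocate(b"size=3")
data_addr = store.allocate(b"abc")

new_meta_addr = copy_on_write(store, meta_addr, lambda old: old.replace(b"3", b"4"))
same_data_addr = write_in_place(store, data_addr, b"abcd")
```

After the hybrid update, the old metadata block is still readable at `meta_addr`, which is what makes it possible to fall back to an earlier consistent view of the metadata.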
Objects of a file system may be updated in the context of transactions. A transaction is a group of operations that may be described by various properties including, for example, atomicity, consistency, isolation, and durability. As used herein, a transaction may be defined by at least the consistency property and may also be defined by one or more of the other properties above.
The consistency property refers to an allowed state of data with respect to one or more files. Before a transaction begins and after a transaction completes, the files of a file system are to be in an allowed state (although they may pass through states that are not allowed during the transaction). For example, a bank transfer may be implemented as a set of two operations: a debit from one account and a credit to another account. Consistency in this example may be defined as having the combined account balance of the bank and the account holder be a constant (e.g., T = A + B, where T is a constant, A = bank balance, and B = account holder balance). To implement consistency in this example, the debit and credit operations simply need to be for the same amount of money and either both be completed or neither be completed on each account.
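The invariant T = A + B from the bank example can be sketched in a few lines. This is an illustrative sketch only: the `transfer` function, the `InconsistentState` exception, and the use of integer amounts are assumptions made for the example and do not come from the patent.

```python
class InconsistentState(Exception):
    """Raised when a transaction would leave the balances in a disallowed state."""


def transfer(balances: dict, debit_from: str, credit_to: str, amount: int) -> None:
    """Apply a debit and a credit so that either both take effect or neither does."""
    total_before = sum(balances.values())   # T, the constant to preserve
    staged = dict(balances)                 # work on a private copy
    staged[debit_from] -= amount
    # At this point the staged sum differs from T: a disallowed state that is
    # tolerated only while the transaction is still in progress.
    staged[credit_to] += amount
    if sum(staged.values()) != total_before:
        raise InconsistentState("debit and credit do not balance")
    balances.update(staged)                 # publish both changes together


balances = {"bank": 1000, "holder": 200}    # amounts in whole currency units
transfer(balances, "bank", "holder", 50)

# A failing transfer (unknown account) leaves the balances untouched.
try:
    transfer(balances, "bank", "nobody", 10)
except KeyError:
    pass
```

Because the changes are staged on a copy and published only at the end, a failure partway through never exposes the intermediate state in which only the debit has been applied.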
A checkpoint may be written to indicate a consistent state of the file system. A checkpoint may include one or more validation codes (e.g., one or more checksums, hashes, or other data) that may be used to determine whether the checkpoint and/or the data associated with the checkpoint was correctly written to disk. Upon recovery, the last written checkpoint may be located. The validation code(s) of that checkpoint may then be used to determine whether the checkpoint and/or the data associated with it was correctly written to disk. If not, a previous checkpoint may be located and checked for validity, and so on until a valid checkpoint is found. Once the most recent valid checkpoint is found, the last consistent state of the file system is known. File system operations that occurred after this point may be discarded, or additional recovery actions may be performed as desired.
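The recovery scan described above can be sketched as follows. The record layout, the choice of SHA-256 as the validation code, and the names `make_checkpoint`, `is_valid`, and `find_last_valid` are assumptions made for illustration; the text only requires some checksum, hash, or other validation data.

```python
import hashlib
import json


def make_checkpoint(seq: int, state: dict) -> dict:
    """Build a checkpoint record: serialized payload plus its validation code."""
    payload = json.dumps({"seq": seq, "state": state}, sort_keys=True).encode()
    return {"payload": payload, "checksum": hashlib.sha256(payload).hexdigest()}


def is_valid(record: dict) -> bool:
    """A record is valid if its payload still hashes to the stored code."""
    return hashlib.sha256(record["payload"]).hexdigest() == record["checksum"]


def find_last_valid(records: list):
    """Scan from the most recently written checkpoint back to older ones."""
    for record in reversed(records):
        if is_valid(record):
            return json.loads(record["payload"])
    return None  # no valid checkpoint found


# Three checkpoints in write order; the newest one is torn by a simulated crash.
log = [
    make_checkpoint(1, {"files": 3}),
    make_checkpoint(2, {"files": 4}),
    make_checkpoint(3, {"files": 5}),
]
log[2] = {"payload": log[2]["payload"][:10], "checksum": log[2]["checksum"]}

recovered = find_last_valid(log)
```

Here the scan rejects checkpoint 3 because its truncated payload no longer matches the stored hash, and it settles on checkpoint 2 as the last known consistent state.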
In one embodiment, an object on a file system may be denoted by D_n, where n identifies the object to the system. Objects on the file system are serializable (i.e., able to be represented as data on the store 250) and de-serializable. An object table associates each object identifier with its location on the store 250.
The first time D_n is updated in a modifying transaction, D_n is located by looking up its location in the object table using n. For use in an example, the storage location of D_n on the store 250 is called L_1.
The contents of L_1 are then read from the store 250, the object may be de-serialized (e.g., converted from the serialized format into the structure of the object), and the portions of the object that are to be modified are copied into main system memory. The updates are performed on those portions (or copies thereof) in memory. In conjunction with the portions in memory being modified, one or more new locations on the store 250 (called L_2) are designated for the modified portions.
These copies in main system memory are here referred to as " logical copy " of object sometimes.The logical copy of object comprises the one or more data structures that can be used to represent this object.Logically, logical copy is the duplicate of object.Physically, logical copy can comprise the data (pointer comprising pointing to other data) that can be used to the duplicate creating object.For example, in one implementation, logical copy can be the actual copy (such as by bit-copy) of object, or comprises the data structure that can be used to the data creating described object.
In another implementation, an unmodified logical copy may include one or more pointers that refer to the original object. As the logical copy is modified, some pointers in the logical copy may refer to new memory locations (e.g., for the changed portions of the logical copy) while other pointers may refer to portions of the original object (e.g., for the unchanged portions of the logical copy). Using the pointers, the modified copy can be constructed from the modified data together with the unmodified data of the original object. Creating a logical copy in this way may be done, for example, to reduce the storage needed to create a duplicate of an object.
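A pointer-sharing logical copy of this kind may be illustrated, loosely, with Python's standard `collections.ChainMap`, which is used here only as a stand-in: the attribute names are hypothetical, and the layered mapping is an analogy for the pointers described above rather than an actual implementation of them.

```python
from collections import ChainMap

original = {"name": "a.txt", "size": 10, "owner": "alice"}

# Unmodified logical copy: the front mapping is empty, so every lookup falls through to the original.
logical_copy = ChainMap({}, original)

# Modifying the logical copy stores the change in the front mapping only.
logical_copy["name"] = "foo.txt"

# A full duplicate can be constructed from the changed and unchanged parts together.
modified = dict(logical_copy)
```

Only the changed attribute occupies new memory; the unchanged attributes continue to be read from the original object.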
Furthermore, although serialization and deserialization are sometimes referred to herein, there is no intention to limit aspects of the subject matter described herein to what is customarily thought of as serialization and deserialization. In one embodiment, the serialized version may be bit-for-bit identical to the deserialized version. In another embodiment, the bits of the serialized version may be packed in a different format and order from those of the deserialized version. Indeed, in one embodiment, serialization and deserialization are to be understood to mean any mechanism for storing data that represents objects to, and retrieving it from, a store. Other mechanisms may include, for example, writing properties of the object in text format to the store, encoding properties of the object in a markup language in the store, other ways of storing properties and other features of the object on the store, and the like.
At the system's discretion (e.g., after a transaction commits or at some other time), the system may serialize the modified logical copy back to stable media, but does so at location L_2. The intention to write the modified logical copy back to the new location is called a write plan. A write plan may identify an arbitrary number of updates to one or more objects. A write plan may reference changes that occur in more than one transaction. Multiple write plans may be combined into a single write plan.
The write plan manager 237 may be involved in creating a write plan for each update. When a write plan involves multiple file system objects (e.g., in the context of a transaction), the write plan manager 237 may be operable to generate a write plan that indicates the locations on the storage device of all file system objects involved in the transaction, in order to maintain a consistent state for the file system.
When a modification occurs just after a checkpoint, a block called the recovery block (which may be duplicated in multiple locations) may be modified to point to the start of the modified logical copy (i.e., L_2). A field in the object at L_2 points to the location that will be written next. This field represents a link in the chain of write plans that occur between checkpoints.
In conjunction with sending a request to write the logical copy, a modification may be made to the object table. In particular, the location value indexed by the object identifier may be set to the location at which the modified logical copy is to be stored (i.e., L_2). This is done so that a subsequent lookup of the location of object D_n will be directed to location L_2, the new version of the object.
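The copy-on-write update path described above (look up L_1, deserialize, modify in memory, write to a new location L_2, and repoint the object table) may be sketched as follows. The `Store` class, the use of JSON as the serialized format, and the function `cow_update` are hypothetical and chosen only to make the sketch self-contained.

```python
import json


class Store:
    """Toy store that maps integer locations to serialized bytes."""

    def __init__(self):
        self.blocks = {}
        self.next_free = 0

    def allocate(self):
        loc = self.next_free
        self.next_free += 1
        return loc

    def write(self, loc, data):
        self.blocks[loc] = data

    def read(self, loc):
        return self.blocks[loc]


def cow_update(store, object_table, n, changes):
    """Update object n without overwriting its old version; return (L_1, L_2)."""
    l1 = object_table[n]                        # look up the current location using n
    logical_copy = json.loads(store.read(l1))   # deserialize into an in-memory logical copy
    logical_copy.update(changes)                # apply the update in memory
    l2 = store.allocate()                       # designate a new location for the modified copy
    store.write(l2, json.dumps(logical_copy).encode())
    object_table[n] = l2                        # later lookups of n are directed to L_2
    return l1, l2


store = Store()
first = store.allocate()
store.write(first, json.dumps({"name": "a.txt"}).encode())
object_table = {7: first}
old_loc, new_loc = cow_update(store, object_table, 7, {"name": "foo.txt"})
```

The old version remains readable at L_1 until its space is reused, which is what allows recovery to fall back to an earlier consistent state.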
If a transaction modifies more than one object, for example D_i and D_j, the objects are considered to be "atomically bound" to one another and are written in one write plan. A write plan may indicate this relationship (e.g., in the form of links to the objects involved).
An arbitrary number of objects may be persisted in this manner. Periodically, the object table may also be written to the store 250 in the same manner as any other object.
In conjunction with sending a request to write the object table to the store 250, a flush command may also be sent to the storage controller 240. A flush command instructs the storage controller 240 to write all data in its volatile memory that has not yet been written to the non-volatile memory of the store 250.
Periodically, a checkpoint may be written to the storage device, as described in more detail below. A checkpoint may be indicated by a checkpoint record stored by the store 250. A checkpoint may be written at any time and may become stable/durable after a flush. Stable/durable means that the checkpoint is stored on the non-volatile memory of the store.
After a checkpoint is stable/durable, space used for any old and unused copies of objects (or portions thereof) may be reused. After the flush completes, the recovery block is then pointed to the start of the next chain of write plans. In one embodiment, the recovery block may point the start of the chain of write plans to the new location of the object table.
A more concrete example is described in conjunction with Fig. 3, which is a block diagram that illustrates aspects of the subject matter described herein. As illustrated, Fig. 3 shows a main memory 305 and the store 250. The line 307 represents the division between the main memory 305 and the store 250. Objects above the line 307 are in main memory, while objects below the line 307 are in volatile or non-volatile memory of the store 250.
In the main memory 305, the objects 314-316 are shown. In implementation, the objects 314-316 may be deserialized logical copies of the objects 319-321, respectively. The object 319 is located at location 1550 on the store 250, the object 320 is located at location 200 on the store 250, and the object 321 is located at location 800 on the store 250.
The object table 310 includes key/value pairs that indicate the locations of the objects 314-316 on the store 250. The key/value pairs are indexed using the identifiers (n) of the objects 314-316.
When a transaction modifies the object 316 (e.g., by changing its name to foo.txt), a consistency component (e.g., the consistency component 220 of Fig. 2) may determine a new storage location (e.g., location 801) for the updated object. If the object is a file, updating its name in the context of a transaction may also cause the directory that contains the file to be involved in the transaction. For example, when a file name is changed, both the object that represents the file and the object that represents the directory containing the file may need to be involved in the transaction. In this case, the directory that contains the object is represented as the object 314, and the logical copy of the updated directory (e.g., the object 318) is represented as the object 323 in the store 250. In addition, the table 310 has been logically updated to the table 311 to indicate the new storage locations (i.e., 801 and 1000) of the modified objects (i.e., the objects 317 and 318).
That modifying an object in the context of a transaction also affects another object may be explicitly indicated or determined, for example, by the I/O manager 235 or some other component of Fig. 2.
When two or more objects are involved in an update of a transaction, the objects are considered to be "atomically bound," as mentioned previously. In a recovery operation, unless changes are found in the store 250 for all objects changed in the context of the transaction, all of the changes found are discarded. In other words, if a change for one of the objects is found but a change for another of the objects is not found, the change for the one object is discarded.
To atomically bind two or more objects, in one embodiment, a pointer may be stored in the store 250 with, or otherwise associated with, each object. A pointer may indicate the storage location of another object (or portion thereof) involved in the transaction. If no additional objects are involved in the transaction, the pointer may point to a "dead block" or indicate the storage location of the "head" object of another write plan. This head object may comprise a write plan, an object (or portion thereof) modified by the write plan, or the like.
In addition to a pointer to the next storage location, data indicating the correct contents of the object "pointed" to may also be stored in the store 250. For example, a hash that indicates the correct contents of the object pointed to may be stored.
In the example presented in Fig. 3, a pointer associated with the object 322 may point to the storage location associated with the object 323. The pointer binds the two objects together. If, during recovery, either of the objects is not found or does not have the correct contents, the changes represented by the found objects may be discarded.
Because of the nature of the store 250, there may be no guarantee as to which object will be written first to the non-volatile memory of the store 250. If the object 322 is written first and the object 323 is not written, the pointer from the object 322 will point to a storage location that may have spurious data. However, by computing a hash of the data at that storage location and comparing this hash with the hash stored with the object 322, the data at location 1000 may be detected as invalid. In this case, during recovery, the recovery manager (e.g., the recovery manager 225 of Fig. 2) may discard the changes represented by the objects 322 and 323.
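The detection of an incomplete atomically bound write may be sketched as follows. This is only an illustrative sketch: the block layout (`data`, `next_loc`, `next_hash`), the use of SHA-256, and the use of `None` in place of a "dead block" are assumptions made for the example. The locations 801 and 1000 are taken from the example of Fig. 3.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


def chain_is_complete(store, head_loc, head_hash):
    """Follow next-pointers from head_loc; each block must match the hash held by its predecessor."""
    loc, expected = head_loc, head_hash
    while loc is not None:  # None stands in for the "dead block" that ends a chain
        block = store.get(loc)
        if block is None or digest(block["data"]) != expected:
            return False    # missing or spurious data: discard the whole set of changes
        loc, expected = block["next_loc"], block["next_hash"]
    return True


file_data = b"file renamed to foo.txt"
dir_data = b"directory entry for foo.txt"
store = {
    # The first object was written and points at location 1000 with the hash expected there.
    801: {"data": file_data, "next_loc": 1000, "next_hash": digest(dir_data)},
    # The second write never reached the store, so location 1000 still holds stale bytes.
    1000: {"data": b"stale bytes", "next_loc": None, "next_hash": None},
}
before = chain_is_complete(store, 801, digest(file_data))

store[1000]["data"] = dir_data  # now simulate the second write having completed
after_write = chain_is_complete(store, 801, digest(file_data))
```

Here `before` is false, so recovery would discard both changes, and `after_write` is true once both bound objects are present with the correct contents.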
The recovery block 330 points to the first storage location (in this case, location 801) at which data should be stored after a checkpoint. The recovery block 330 may also include, or be associated with, a hash computed from the correct contents of the object stored at that first storage location.
Fig. 4 is a diagram that generally represents updates that occur in a file system in accordance with aspects of the subject matter described herein. The global tables 405 include an object table that identifies the locations of objects on the store and allocation data regarding space that has been allocated on the store 250. Updates 410 that are in progress are also shown. When an update touches the time axis 415, the update has completed and no longer needs to modify anything in the global tables 405. Each update line of the updates 410 may represent multiple updates. Where multiple updates need to be made together to maintain consistency, the updates may be made in the context of a transaction.
For a checkpoint to be valid, the checkpoint needs to be written in a consistent state. With a copy-on-write file system, when an object is updated, the logical copy of the modified object is stored at a new location of the file system. This new location is reflected in the object table by an update to the object table. For consistency, it would be incorrect for the object table to reflect an update that has not yet been written to disk, because the update might not be completely written to disk before a system failure. Similarly, it would also be incorrect if an update had been written to disk and other transactionally related updates had also completed, but the object table did not show the update.
To ensure consistency, the checkpoint needs to be selected at a time when the metadata for the updates is reflected in the global tables. If each line representing the updates 410 indicates a period during which the global tables 405 may be updated for that update, then taking a checkpoint at time 520 may yield an inconsistent state, whereas taking a checkpoint at time 525 yields a consistent state.
Fig. 5 is a block diagram that illustrates exemplary checkpoint buckets in accordance with aspects of the subject matter described herein. To address the issues mentioned above and other issues, each update may be associated with a checkpoint bucket (e.g., one of the buckets 515). A checkpoint bucket is a logical concept indicating that the global tables need to be updated to account for at least the write plans of the updates associated with the checkpoint bucket before the checkpoint data of the checkpoint is written to disk. In other words, the global tables need to be updated to account for the location and allocation information of each update of the bucket, even though the updates may or may not currently have been written to those locations.
Periodically (e.g., at the expiration of a checkpoint timer based on a recovery window, after a certain number of writes has occurred, after some other threshold is exceeded, or the like), a determination may be made to generate a checkpoint. When this happens, the checkpoint manager may update data (e.g., the data structure 510) that indicates the checkpoint bucket to associate with subsequent updates. For example, the checkpoint manager may obtain an exclusive lock (e.g., the lock 505) on the data (e.g., the data structure 510) that indicates the current checkpoint bucket. After obtaining the exclusive lock on the data, the checkpoint manager may update the data to indicate a new checkpoint bucket for subsequent updates. All subsequent updates are associated with the new checkpoint bucket until the data is changed to indicate another checkpoint bucket for subsequent updates.
A checkpoint bucket may be thought of as a logical concept and may be implemented in a variety of ways. For example, in one implementation, a checkpoint bucket may be implemented as a data structure such as a list that has pointers to each of the updates associated with the checkpoint bucket. As another example, a checkpoint bucket may be implemented as data maintained for each update, where the data indicates the checkpoint associated with the update. As another example, a checkpoint bucket may be implemented as a counting semaphore. In this example, it may not be known which updates still need to be written to disk, but a count of the updates that still need to be written to disk is known. A read/write lock may be used in this example.
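One possible counting-style realization of checkpoint buckets may be sketched as follows. The class and method names are hypothetical, and a plain counter guarded by a lock is used in place of a true counting semaphore; the lock plays the role of the lock 505 and the current-bucket field plays the role of the data structure 510.

```python
import threading


class CheckpointBuckets:
    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the lock 505
        self._current = 0              # plays the role of the data structure 510
        self._pending = {0: 0}         # bucket id -> updates still awaiting a write plan

    def begin_update(self):
        """Associate a new update with whichever bucket is current and return that bucket."""
        with self._lock:
            bucket = self._current
            self._pending[bucket] += 1
            return bucket

    def write_plan_generated(self, bucket):
        """Record that one update of the given bucket now has a write plan."""
        with self._lock:
            self._pending[bucket] -= 1

    def start_new_bucket(self):
        """Direct subsequent updates to a fresh bucket; return the bucket being closed."""
        with self._lock:
            closed = self._current
            self._current += 1
            self._pending[self._current] = 0
            return closed

    def is_drained(self, bucket):
        """True when every update of the bucket has a write plan."""
        with self._lock:
            return self._pending[bucket] == 0


buckets = CheckpointBuckets()
u1 = buckets.begin_update()          # associated with the first bucket
closed = buckets.start_new_bucket()  # a checkpoint is to be taken; later updates use a new bucket
u2 = buckets.begin_update()          # associated with the new bucket
buckets.write_plan_generated(u1)     # the closed bucket is now drained
```

As in the counting example above, the sketch knows how many updates of a bucket are outstanding but not which ones.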
The examples above are not intended to be all-inclusive or exhaustive of the ways of implementing a checkpoint bucket. Indeed, based on the teachings herein, those skilled in the art may recognize many other mechanisms for implementing checkpoint buckets.
After indicating the checkpoint bucket for subsequent updates (e.g., by changing the data structure 510), the checkpoint manager may wait for write plans to be generated for all updates in the current checkpoint bucket. After write plans have been generated for all updates in the current checkpoint bucket (but perhaps not yet written to the storage device), the checkpoint manager may take a snapshot of the global tables 405 of Fig. 4 and create a write plan to write the snapshot of the global tables 405 to the store. A snapshot may be created as a logical copy of the global tables 405 through copy-on-write or other mechanisms.
Returning to Fig. 4, write plans for updates subsequent to the checkpoint may be generated and written to disk while the checkpoint manager waits for write plans to be generated for all updates in the current checkpoint bucket, and also while the checkpoint manager generates a write plan to write the checkpoint. When the checkpoint manager seeks to obtain a snapshot of the global tables, however, the checkpoint manager may obtain an exclusive lock on the global tables 405 before creating the snapshot. While the checkpoint manager holds the exclusive lock, write plans may still be generated for other updates, and these write plans may even be stored on the store, but the global tables (e.g., the object table) may not be updated to point to these write plans until the checkpoint manager releases its exclusive lock. In conjunction with releasing the lock, the checkpoint manager may send a signal (e.g., raise an event) that indicates that a subsequent checkpoint has been enabled and that subsequent updates may update the global tables.
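The interaction between the snapshot and later table updates may be sketched as follows. The `GlobalTables` class and its methods are hypothetical; the sketch uses a full copy of a small dictionary as the logical copy, whereas an actual implementation might use copy-on-write as described above.

```python
import threading


class GlobalTables:
    def __init__(self):
        self._lock = threading.Lock()  # stands in for the exclusive lock on the global tables
        self.object_table = {}

    def record_location(self, object_id, location):
        """Point the object table at a write plan's location; blocks while a snapshot is being taken."""
        with self._lock:
            self.object_table[object_id] = location

    def snapshot(self):
        """Create a logical copy of the table while no update can modify it."""
        with self._lock:
            return dict(self.object_table)


tables = GlobalTables()
tables.record_location(316, 801)   # update belonging to the checkpoint being taken
checkpoint_view = tables.snapshot()
tables.record_location(314, 1000)  # later update; not visible in the snapshot
```

The snapshot reflects only the state at the moment the lock was held, so later updates do not leak into the checkpoint.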
To assist with recovery, a checkpoint may be written to disk together with a validation code so that the checkpoint can be validated, according to the following rules:
1. Wait for the data indicated by the write plans to be written to disk (e.g., wait for all updates associated with the checkpoint to be written to disk);
2. Request that all data associated with the checkpoint be written to disk (e.g., request that the logical copy of the metadata be written to disk);
3. Issue or wait for a flush and wait for an acknowledgment that the flush has completed successfully;
4. Generate a validation code for the checkpoint that was written to disk. In one embodiment, the validation code may be for a subset of the data written to disk. For example, if the data for a file is stored in a tree in which each node of the tree includes validation codes for its children, the validation code may be for the root node of the tree. In this embodiment, the validation code may be written with the root node and may also be used to verify that the validation code is correct;
5. Request that the validation code (and any associated data, such as the root node) be written to disk. Note that the validation code may not actually reach the disk before a system failure. If it does not, the checkpoint is not a valid checkpoint.
With these rules, during recovery, if a checkpoint is found on the storage device and the internal validation code of the checkpoint is valid, the other data associated with the checkpoint is also expected to be stored on the storage device and to be valid. If the validation code is included in the root node, other data in the root node (e.g., pointers to other nodes in the tree) may be used to find the rest of the data for the checkpoint.
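The ordering imposed by the five rules above may be sketched with a toy device whose writes become durable only after a flush. Everything named here (`ToyDevice`, `write_checkpoint`, `checkpoint_is_valid`, the record layout, SHA-256 as the validation code) is a hypothetical illustration, and the synchronous writes stand in for waiting on asynchronous I/O.

```python
import hashlib


class ToyDevice:
    """Writes land in a volatile cache; flush() moves them to durable storage."""

    def __init__(self):
        self.cache, self.durable = {}, {}

    def write(self, loc, data):
        self.cache[loc] = data

    def flush(self):
        self.durable.update(self.cache)
        self.cache.clear()

    def crash(self):
        self.cache.clear()  # anything not yet flushed is lost


def write_checkpoint(dev, updates, metadata_loc, metadata, record_loc):
    for loc, data in updates.items():              # rule 1: the bucket's updates are written
        dev.write(loc, data)
    dev.write(metadata_loc, metadata)              # rule 2: write the logical copy of the metadata
    dev.flush()                                    # rule 3: flush before generating the code
    code = hashlib.sha256(metadata).hexdigest()    # rule 4: validation code for the checkpoint
    dev.write(record_loc, {"metadata_loc": metadata_loc, "code": code})  # rule 5


def checkpoint_is_valid(dev, record_loc):
    """During recovery, only durable data is visible."""
    record = dev.durable.get(record_loc)
    if record is None:
        return False
    data = dev.durable.get(record["metadata_loc"])
    return data is not None and hashlib.sha256(data).hexdigest() == record["code"]


dev = ToyDevice()
write_checkpoint(dev, {10: b"update"}, 20, b"global tables", 30)
dev.crash()  # the record was requested but never reached durable storage
valid_after_crash = checkpoint_is_valid(dev, 30)
```

Because the validation code is written only after the flush, a durable and valid record implies that the data it covers is also durable; in the sketch, the crash before a second flush leaves no valid checkpoint at location 30.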
As an alternative, a validation code for each update associated with a checkpoint may be written to the storage device. For example, the checkpoint may indicate the blocks of all updates that were supposed to occur before the checkpoint and after the previous checkpoint. For each block indicated, the checkpoint may store a validation code that indicates the correct contents of the block. During recovery in this alternative, to validate a checkpoint, each block may be validated against its associated validation code in the checkpoint.
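This per-block alternative may be sketched as follows, assuming (purely for illustration) that the validation code is a CRC-32 computed with Python's standard `zlib.crc32` and that the storage device is modeled as a dictionary from block location to bytes.

```python
import zlib


def build_checkpoint(blocks):
    """Record a CRC-32 for every block the checkpoint expects to find on the storage device."""
    return {loc: zlib.crc32(data) for loc, data in blocks.items()}


def validate_checkpoint(checkpoint, device):
    """The checkpoint is valid only if every listed block is present and matches its recorded code."""
    return all(
        loc in device and zlib.crc32(device[loc]) == code
        for loc, code in checkpoint.items()
    )


expected_blocks = {801: b"renamed file", 1000: b"updated directory"}
checkpoint = build_checkpoint(expected_blocks)

complete_device = dict(expected_blocks)
partial_device = {801: b"renamed file"}  # the block at location 1000 never reached the device
```

With this layout, the checkpoint record may be written without waiting for the updates, at the cost of checking every listed block during recovery.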
Returning to Fig. 2, in one embodiment, the checkpoint manager 230 may be operable to perform actions including the following:
1. Determining a first checkpoint to associate with requests to update file system objects. As mentioned previously, the checkpoint manager 230 may do this by updating a data structure (e.g., the data structure 510 of Fig. 5) to point to a new checkpoint bucket. Then, as each subsequent request to update is received, the request may be assigned to the new checkpoint bucket.
Note that the term "first" as used herein does not mean the very first checkpoint; rather, it is used to distinguish from a "second" checkpoint. In other words, if there are N checkpoints, the first checkpoint may be any X where 1 <= X <= N, and the second checkpoint may be any Y where 1 <= Y <= N and X <> Y.
2. Determining when to write checkpoint data associated with the checkpoint to a storage device of the file system. For example, a checkpoint timer may expire, a number of updates may be exceeded, or some other threshold may be used to determine that it is time to write the checkpoint data.
3. Determining a second checkpoint for subsequent requests to update file system objects. As mentioned previously, the checkpoint manager 230 may do this by updating the data structure (e.g., the data structure 510 of Fig. 5) after obtaining an exclusive lock (e.g., the lock 505) on the data structure.
4. Waiting for a consistent state of the file system while allowing preparation to write data for the subsequent requests. A consistent state occurs when all updates associated with the current checkpoint bucket are represented on (e.g., have been successfully written to) the storage device. Allowing preparation to write data for the subsequent requests includes allowing write plans to be generated for the subsequent requests and written to the storage device, but not allowing the metadata (e.g., the global tables) to be updated until a logical copy of the metadata has been created.
5. Creating a logical copy of the metadata of the file system. This may be done by taking a snapshot of the global tables, as mentioned previously.
6. Writing the logical copy of the metadata to the storage device. In one embodiment, this may include requesting that the logical copy be written to the storage device and waiting for confirmation that the logical copy has been written to the storage device. In another embodiment, this may include marking the copy on the storage device as clean so that subsequent updates to the metadata cause copy-on-write before the updates are allowed.
7. Writing at least one validation code to the storage device. As mentioned previously, the validation code may be used to determine whether the updates prior to the checkpoint were written to the storage device and whether the checkpoint record itself is valid.
The API 215 may receive a request to modify an object involved in a transaction. In response, the I/O manager 235 may locate the object at a storage location (e.g., L_1) of the store, create a logical copy of the object, make changes to the object in the context of the transaction, determine a second storage location (e.g., L_2) for storing the changed logical copy, send a request to write the changed logical copy to the storage controller 240, and update a volatile data structure (e.g., the object table 310) to indicate that the logical copy is stored at the second storage location.
If the API 215 receives a request to modify another object involved in the transaction, the I/O manager 235 may perform additional actions, including creating an association (e.g., a write plan) that binds the other object and the first object together. Then, in conjunction with sending requests to write the modifications of the objects to the storage device, the I/O manager 235 may also send a request to the storage controller 240 to write the association to the storage device.
Figs. 6-8 are flow diagrams that generally represent exemplary actions that may occur in accordance with aspects of the subject matter described herein. For simplicity of explanation, the methodology described in conjunction with Figs. 6-8 is depicted and described as a series of acts. It is to be understood and appreciated that aspects of the subject matter described herein are not limited by the acts illustrated and/or by the order of the acts. In one embodiment, the acts occur in the order described below. In other embodiments, however, the acts may occur in parallel, in another order, and/or with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methodology in accordance with aspects of the subject matter described herein. In addition, those skilled in the art will understand and appreciate that the methodology could alternatively be represented as a series of interrelated states via a state diagram or as events.
Turning to Fig. 6, at block 605, the actions begin. At block 610, an indication is made that a first set of updates is to be associated with a first checkpoint. This may be done by modifying a data structure to indicate that subsequent updates are to be associated with the first checkpoint. This may involve, for example, obtaining and releasing a lock and updating a pointer or other data structure to refer to a checkpoint bucket, as mentioned previously. Note again that "first" may mean any checkpoint of the file system and is used to distinguish this checkpoint from a subsequent checkpoint. For example, referring to Figs. 2 and 5, the checkpoint manager 230 may obtain the lock 505 on the data structure 510 and update the pointer to point to one of the checkpoint buckets 515.
At block 615, updates are received and associated with the first checkpoint. For example, referring to Fig. 2, the I/O manager 235 may receive update requests from the application(s) 210 via the API 215. As the updates are received, they may be associated with a checkpoint.
At block 620, a determination is made to write the checkpoint data of the first checkpoint to a storage device of the file system. For example, referring to Fig. 2, the checkpoint manager 230 may determine that a checkpoint timer has expired and may thereby determine that a checkpoint is to be written to the store 250.
At block 625, a lock is obtained on the data structure used to indicate the checkpoint for subsequent updates. For example, referring to Figs. 2 and 5, the checkpoint manager 230 may obtain the lock 505 on the data structure 510.
At block 630, the data structure is updated to refer to another checkpoint. Modifying this data structure indicates that any updates that occur subsequent to the first set of updates are to be associated with a subsequent checkpoint. For example, referring to Figs. 2 and 5, the checkpoint manager 230 may update the data structure 510 to point to another of the checkpoint buckets 515.
At block 635, the lock is released. For example, referring to Figs. 2 and 5, the checkpoint manager 230 may release the lock 505.
At block 640, write plans for the updates are generated. Each write plan indicates at least a planned location on the storage device for data representing at least one of the first set of updates. For example, referring to Fig. 2, the write plan manager 237 may be involved in creating write plans for the updates associated with a checkpoint.
At block 645, the metadata is updated for the write plans. This metadata indicates the storage locations for the write plans (although the write plans may or may not yet have been written to the storage device). For example, referring to Fig. 2, the write plan manager 237 may update the global tables to indicate the storage locations of objects modified by the write plans.
After block 645, the actions continue at block 705 of Fig. 7. Turning to Fig. 7, at block 705, a lock is obtained on the metadata. For example, referring to Figs. 2 and 4, the checkpoint manager 230 may obtain a lock on the global tables 405. The checkpoint manager 230 may wait until the metadata reflects the storage locations for all updates in the first set of updates (even though all of those updates may or may not yet have been written to those storage locations).
At block 710, a logical copy of the metadata is created. As mentioned previously, this may involve creating a new copy of the metadata, marking the metadata as clean so that subsequent updates to the metadata cause copy-on-write, or some other logical copying mechanism. For example, referring to Figs. 2 and 4, the checkpoint manager 230 may make a logical copy of the global tables 405.
At block 715, the lock is released. For example, referring to Figs. 2 and 4, the checkpoint manager 230 may release the lock on the global tables 405.
At block 720, a write plan to write the first checkpoint data is created. Creating this write plan may occur in parallel with write plans being generated (and written to disk) for updates subsequent to the checkpoint, as well as with data corresponding to current write plans being written to disk. For example, referring to Fig. 2, the checkpoint manager 230 may use the write plan manager 237 to create a write plan for the checkpoint data of the first checkpoint. This data may include a logical copy of the global tables, as mentioned previously.
At block 725, in one embodiment, the checkpoint manager may wait until all updates of the first set of updates have been successfully written to the storage device. After all updates have been successfully written to the storage device, the update manager may then write a final checkpoint record that includes a validation code. As mentioned previously, this allows recovery to simply check the validation code to determine whether all updates corresponding to the checkpoint are expected to have been written to the storage device.
In another embodiment, the checkpoint manager may write several validation codes in a checkpoint record. These validation codes may be associated with the storage locations of the updates of the first set of updates. In this embodiment, the checkpoint manager may wait for these updates to be written to the storage device or may write the checkpoint record without waiting. If the latter option is chosen, finding a suitable checkpoint during recovery may be more involved than verifying that a valid checkpoint record is on disk.
At block 730, the checkpoint data may be written to the storage device. This may involve, for example, writing a write plan associated with the checkpoint data to the storage device. As another example, this may involve writing to the storage device a checkpoint record that references the logical copy of the global tables. For example, referring to Fig. 2, the checkpoint manager 230 may request that a write plan corresponding to the checkpoint data be written to the storage device.
At block 735, at least one validation code is written to the storage device. Writing the at least one validation code to the storage device may be combined with writing to the storage device a checkpoint record that references the logical copy of the global tables. For example, referring to Fig. 2, the checkpoint manager 230 may write to the storage device a checkpoint record that references the logical copy of the global tables and that includes a validation code for verifying the contents of the checkpoint record.
At block 740, other actions, if any, may be performed.
Translate into Fig. 8, in square frame 805, described action starts.In block 810, recovery request is received.For example, with reference to Fig. 2, RMAN 225 can receive for the data be stored on thesaurus 250 being implemented to the recovery request recovered.
In square frame 815, localization examination point data.For example, with reference to Fig. 2, RMAN 225 can be located and is stored in thesaurus 250(or other thesauruss a certain) on nearest checkpoint data.
At block 820, the checkpoint data is validated using a validation code. For example, referring to Fig. 2, the recovery manager 225 may compute a checksum of the checkpoint data and compare it with the checksum stored with the checkpoint data. If the two checksums match, the checkpoint may be deemed valid. If additional validation is desired, the recovery manager may attempt to validate one or more objects indicated by the global tables referred to by the checkpoint data.
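The locate-and-validate steps of blocks 815 and 820 can be sketched in Python as follows. The sketch assumes a hypothetical record layout (a 4-byte big-endian CRC-32 followed by a JSON body with a `checkpoint` number); neither the layout nor the function names come from the description.

```python
import json
import zlib


def decode_if_valid(raw):
    """Return the decoded record if its stored checksum matches, else None."""
    if len(raw) < 4:
        return None
    stored = int.from_bytes(raw[:4], "big")
    body = raw[4:]
    # Recompute the checksum over the body and compare with the stored one.
    if zlib.crc32(body) != stored:
        return None
    return json.loads(body)


def latest_valid_checkpoint(candidates):
    """Pick the valid record with the highest checkpoint number, if any."""
    valid = [rec for rec in map(decode_if_valid, candidates) if rec is not None]
    return max(valid, key=lambda rec: rec["checkpoint"], default=None)
```

A record whose checksum does not match is simply skipped, so recovery falls back to the most recent checkpoint that was written completely.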
At block 825, other actions, if any, may be performed.
As can be seen from the foregoing detailed description, aspects related to checkpoints for a file system have been described. While aspects of the subject matter described herein are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit aspects of the claimed subject matter to the specific forms disclosed; on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the various aspects of the subject matter described herein.

Claims (14)

1. A method implemented at least in part by a computer, the method comprising:
indicating that a first set of updates is to be associated with a first checkpoint;
determining to write checkpoint data regarding the first checkpoint to a storage of a file system, the file system using copy-on-write to update data of the file system;
indicating that any updates that occur after the first set of updates are to be associated with a subsequent checkpoint;
generating write plans for the first set of updates, each write plan indicating at least one planned location on storage for data representing at least one update of the first set of updates;
updating metadata to indicate allocation data of the file system and storage locations of file system objects modified by the write plans;
creating a logical copy of the metadata;
creating a write plan for writing the first checkpoint data while allowing write plans for subsequent updates to be generated in parallel with creating the write plan; and
writing at least one validation code to storage, the at least one validation code being part of the checkpoint data and usable to determine whether the first set of updates was correctly written to storage.
2. The method of claim 1, further comprising: waiting for data representing the first set of updates to be written to storage before writing the at least one validation code to storage.
3. The method of claim 1, wherein writing at least one validation code to storage comprises writing a single validation code to storage in a block together with other data that refers to a root node of at least one tree data structure representing the logical copy of the metadata, and further comprises computing the single validation code to validate the block.
4. The method of claim 1, further comprising: reading the at least one validation code, computing at least one other validation code from data on storage, comparing the at least one validation code with the at least one other validation code, and determining thereby whether all data representing the first set of updates was successfully written to storage.
5. The method of claim 1, wherein indicating that the first set of updates is to be associated with the first checkpoint comprises: updating a data structure that indicates that the first checkpoint is to be used for any update that occurs before the data structure is updated to indicate another checkpoint.
6. The method of claim 1, wherein creating a logical copy of the metadata comprises: creating the logical copy in parallel with writing data representing at least one update of the first set of updates to storage.
7. A system in a computing environment, comprising:
an interface operable to receive a request to update a file system object of a file system;
an I/O manager operable to determine one or more I/O requests to send to a store to fulfill the request; and
a checkpoint manager operable to:
determine a first checkpoint to associate with requests to update file system objects, wherein the checkpoint manager is operable to assign requests to different checkpoints;
determine to write checkpoint data associated with the checkpoint to a storage of the file system;
determine a second checkpoint for subsequent requests to update file system objects;
wait for a consistent state of the file system while allowing preparation to write data for subsequent requests;
create a logical copy of metadata of the file system;
write the logical copy to storage; and
write at least one validation code to storage, the at least one validation code usable to determine whether the updates prior to the checkpoint were written to storage.
8. The system of claim 7, wherein the checkpoint manager being operable to determine the checkpoint to associate with requests to update file system objects comprises: the checkpoint manager being operable to update a data structure that indicates that the first checkpoint is to be used for updates that occur before it is determined to write the checkpoint data associated with the first checkpoint to the storage of the file system, and that the second checkpoint is to be used for updates that occur thereafter.
9. The system of claim 7, wherein the checkpoint manager being operable to determine to write the checkpoint data associated with the checkpoint to the storage of the file system comprises: the checkpoint manager being operable to determine that a checkpoint timer has expired, the checkpoint timer being based on a recovery window.
10. The system of claim 7, further comprising a write plan manager operable to generate write plans, each write plan indicating locations on storage of all file system objects that are to be updated together with a file system object in order to maintain a consistent state of the file system.
11. The system of claim 7, wherein the checkpoint manager being operable to wait for a consistent state of the file system comprises: the checkpoint manager being operable to wait until all updates associated with the first checkpoint file system objects are represented on the storage of the file system.
12. The system of claim 7, wherein the checkpoint manager being operable to allow preparation to write data for subsequent requests comprises: the checkpoint manager being operable to allow write plans for the subsequent requests to be generated and written to storage, but not to allow the metadata to be updated until the logical copy of the metadata has been created.
13. A method implemented at least in part by a computer, the method comprising:
receiving a recovery request for a file system;
locating checkpoint data of a checkpoint on a storage of the file system, the checkpoint data having previously been generated by actions comprising:
indicating that any updates that occur after the updates associated with the checkpoint are to be assigned to a subsequent checkpoint;
generating write plans for the updates associated with the checkpoint, each write plan indicating at least one planned location on storage for representing one of the updates;
updating metadata to indicate storage locations of objects modified by the write plans;
creating a logical copy of the metadata; and
writing at least one validation code to storage regarding the checkpoint; and
validating the checkpoint data using the validation code.
14. The method of claim 13, wherein validating the checkpoint data using the validation code comprises: computing a checksum of the checkpoint data and comparing the checksum of the checkpoint data with the validation code.
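For illustration only, the association of updates with checkpoints recited in claims 1, 5, and 8 can be sketched in Python. The class name, the lock-protected counter, and the in-memory lists are assumptions; the claims require only a data structure that indicates which checkpoint applies to an update.

```python
import threading


class CheckpointAssigner:
    """Hypothetical structure that tags each update with a checkpoint number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 1
        self._updates = {1: []}

    def record_update(self, update):
        """Associate an incoming update with the currently open checkpoint."""
        with self._lock:
            self._updates[self._current].append(update)
            return self._current

    def begin_checkpoint(self):
        """Close the open checkpoint; later updates go to the next one."""
        with self._lock:
            closed = self._current
            self._current += 1
            self._updates[self._current] = []
            return closed, list(self._updates[closed])
```

Because closing a checkpoint only advances a counter under a short lock, new updates can keep arriving for the subsequent checkpoint while the closed set is being written.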
CN201180029522.9A 2010-06-15 2011-06-01 Checkpoints for a file system Active CN102934114B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/815,418 US8224780B2 (en) 2010-06-15 2010-06-15 Checkpoints for a file system
US12/815418 2010-06-15
PCT/US2011/038811 WO2011159476A2 (en) 2010-06-15 2011-06-01 Checkpoints for a file system

Publications (2)

Publication Number Publication Date
CN102934114A CN102934114A (en) 2013-02-13
CN102934114B true CN102934114B (en) 2015-11-25

Family

ID=45097053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180029522.9A Active CN102934114B (en) 2010-06-15 2011-06-01 Checkpoints for a file system

Country Status (11)

Country Link
US (3) US8224780B2 (en)
EP (1) EP2583202B1 (en)
JP (1) JP5735104B2 (en)
KR (2) KR101840996B1 (en)
CN (1) CN102934114B (en)
AU (1) AU2011265653B2 (en)
BR (1) BR112012031912B1 (en)
CA (1) CA2803763C (en)
RU (1) RU2554847C2 (en)
TW (1) TWI492077B (en)
WO (1) WO2011159476A2 (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8433865B2 (en) 2009-12-11 2013-04-30 Microsoft Corporation Consistency without ordering dependency
US8793440B2 (en) 2010-06-17 2014-07-29 Microsoft Corporation Error detection for files
US9155320B2 (en) * 2011-07-06 2015-10-13 International Business Machines Corporation Prefix-based leaf node storage for database system
US8776094B2 (en) 2011-08-11 2014-07-08 Microsoft Corporation Runtime system
US9766986B2 (en) * 2013-08-08 2017-09-19 Architecture Technology Corporation Fight-through nodes with disposable virtual machines and rollback of persistent state
US9838415B2 (en) 2011-09-14 2017-12-05 Architecture Technology Corporation Fight-through nodes for survivable computer network
US9769250B2 (en) 2013-08-08 2017-09-19 Architecture Technology Corporation Fight-through nodes with disposable virtual machines and rollback of persistent state
US8543544B2 (en) * 2012-01-06 2013-09-24 Apple Inc. Checkpoint based progressive backup
FR2989801B1 (en) * 2012-04-18 2014-11-21 Schneider Electric Ind Sas METHOD FOR SECURE MANAGEMENT OF MEMORY SPACE FOR MICROCONTROLLER
KR102050723B1 (en) 2012-09-28 2019-12-02 삼성전자 주식회사 Computing system and data management method thereof
US9003228B2 (en) * 2012-12-07 2015-04-07 International Business Machines Corporation Consistency of data in persistent memory
US9304998B2 (en) * 2012-12-19 2016-04-05 Microsoft Technology Licensing, Llc Main-memory database checkpointing
US10002077B2 (en) 2014-01-31 2018-06-19 Hewlett Packard Enterprise Development Lp Persistent memory controller based atomicity assurance
CN103984609B (en) * 2014-05-28 2017-06-16 华为技术有限公司 Method and apparatus for reclaiming checkpoints in a copy-on-write file system
US10635504B2 (en) 2014-10-16 2020-04-28 Microsoft Technology Licensing, Llc API versioning independent of product releases
US9892153B2 (en) * 2014-12-19 2018-02-13 Oracle International Corporation Detecting lost writes
CN106294357B (en) * 2015-05-14 2019-07-09 阿里巴巴集团控股有限公司 Data processing method and stream calculation system
US10031817B2 (en) 2015-11-05 2018-07-24 International Business Machines Corporation Checkpoint mechanism in a compute embedded object storage infrastructure
US10200401B1 (en) 2015-12-17 2019-02-05 Architecture Technology Corporation Evaluating results of multiple virtual machines that use application randomization mechanism
US10284592B1 (en) 2015-12-17 2019-05-07 Architecture Technology Corporation Application randomization mechanism
US10200406B1 (en) 2015-12-17 2019-02-05 Architecture Technology Corporation Configuration of application randomization mechanism
US10412114B1 (en) 2015-12-17 2019-09-10 Architecture Technology Corporation Application randomization mechanism
US10007498B2 (en) 2015-12-17 2018-06-26 Architecture Technology Corporation Application randomization mechanism
US10412116B1 (en) 2015-12-17 2019-09-10 Architecture Technology Corporation Mechanism for concealing application and operation system identity
CN105930223A (en) * 2016-04-24 2016-09-07 湖南大学 Method for reducing size of check point file
US20220100374A1 (en) * 2016-08-19 2022-03-31 Ic Manage Inc Hierarchical file block variant tracking for performance in parallelism at multi-disk arrays
KR101969799B1 (en) * 2016-09-07 2019-04-17 울산과학기술원 Electronic device and controlling method thereof
CN108509460B (en) 2017-02-28 2021-07-20 微软技术许可有限责任公司 Data consistency checking in distributed systems
US10554685B1 (en) 2017-05-25 2020-02-04 Architecture Technology Corporation Self-healing architecture for resilient computing services
US20190102262A1 (en) * 2017-09-29 2019-04-04 Intel Corporation Automated continuous checkpointing
KR102022481B1 (en) * 2017-12-06 2019-09-18 연세대학교 산학협력단 Method for Generating Checkpoint of High Performance Computing System Using GPU Usage
KR102468737B1 (en) * 2017-12-19 2022-11-21 에스케이하이닉스 주식회사 Memory system and operating method thereof
US10754785B2 (en) 2018-06-28 2020-08-25 Intel Corporation Checkpointing for DRAM-less SSD
US11416453B2 (en) * 2019-04-23 2022-08-16 EMC IP Holding Company LLC Facilitating checkpoint locks for distributed systems
US11822435B2 (en) 2020-07-06 2023-11-21 Bank Of America Corporation Consolidated data restoration framework
US11620084B2 (en) 2020-12-30 2023-04-04 Samsung Electronics Co., Ltd. Storage device including memory controller and operating method of memory controller
CN113538754B (en) * 2021-06-08 2023-04-07 福建新大陆通信科技股份有限公司 CTID intelligent door lock authorization data management method and system
CN116048384A (en) * 2022-11-02 2023-05-02 中国科学院空间应用工程与技术中心 Writing method and system of metadata of file system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007127361A2 (en) * 2006-04-28 2007-11-08 Network Appliance, Inc. System and method for providing continuous data protection
US7412460B2 (en) * 2003-06-19 2008-08-12 International Business Machines Corporation DBMS backup without suspending updates and corresponding recovery using separately stored log and data files

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4959771A (en) * 1987-04-10 1990-09-25 Prime Computer, Inc. Write buffer for a digital processing system
US5963962A (en) * 1995-05-31 1999-10-05 Network Appliance, Inc. Write anywhere file-system layout
DE69435146D1 (en) * 1993-06-03 2008-11-13 Network Appliance Inc Method and apparatus for describing arbitrary areas of a file system
US6035399A (en) * 1995-04-07 2000-03-07 Hewlett-Packard Company Checkpoint object
US5864657A (en) * 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
US5907849A (en) * 1997-05-29 1999-05-25 International Business Machines Corporation Method and system for recovery in a partitioned shared nothing database system using virtual share disks
US6185663B1 (en) * 1998-06-15 2001-02-06 Compaq Computer Corporation Computer method and apparatus for file system block allocation with multiple redo
US6374264B1 (en) * 1998-09-04 2002-04-16 Lucent Technologies Inc. Method and apparatus for detecting and recovering from data corruption of a database via read prechecking and deferred maintenance of codewords
JP2001101044A (en) * 1999-09-29 2001-04-13 Toshiba Corp Transactional file managing method and transactional file system and composite transactional file system
US6571259B1 (en) * 2000-09-26 2003-05-27 Emc Corporation Preallocation of file system cache blocks in a data storage system
US6629198B2 (en) * 2000-12-08 2003-09-30 Sun Microsystems, Inc. Data storage system and method employing a write-ahead hash log
US7730213B2 (en) * 2000-12-18 2010-06-01 Oracle America, Inc. Object-based storage device with improved reliability and fast crash recovery
US6678809B1 (en) * 2001-04-13 2004-01-13 Lsi Logic Corporation Write-ahead log in directory management for concurrent I/O access for block storage
JP2003223350A (en) * 2002-01-29 2003-08-08 Ricoh Co Ltd Data base system
US6993539B2 (en) 2002-03-19 2006-01-31 Network Appliance, Inc. System and method for determining changes in two snapshots and for transmitting changes to destination snapshot
US20040216130A1 (en) * 2002-08-30 2004-10-28 Keller S. Brandon Method for saving and restoring data in software objects
US7457822B1 (en) 2002-11-01 2008-11-25 Bluearc Uk Limited Apparatus and method for hardware-based file system
US7216254B1 (en) 2003-03-24 2007-05-08 Veritas Operating Corporation Method and system of providing a write-accessible storage checkpoint
JP3798767B2 (en) * 2003-06-26 2006-07-19 株式会社東芝 Disk control system and disk control program
US7401093B1 (en) 2003-11-10 2008-07-15 Network Appliance, Inc. System and method for managing file data during consistency points
US7721062B1 (en) 2003-11-10 2010-05-18 Netapp, Inc. Method for detecting leaked buffer writes across file system consistency points
US7054960B1 (en) 2003-11-18 2006-05-30 Veritas Operating Corporation System and method for identifying block-level write operations to be transferred to a secondary site during replication
US7168001B2 (en) * 2004-02-06 2007-01-23 Hewlett-Packard Development Company, L.P. Transaction processing apparatus and method
US7664965B2 (en) * 2004-04-29 2010-02-16 International Business Machines Corporation Method and system for bootstrapping a trusted server having redundant trusted platform modules
JP4104586B2 (en) * 2004-09-30 2008-06-18 株式会社東芝 File system having file management function and file management method
US7505410B2 (en) * 2005-06-30 2009-03-17 Intel Corporation Method and apparatus to support efficient check-point and role-back operations for flow-controlled queues in network devices
US7693864B1 (en) * 2006-01-03 2010-04-06 Netapp, Inc. System and method for quickly determining changed metadata using persistent consistency point image differencing
US20070234342A1 (en) * 2006-01-25 2007-10-04 Flynn John T Jr System and method for relocating running applications to topologically remotely located computing systems
CN101501653B (en) * 2006-02-06 2012-04-04 X档案公司 Long term backup on disk
US7870356B1 (en) * 2007-02-22 2011-01-11 Emc Corporation Creation of snapshot copies using a sparse file for keeping a record of changed blocks
US8495573B2 (en) 2007-10-04 2013-07-23 International Business Machines Corporation Checkpoint and restartable applications and system services
TWI476610B (en) * 2008-04-29 2015-03-11 Maxiscale Inc Peer-to-peer redundant file server system and methods
JP5425922B2 (en) * 2008-10-30 2014-02-26 インターナショナル・ビジネス・マシーンズ・コーポレーション Method, system, and computer program for performing data writing on a storage device

Also Published As

Publication number Publication date
KR20130115995A (en) 2013-10-22
EP2583202A4 (en) 2017-03-15
CA2803763A1 (en) 2011-12-22
BR112012031912A2 (en) 2016-11-08
CN102934114A (en) 2013-02-13
BR112012031912B1 (en) 2020-11-17
US8224780B2 (en) 2012-07-17
US8924356B2 (en) 2014-12-30
CA2803763C (en) 2019-02-26
US20150178165A1 (en) 2015-06-25
EP2583202A2 (en) 2013-04-24
KR20170132338A (en) 2017-12-01
US20110307449A1 (en) 2011-12-15
KR101805948B1 (en) 2017-12-07
JP2013528883A (en) 2013-07-11
US20120259816A1 (en) 2012-10-11
JP5735104B2 (en) 2015-06-17
EP2583202B1 (en) 2022-01-12
WO2011159476A3 (en) 2012-02-16
RU2012154324A (en) 2014-06-20
AU2011265653B2 (en) 2014-05-15
WO2011159476A2 (en) 2011-12-22
TWI492077B (en) 2015-07-11
KR101840996B1 (en) 2018-03-21
AU2011265653A1 (en) 2012-12-13
RU2554847C2 (en) 2015-06-27
TW201202981A (en) 2012-01-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: MICROSOFT TECHNOLOGY LICENSING LLC

Free format text: FORMER OWNER: MICROSOFT CORP.

Effective date: 20150611

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20150611

Address after: Washington State

Applicant after: Microsoft Technology Licensing LLC

Address before: Washington State

Applicant before: Microsoft Corp.

C14 Grant of patent or utility model
GR01 Patent grant