WO2019091085A1 - Method and apparatus for snapshot comparison - Google Patents

Method and apparatus for snapshot comparison (一种快照比对的方法和装置)

Info

Publication number
WO2019091085A1
Authority
WO
WIPO (PCT)
Prior art keywords
snapshot
time
file
node
target
Prior art date
Application number
PCT/CN2018/087771
Other languages
English (en)
French (fr)
Inventor
王加元
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2019091085A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/128 Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Definitions

  • The present application relates to the field of storage, and more particularly to a method and apparatus for snapshot comparison in the field of storage.
  • An incremental backup backs up only the files that have changed since a full backup or a previous incremental backup. That is, each incremental backup only needs to back up the data in the file system that changed between the last backup time and the current time.
  • A snapshot is a technology for protecting the data of a file system; it preserves the state of the file system at a certain point in time (for example, when a data backup is started).
  • The present application provides a method and apparatus for snapshot comparison, which help to improve the efficiency of the snapshot comparison process.
  • A method for snapshot comparison is provided, comprising:
  • obtaining a time tree corresponding to a current snapshot of the file system, the time tree including a root node layer, an intermediate node layer, and a leaf node layer, where the root node layer includes a root node and the intermediate node layer includes at least one layer of intermediate nodes;
  • the root node points to the first-layer intermediate nodes in the intermediate node layer, and the last-layer intermediate nodes in the intermediate node layer point to leaf nodes in the leaf node layer;
  • each leaf node in the leaf node layer includes the update time of one file of the file system;
  • a first intermediate node in the intermediate node layer includes a first intermediate time;
  • the first intermediate time includes the latest update time among the update times of the files included in each of the next-level nodes pointed to by the first intermediate node; the root node includes the creation time of the current snapshot; the first intermediate node is any node in the intermediate node layer; and the update time includes the modification time of a modified file in the file system and/or the write time of a newly written file;
  • according to the creation time of a historical snapshot of the file system, accessing, layer by layer starting from the root node layer, the times included in at least some of the nodes in each layer to find a target leaf node, where
  • the update time of the file included in the target leaf node is greater than the creation time of the historical snapshot; and
  • determining a target file according to the target leaf node, where the target file includes a file that was modified and/or newly written in the file system during a first time period, the first time period lying between the creation time of the historical snapshot and the creation time of the current snapshot.
  • In the method for snapshot comparison provided by the embodiment of the present application, the update times of files are recorded in the nodes of the tree structure corresponding to a snapshot. Based on the creation time of the historical snapshot, the nodes of the time tree corresponding to the current snapshot are accessed layer by layer
  • to find the target leaf nodes whose file update time is greater than the creation time of the historical snapshot, so that the files modified and/or newly written in the file system in the first time period (that is, the target files) are determined based on the target leaf nodes.
  • This process of determining the target file effectively saves time in the snapshot comparison and thus helps to improve the efficiency of the entire snapshot comparison process.
  • The at least part of the nodes includes all nodes in the time tree except jump nodes, where the update times of the files included in a jump node are less than or equal to the creation time of the historical snapshot.
  • In this case, while the nodes of the time tree corresponding to the current snapshot are accessed layer by layer based on the creation time of the historical snapshot,
  • the jump nodes, whose file update times are less than or equal to the creation time of the historical snapshot, can be skipped and only the remaining nodes are accessed. This reduces the number of nodes accessed, further saves time, and improves the efficiency of the snapshot comparison process.
  • Determining the target file according to the target leaf node includes:
  • determining a file whose creation time is before the creation time of the historical snapshot to be a modified file in the file system; and/or
  • determining a file whose creation time is after the creation time of the historical snapshot to be a newly written file in the file system.
  • The method further includes:
  • determining, according to a release log, the data deleted in the file system during the first time period.
  • By creating a release log within the period between the creation time of the historical snapshot and the creation time of the current snapshot (for example, the first time period), the method for snapshot comparison provided by the embodiment of the present application can quickly determine, from the release log,
  • the data deleted in the file system in the first time period, which further saves time and improves the efficiency of the whole snapshot comparison process.
  • Before the determining of the release log whose creation time is in the first time period, the method further includes:
  • if the deletion records stored in the release log at a first moment include a target deletion record, updating the target deletion record so that it is also used to record the storage location of first data before the first data was deleted;
  • otherwise, writing a deletion record that records the storage location of the first data before deletion into the release log at the first moment;
  • where the storage location corresponding to the target deletion record is contiguous with the storage location of the first data before deletion.
  • In other words, when deletion records are written to the release log, if data (for example, the first data) is deleted at any moment (for example, the first moment) and the release log already contains a target deletion record that satisfies the condition, the target deletion record is updated so that it also records the storage location of the first data before deletion, which further saves system space.
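The record-merging rule above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a deletion record is a `[start, length]` pair describing a storage extent and that the release log is a plain list; the function name `record_deletion` and this record layout are hypothetical and are not taken from the application.

```python
def record_deletion(release_log, start, length):
    """Add a deleted extent to release_log, a list of [start, length] records.

    If an existing record is contiguous with the new extent, that record is
    extended in place; otherwise a new record is appended.
    """
    end = start + length
    for record in release_log:
        rec_start, rec_len = record
        if rec_start + rec_len == start:      # new extent directly follows the record
            record[1] = rec_len + length
            return
        if end == rec_start:                  # new extent directly precedes the record
            record[0] = start
            record[1] = rec_len + length
            return
    release_log.append([start, length])       # no contiguous record: write a new one


log = []
record_deletion(log, 100, 50)   # first deletion: new record
record_deletion(log, 150, 25)   # contiguous with the first record: record is extended
record_deletion(log, 400, 10)   # not contiguous: new record
print(log)  # [[100, 75], [400, 10]]
```

Three deletions produce only two records here, which is the space saving the passage refers to.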
  • The release log includes N sub-release logs that correspond to N snapshots, the i-th sub-release log corresponding to the i-th snapshot. The N snapshots include the historical snapshot and the N-1 snapshots between the historical snapshot and the current snapshot; the N snapshots are created sequentially in chronological order, and N is an integer greater than or equal to 1.
  • The deletion records included in the i-th sub-release log record the storage locations, before deletion, of the data deleted from the file system in an intermediate period, where the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, i ∈ [1, N]; and
  • determining the release log whose creation time is in the first time period includes determining the N sub-release logs according to the N snapshots and the correspondence.
  • The correspondence includes a first correspondence and a second correspondence, where the first correspondence is between the N snapshots and N index numbers, the second correspondence is between the N index numbers and the N sub-release logs, and the N index numbers are allocated based on preset rules.
  • When the N sub-release logs are determined according to the N snapshots and the correspondence,
  • the N index numbers include the index number of the historical snapshot, the index number of the N-th snapshot, and the remaining N-2 index numbers.
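As a rough illustration of the two-level correspondence (snapshot to index number, index number to sub-release log), the following Python sketch collects the deletion records between a historical snapshot and the current snapshot. The snapshot names, the use of consecutive integers as index numbers, and the log contents are all assumptions made for this example; the application only states that index numbers are allocated by preset rules.

```python
# Hypothetical data: snapshot names, index numbers and log contents are made up.
snapshot_to_index = {"snap1": 0, "snap2": 1, "snap3": 2, "snap4": 3}
index_to_sublog = {
    0: [(100, 75)],          # deletions between snap1 and snap2
    1: [(400, 10)],          # deletions between snap2 and snap3
    2: [(900, 5), (20, 4)],  # deletions between snap3 and snap4
}


def deleted_between(history, current):
    """Collect deletion records logged from `history` up to `current`."""
    first = snapshot_to_index[history]
    last = snapshot_to_index[current]
    records = []
    for index in range(first, last):      # one sub-release log per snapshot before `current`
        records.extend(index_to_sublog.get(index, []))
    return records


print(deleted_between("snap2", "snap4"))  # [(400, 10), (900, 5), (20, 4)]
```

With consecutive index numbers, only the index numbers of the historical snapshot and of the last snapshot before the current one need to be looked up; the ones in between follow from the range.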
  • The method further includes:
  • if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, deleting the historical snapshot and the sub-release log corresponding to the historical snapshot.
  • A device for snapshot comparison is provided, configured to perform the method of the first aspect or of any possible implementation of the first aspect described above.
  • In a third aspect, a computer-readable storage medium is provided, storing a program that causes a communication device to perform the method of any possible implementation of the first aspect described above.
  • A computer program is provided that, when executed on a computer, causes the computer to implement the method of any possible implementation of the first aspect described above.
  • The first intermediate node in the intermediate node layer includes a first intermediate time.
  • When the first intermediate node is a last-layer intermediate node in the intermediate node layer, the first intermediate time includes the update time of the file included in each leaf node pointed to by the first intermediate node.
  • When the first intermediate node is an intermediate node other than a last-layer intermediate node, the first intermediate time includes, for each next-level intermediate node pointed to by the first intermediate node, the latest update time among the update times of the files included in that intermediate node. The first intermediate node is any node in the intermediate node layer, and the update time includes the modification time of a modified file in the file system and/or the write time of a newly written file.
  • FIG. 1 is a schematic structural diagram of a storage device applied to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for snapshot comparison according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a time tree in a method for snapshot comparison according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for snapshot comparison according to another embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a relationship between a snapshot and a release log in a method of snapshot comparison according to another embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a relationship between storage locations of data in a method for snapshot comparison according to another embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a relationship between a snapshot and a release log in a method of snapshot comparison according to another embodiment of the present application.
  • FIGS. 8 to 10 are schematic block diagrams showing a relationship between a snapshot and a release log in a method of snapshot comparison according to still another embodiment of the present application.
  • FIG. 11 is a schematic block diagram of an apparatus for snapshot comparison according to an embodiment of the present application.
  • First, the manner of performing incremental backup based on snapshot comparison in the embodiment of the present application is briefly described.
  • In an incremental backup, a snapshot is first created for the file system (called the first snapshot for ease of understanding and distinction) and all data of the file system is fully backed up; later, another snapshot is created (called the second snapshot).
  • The second snapshot is compared with the first snapshot to obtain the data that changed between the two snapshots.
  • When the backup is performed, it is sufficient to back up only the data that changed in the file system between the creation times of the two snapshots.
  • Because only the changed data needs to be backed up, incremental backup greatly reduces the time taken by data backup, improves backup efficiency, and reduces the space occupied on the backup side. Incremental backup is therefore widely used for data backup.
  • Determining the data that changed between the two snapshots is the core of incremental backup.
  • The prior-art snapshot comparison method first traverses all the files protected by the second snapshot and then, according to the traversal result, searches the files protected by the first snapshot. If the update time of a file found in the first snapshot is less than the update time of the same file in the second snapshot, or if a file protected by the second snapshot is not found in the first snapshot, the file was modified and/or newly written in the file system between the creation times of the two snapshots.
  • The method then traverses all the files in the first snapshot and, according to that traversal result, searches the second snapshot. If a file protected by the first snapshot is not found in the second snapshot, the file was deleted from the file system between the creation times of the two snapshots.
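The two-pass prior-art comparison described above can be sketched in Python as follows. The sketch assumes, for illustration only, that each snapshot is available as a dictionary mapping file names to update times; the function name `naive_compare` and the sample data are hypothetical.

```python
def naive_compare(first, second):
    """Compare two snapshots given as {file name: update time} dicts."""
    # Pass over the second snapshot: files that are new or have a later update time.
    changed = [name for name, t in second.items()
               if name not in first or first[name] < t]
    # Pass over the first snapshot: files that no longer exist.
    deleted = [name for name in first if name not in second]
    return changed, deleted


first = {"a": 3, "b": 5, "c": 7}
second = {"a": 3, "b": 9, "d": 10}
print(naive_compare(first, second))  # (['b', 'd'], ['c'])
```

Both passes touch every file in a snapshot even when almost nothing has changed, so the cost grows with the total number of files rather than with the number of changes.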
  • the present application provides a method for snapshot comparison that can help solve the above problems.
  • FIG. 1 is a schematic structural diagram of a storage device 100 applied to an embodiment of the present application.
  • the storage device 100 includes a processor 110, a memory 120, a network adapter 130, and an input/output (IO) interface 140.
  • the functions of each component are as follows:
  • The processor 110 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor or any conventional processor. The steps in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 110 or by instructions in the form of software.
  • the memory 120 is configured to store operation data in the processor 110 and data exchanged with an external storage device such as a hard disk.
  • the memory 120 can include read only memory and random access memory and provides instructions and data to the processor 110.
  • a portion of the memory 120 may also include a non-volatile random access memory.
  • The memory 120 may be used to store scan results calculated by the processor 110.
  • The network adapter 130, also known as a network card or network interface card, is a device that a computer uses to access a network.
  • The processor 110 is connected to the I/O interface 140 through an internal bus of the storage device.
  • The I/O interface 140 is further connected to the external device 150, which ultimately enables information transmission between the processor 110 and the external device 150.
  • The user can issue an instruction to the processor 110 through the I/O interface 140.
  • The external device 150 includes devices such as a USB flash drive, a mouse, a keyboard, and a printer, which are not described in detail herein.
  • The external device may be a storage device for storing data, such as a USB flash drive or a hard disk.
  • The storage device of the embodiment of the present application has been described above with reference to FIG. 1.
  • The method for snapshot comparison of the embodiment of the present application is described in detail below with reference to FIG. 2 to FIG. 9.
  • The device that performs the embodiment of the present application is referred to as a storage device; specifically, it is the processor in the storage device.
  • FIG. 2 is a schematic flowchart of a method for snapshot comparison according to an embodiment of the present application. Each step of the embodiment is described below based on FIG. 2.
  • A time tree corresponding to the current snapshot of the file system is obtained. The time tree includes a root node layer, an intermediate node layer, and a leaf node layer.
  • The root node layer includes a root node.
  • The intermediate node layer includes at least one layer of intermediate nodes.
  • The root node points to the first-layer intermediate nodes in the intermediate node layer.
  • The last-layer intermediate nodes in the intermediate node layer point to leaf nodes in the leaf node layer.
  • Each leaf node in the leaf node layer includes the update time of one file of the file system.
  • The first intermediate node in the intermediate node layer includes a first intermediate time.
  • The first intermediate time includes the latest update time among the update times of the files included in each of the next-level nodes pointed to by the first intermediate node.
  • The update time includes the modification time of a modified file in the file system and/or the write time of a newly written file.
  • When a snapshot is created, the storage device also generates a time tree corresponding to the snapshot. The time tree records the update times of files, where an update time includes the modification time and/or the write time of a file.
  • The time information in the time tree is updated as the files change.
  • Like a general tree structure, the time tree may also record index information for locating files or data, so that when files or data are searched for based on the snapshot, they can be found through the index information recorded in the corresponding tree structure.
  • In general, nodes in a tree structure are divided into four categories: root node, leaf node, parent node, and intermediate node.
  • An intermediate node is the next-level node of a parent node. If a node has a node one level above it, that upper-level node is called its parent node; if there is no upper level, the node has no parent node.
  • A node that has no next-level nodes below it is called a leaf node. A node that has no other node above it is called the root node.
  • The intermediate node in the embodiment of the present application may also be referred to as a child node.
  • The time tree in the embodiment of the present application can be understood as a tree structure in which time information is recorded. The time tree includes multiple layers of nodes, each layer includes at least one node, and any two adjacent layers
  • are connected such that a node in the upper layer points to nodes in the next layer through the tree connection structure.
  • The layers of nodes in the time tree are denoted the root node layer, the intermediate node layer, and the leaf node layer.
  • The root node layer includes only the root node.
  • The intermediate node layer includes at least one layer of intermediate nodes, where the parent node of a first-layer intermediate node is the root node and a last-layer intermediate node is the parent node of leaf nodes in the leaf node layer.
  • When the intermediate node layer includes only one layer of intermediate nodes, the first-layer intermediate node is also the last-layer intermediate node.
  • When the intermediate node layer includes multiple layers of intermediate nodes, any two adjacent layers are connected in a tree, so that an intermediate node in the upper layer points to intermediate nodes in the next layer through the tree connection structure. For details of the tree structure, reference may be made to the prior art; they are not repeated here.
  • For any intermediate node in the intermediate node layer (for example, the first intermediate node), the first intermediate node includes a first intermediate time, which includes
  • the latest update time among the update times of the files included in each of the next-level nodes pointed to by the first intermediate node. The next-level nodes pointed to by the first intermediate node may be intermediate nodes or leaf nodes.
  • When the first intermediate node is a last-layer intermediate node, its next-level nodes are leaf nodes in the leaf node layer. Because each leaf node includes the update time of only one file, the latest update time for each next-level node is simply
  • the update time of the file included in that leaf node; that is, the first intermediate time includes the update time of the file included in each leaf node pointed to by the first intermediate node.
  • When the first intermediate node is an intermediate node other than a last-layer intermediate node, the first intermediate time includes, for each next-level intermediate node pointed to by the first intermediate node, the latest update time among the update times of the files included in that intermediate node.
  • FIG. 3 is a schematic structural diagram of a time tree in the method for snapshot comparison according to an embodiment of the present application.
  • As shown in FIG. 3, the time tree has 4 layers, and
  • the intermediate node layer includes 2 layers of intermediate nodes.
  • The root node layer includes root node A,
  • the first layer of intermediate nodes includes intermediate node B, and
  • the second layer of intermediate nodes includes intermediate nodes C1, C2, and C3.
  • The leaf node layer includes leaf nodes D1, D2, D3, D4, and D5. Root node A points to intermediate node B; intermediate node B points to the next-layer intermediate nodes C1, C2, and C3; intermediate node C1 points to leaf nodes D1 and D2; intermediate node C2 points to leaf node D3; and intermediate node C3 points to leaf nodes D4 and D5.
  • The root node includes the creation time of the current snapshot, that is, t19; each leaf node includes the update time of one file.
  • For example, leaf node D1 includes the update time of file #A, that is, t15, and
  • leaf node D2 includes the update time of file #B, that is, t6.
  • When the first intermediate node is in the second layer of intermediate nodes (that is, the last layer of the intermediate node layer), for example intermediate node C1,
  • the intermediate time in intermediate node C1 includes the update time t15 of file #A and the update time t6 of file #B, because
  • leaf node D1 pointed to by intermediate node C1 includes the update time t15 of file #A, and leaf node D2 pointed to by intermediate node C1 includes the update time t6 of file #B.
  • When the first intermediate node is in the first layer of intermediate nodes, for example intermediate node B, the intermediate time in intermediate node B includes the update time t15 of file #A, the update time t18 of file #E, and the update time t8 of file #D. The update time t15 of file #A is the latest of the update times (t15 and t6) of all files included in intermediate node C1 pointed to by intermediate node B.
  • The update time t18 of file #E is the latest of the update times of all files included in intermediate node C2; in fact, intermediate node C2 includes only the update time of file #E, so the latest update time is t18.
  • The update time t8 of file #D is the latest of the update times (t7 and t8) of all files included in intermediate node C3 pointed to by intermediate node B.
  • The time tree in FIG. 3 is only an example. Each intermediate node may also point to other nodes that are not shown in the figure; in addition, the intermediate node layer in the time tree may include only one
  • layer of intermediate nodes, or at least three layers of intermediate nodes. The embodiment of the present application is not limited thereto.
  • time information in the intermediate node B may be added to the root node A such that the next level node of the root node A is the intermediate node C1, the intermediate node C2, and the intermediate node C3.
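The intermediate times described for FIG. 3 can be reproduced with a short Python sketch. Times such as t6 are written as the integer 6, the tree is held in plain dictionaries, and the assignment of t7 and t8 to leaf nodes D4 and D5 is an assumption (the text only gives the two values for intermediate node C3). The names `leaf_time`, `children`, `latest`, and `intermediate_time` are illustrative and do not come from the application.

```python
# Leaf update times from the FIG. 3 description, with t6 written as 6, etc.
# Which of D4/D5 holds t7 and which holds t8 is an assumption.
leaf_time = {"D1": 15, "D2": 6, "D3": 18, "D4": 7, "D5": 8}
children = {"B": ["C1", "C2", "C3"],
            "C1": ["D1", "D2"], "C2": ["D3"], "C3": ["D4", "D5"]}


def latest(node):
    """Latest update time of any file at or below `node`."""
    if node in leaf_time:
        return leaf_time[node]
    return max(latest(child) for child in children[node])


def intermediate_time(node):
    """Times an intermediate node records: one entry per next-level node."""
    return {child: latest(child) for child in children[node]}


print(intermediate_time("C1"))  # {'D1': 15, 'D2': 6}
print(intermediate_time("B"))   # {'C1': 15, 'C2': 18, 'C3': 8}
```

The output for intermediate node B matches the values t15, t18, and t8 given in the description.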
  • S220: According to the creation time of a historical snapshot of the file system, access, layer by layer starting from the root node layer, the times included in at least some of the nodes in each layer to find the target leaf nodes, where the update time of the file included in a target leaf node
  • is greater than the creation time of the historical snapshot.
  • Specifically, the storage device obtains the creation time of the historical snapshot from the historical snapshot (denoted creation time #A for ease of distinction and understanding) and, based on creation time #A, accesses at least some of the nodes in each layer, layer by layer starting from the root node.
  • Among all leaf nodes, those whose file update time is greater than creation time #A are determined to be target leaf nodes. The at least part of the nodes may include all nodes in the tree structure,
  • or all nodes in the tree structure except jump nodes. A jump node is an intermediate node and/or a leaf node; there is at least one jump node, and the update times of the files included in each jump node are less than or equal to creation time #A.
  • Taking FIG. 3 as an example, assume that creation time #A is t11.
  • After the nodes are accessed layer by layer down to the leaf node layer, the leaf nodes whose file
  • update time is greater than creation time #A are leaf node D1 and leaf node D3, so leaf node D1 and leaf node D3 are the target leaf nodes.
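The search in S220 can be tried out on the FIG. 3 values with the following Python sketch, which supports both access patterns: visiting every node, or skipping jump nodes. As before, t6 is written as 6, the D4/D5 assignment is assumed, and `latest(child)` stands in for the time that a parent node already records for each of its next-level nodes. All names are hypothetical.

```python
# FIG. 3 values with t6 written as 6, etc.; the D4/D5 assignment is assumed.
LEAF_TIME = {"D1": 15, "D2": 6, "D3": 18, "D4": 7, "D5": 8}
CHILDREN = {"A": ["B"], "B": ["C1", "C2", "C3"],
            "C1": ["D1", "D2"], "C2": ["D3"], "C3": ["D4", "D5"]}


def latest(node):
    """Stand-in for the latest update time a parent records for `node`."""
    if node in LEAF_TIME:
        return LEAF_TIME[node]
    return max(latest(c) for c in CHILDREN[node])


def target_leaves(history_created, skip_jump_nodes=True):
    """Return (target leaf nodes, visited nodes) for a layer-by-layer search."""
    targets, visited = [], []
    queue = ["A"]
    while queue:
        node = queue.pop(0)
        visited.append(node)
        if node in LEAF_TIME:
            if LEAF_TIME[node] > history_created:
                targets.append(node)
            continue
        for child in CHILDREN[node]:
            # A child whose recorded time is not later than the historical
            # snapshot is a jump node and can be dropped without a visit.
            if skip_jump_nodes and latest(child) <= history_created:
                continue
            queue.append(child)
    return targets, visited


print(target_leaves(11))                         # (['D1', 'D3'], ['A', 'B', 'C1', 'C2', 'D1', 'D3'])
print(target_leaves(11, skip_jump_nodes=False))  # same targets, all 10 nodes visited
```

Both calls return leaf nodes D1 and D3 as targets; skipping jump nodes visits 6 nodes instead of 10 in this small tree.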
  • In step S230, the target file is determined according to the target leaf node. The target file includes a file that was modified and/or newly written in the file system during the first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.
  • Each leaf node further includes data information of the metadata of its file. The data information is used to determine the metadata of the corresponding file, and the metadata includes content related to the attributes of the file.
  • For any target leaf node (referred to as the first target leaf node for ease of distinction and understanding),
  • the storage device uses the data information of the metadata included in the first target leaf node to determine
  • the metadata of the corresponding target file (referred to as the first target file for ease of distinction and understanding), and then determines the first target file based on that metadata.
  • Continuing with FIG. 3, the target files determined based on the target leaf nodes are file #A and file #E.
  • Next, the specific implementation process of step S230 is described in detail.
  • Determining the target file according to the target leaf node includes:
  • determining a file whose creation time is before the creation time of the historical snapshot to be a modified file in the file system, or determining a file whose creation time is after the creation time of the historical snapshot to be a newly written file in the file system.
  • Specifically, the storage device may use the data information of the metadata included in any target leaf node (for example, the first target leaf node) to obtain the metadata of the corresponding target file (for example, the first target file).
  • The creation time of the first target file is obtained from that metadata and compared with creation time #A. If the creation time of the first target file is before creation time #A, the first target file is a modified file in the file system; if the creation time of the first target file is after creation time #A, the first target file is a newly written file in the file system.
  • Continuing with FIG. 3, file #A and file #E are the updated files in the file system, but which of them was modified and which was newly written must be further determined from their creation times. The creation time of file #A is obtained through leaf node D1.
  • The creation time of file #A is t10,
  • which is smaller than creation time #A (t10 is smaller than t11), indicating that file #A is a modified file in the file system. The creation time of file #E is obtained through leaf node D3.
  • The creation time of file #E is greater than creation time #A (t18 is greater than t11), indicating that file #E is a newly written file in the file system. This completes the process of determining the target file.
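The classification step can be written as a small Python sketch using the values from the walk-through (creation times t10 and t18, creation time #A equal to t11, written as integers). The function name `classify` is hypothetical, and treating a file created at exactly the historical snapshot's creation time as newly written is an assumption, since the text only covers "before" and "after".

```python
def classify(target_files, history_created):
    """Split target files into modified and newly written by creation time.

    `target_files` maps file name to creation time. A file created exactly at
    `history_created` is treated as newly written (an assumption).
    """
    modified, new = [], []
    for name, created in target_files.items():
        (modified if created < history_created else new).append(name)
    return modified, new


# File #A was created at t10 and file #E at t18; creation time #A is t11.
print(classify({"#A": 10, "#E": 18}, 11))  # (['#A'], ['#E'])
```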
  • The foregoing process of determining the target file based on the target leaf node is only a schematic description, and the embodiment of the present application is not limited thereto.
  • For example, the storage device may search, based on the historical snapshot, the files that existed in the file system before the creation time of the historical snapshot. If some of the target files are found, those files are modified files; if some of the target files are not found, those files are newly written files.
  • Only the process of determining the files modified and/or newly written in the file system is described here. Deleted files may be determined in a manner similar to the prior art, or by the process described below for determining the files deleted from the file system.
  • Compared with the prior art, in the embodiment of the present application the target leaf nodes whose file update time is greater than
  • creation time #A are found directly from the time tree, and the target file is then determined based on the target leaf nodes. The target file can therefore be determined in a single search process, which effectively saves time.
  • Therefore, in the method for snapshot comparison provided by the embodiment of the present application, the update times of files are recorded in the nodes of the tree structure corresponding to a snapshot, and, based on the creation time of the historical snapshot, the nodes of the time tree corresponding to the current snapshot are accessed layer by layer
  • to find the target leaf nodes whose file update time is greater than the creation time of the historical snapshot, so that the files modified and/or newly written in the file system in the first time period
  • (that is, the target files) are determined based on the target leaf nodes.
  • Compared with the prior-art process of determining the target file, this effectively saves time in the snapshot comparison and thus helps to improve the efficiency of the entire snapshot comparison process.
  • The at least part of the nodes may include all nodes in the time tree except the jump nodes. A jump node is an intermediate node and/or a leaf node, there is at least one jump node, and the update time of the files included in each jump node is less than or equal to creation time #A.
  • For any one of the jump nodes (referred to as the first jump node for ease of distinction and understanding): if the first jump node is an intermediate node, the update times of all files included in that intermediate node are less than or equal to creation time #A; if the first jump node is a leaf node, the update time of the file included in that leaf node is less than or equal to creation time #A.
  • the at least part of the nodes may also include all the nodes in the time tree, and the embodiment of the present application is not limited thereto.
  • In step S220, when the at least part of the nodes includes all nodes in the time tree except the jump nodes, the target leaf node may be found by mode 1; when the at least part of the nodes includes all nodes in the time tree, the target leaf node may be found by mode 2.
  • The implementation processes of mode 1 and mode 2 are slightly different, and the two modes are described separately below.
  • With creation time #A being t11, when the next layer of intermediate nodes is accessed, intermediate node C3 and the leaf nodes D4 and D5 that it points to are skipped directly, and only intermediate node C1 and intermediate node C2 are accessed. This is because the update times of all files in the leaf nodes pointed to by intermediate node C3 are less than or equal to t8, so as long as the update time recorded for intermediate node C3 is less than creation time #A, there is no need to continue accessing intermediate node C3 or the leaf nodes D4 and D5 that it points to. Similarly, after intermediate node C1 is accessed and its file update times are read, it is determined that the update time t6 of file #B, included in leaf node D2 pointed to by intermediate node C1, is less than creation time #A (t11), so leaf node D2 and the file it includes may be skipped.
  • In mode 2, all nodes in each layer are traversed layer by layer starting from the root node. Based on the update times of the files included in all the leaf nodes, the leaf nodes whose file update time is greater than creation time #A are determined as the target leaf nodes; that is, leaf node D1 and leaf node D3 are determined as the target leaf nodes.
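  • The difference between the two search modes can be illustrated with a minimal Python sketch. The `Node` class, the function names, the placement of D1 under C1 and D3 under C2, and the concrete update times of D1, D3, D4, and D5 are assumptions made for illustration only; the source gives only t6 for D2, the bound t8 for C3, and t11 for creation time #A. Each non-leaf node is assumed to carry the latest update time of the files beneath it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical time-tree node: update_time is the latest file update time below it."""
    name: str
    update_time: int
    children: List["Node"] = field(default_factory=list)


def find_targets_mode1(root: Node, creation_time: int):
    """Mode 1: never descend into a node whose update time <= creation_time (jump node)."""
    targets, visited = [], []
    stack = [root] if root.update_time > creation_time else []
    while stack:
        node = stack.pop()
        visited.append(node.name)
        if not node.children:
            targets.append(node.name)          # leaf reached only if newer than the snapshot
            continue
        for child in node.children:
            if child.update_time > creation_time:
                stack.append(child)            # jump nodes are simply not pushed
    return sorted(targets), sorted(visited)


def find_targets_mode2(root: Node, creation_time: int):
    """Mode 2: traverse every node layer by layer, then filter the leaf nodes."""
    targets, visited = [], []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        visited.append(node.name)
        if node.children:
            queue.extend(node.children)
        elif node.update_time > creation_time:
            targets.append(node.name)
    return sorted(targets), sorted(visited)


# Example tree; times are plain integers standing in for t6, t8, t11, and so on.
c1 = Node("C1", 12, [Node("D1", 12), Node("D2", 6)])
c2 = Node("C2", 15, [Node("D3", 15)])
c3 = Node("C3", 8, [Node("D4", 7), Node("D5", 8)])
root = Node("root", 15, [c1, c2, c3])

targets1, visited1 = find_targets_mode1(root, creation_time=11)
targets2, visited2 = find_targets_mode2(root, creation_time=11)
print(targets1, len(visited1))   # same targets as mode 2, fewer nodes accessed
print(targets2, len(visited2))
```

  • Under these assumptions both modes return leaf nodes D1 and D3, but mode 1 accesses only the root, C1, C2, D1, and D3, whereas mode 2 accesses all nine nodes.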
  • The embodiment of the present application also provides an optional implementation for determining the files deleted from the file system in the first time period.
  • FIG. 4 is a schematic flowchart of a method for snapshot comparison according to another embodiment of the present application. As shown in FIG. 4, optionally, the method further includes:
  • S240: Determine the release log whose creation time is within the first time period, where the release log includes deletion records, and each deletion record records the storage location, before deletion, of data deleted from the file system in the first time period;
  • S250: Determine, according to the release log, the data deleted from the file system in the first time period.
  • Steps S210 to S230 and steps S240 to S250 may be performed simultaneously, or in an order determined by their functions and intrinsic logic.
  • The storage device creates a release log at the same time as it creates a snapshot. After the release log is created, whenever data is deleted, a deletion record is written to the release log. The deletion record records the storage location, before deletion, of the data deleted from the file system during the period between the creation times of two snapshots (i.e., the first time period). Thus, by finding the release log whose creation time is within the first time period, the data deleted from the file system during the first time period can be determined.
  • The deleted data in the file system may include part of the data of each of at least one file in the file system, or may include all of the data of each of the at least one file.
  • When part of the data of a file is deleted, the deletion record may include: the identifier of the file to which the part of the data belongs (for example, the ID of the file), the offset position of the part of the data in the file, and the size of the part of the data. The offset position indicates the offset of the starting position of the part of the data relative to the starting position of the file. Its unit may be a unit of capacity, for example, M, K, or bit, and it can be understood as a logical offset position.
  • For example, if the size of file #A is 256M and the deleted data in file #A is 10M in size, specifically the data between 10M and 20M, then the offset position of the deleted data in file #A is 10M.
  • The deletion record may also include: the identifier of the object to which file #A belongs (for example, the inode number of the object to which the file belongs), the offset position of file #A in that object, and the size of file #A; or the identifier of the file, the starting position of the file, and the size of the file.
  • A release log may be allocated to a snapshot to record the data deleted from the file system during the period between the creation times of two snapshots. Ordered by creation time, the two snapshots may be adjacent, or other snapshots may exist between them, depending on the service requirements that the storage device receives from the user. For example, outside the backup times, if the storage device needs to perform multiple incremental backups of local data, a snapshot may be created at the time required for each incremental backup; as another example, the storage device may need to roll back based on periodically created snapshots.
  • Therefore, in the method for snapshot comparison provided by the embodiment of the present application, by determining the release log whose creation time is within the period between the creation time of the historical snapshot and the creation time of the current snapshot (for example, the first time period), the data deleted from the file system in the first time period can be determined quickly, which further saves time in the snapshot comparison and improves the efficiency of the entire snapshot comparison process.
  • The embodiment of the present application can thus improve the efficiency of the snapshot comparison process without affecting data writes in the file system, and can therefore improve the performance of data backup.
  • FIG. 5 is a schematic block diagram showing a relationship between a snapshot and a release log in a method for snapshot comparison according to an embodiment of the present application.
  • The period between t11 and t19 is the first time period. The storage device creates snapshot #1, the historical snapshot, at time t11, which is also the time at which the storage device last backed up the data. The next backup starts at t19, when snapshot #4, the current snapshot, is created. During the first time period, deletion records are written to the release log to record the data deleted from the file system. Snapshot #2 and snapshot #3 are also created between snapshot #1 and snapshot #4; of course, there may also be no other snapshots between snapshot #1 and snapshot #4.
  • the embodiment of the present application provides an optional implementation manner for the process of recording the deletion record in the release log.
  • the method further includes:
  • if the deletion records stored in the release log at the first moment include a target deletion record, updating the target deletion record so that the target deletion record is further used to record the storage location of the first data before deletion;
  • if the deletion records stored in the release log at the first moment do not include a target deletion record, writing a deletion record that records the storage location of the first data before deletion to the release log at the first moment;
  • where the storage location corresponding to the target deletion record is contiguous with the storage location of the first data before deletion.
  • The first moment is any moment in the first time period. The release log at the first moment refers to the release log as it stands at the first moment, while deletion records are still being written to it; some deletion records have already been recorded in it.
  • The first data deleted at the first moment may be related to a deletion record already stored in the release log at the first moment. If a target deletion record exists, the record of the first data may be merged into the target deletion record to update it, so that the target deletion record records not only the storage location of the second data before deletion but also the storage location of the first data before deletion. If no target deletion record exists in the release log at the first moment, a new deletion record is added to the release log at the first moment, that is, a deletion record that records the storage location of the first data before deletion is written to the release log at the first moment.
  • When the deletion record includes the identifier of the file to which the deleted data belongs, the offset position of the deleted data in the file, and the size of the deleted data, these three parameters may be used to determine the target deletion record. For example, a deletion record is the target deletion record if the identifier of the file in which the deleted data is located is the same as the identifier of the file to which the data in the deletion record belongs, and either the offset position of the deleted data plus the size of the deleted data is equal to the offset position of the data in the deletion record, or the offset position of the deleted data is equal to the offset position of the data in the deletion record plus the size of that data.
  • For example, deletion records are stored in the release log at the first moment, the first data (referred to as data #1 for ease of distinction and understanding) belongs to file #A, the offset position of data #1 in file #A is 20M, and the size of data #1 is 10M.
  • In one case, the data corresponding to deletion record #21 stored in the release log at the first moment is data #21, and deletion record #21 includes: file #A (file identifier), 10M (offset position), 10M (size), indicating that data #21 belongs to file #A with an offset position of 10M and a size of 10M. The offset position of data #21 plus the size of data #21 (10M + 10M) is equal to the offset position of data #1 (20M), so deletion record #21 is the target deletion record. After the update, deletion record #21 becomes: file #A, 10M, 20M, and it records the storage locations of both data #1 and data #21 before deletion.
  • In another case, the data corresponding to deletion record #22 stored in the release log at the first moment is data #22, and deletion record #22 includes: file #A, 30M, 10M, indicating that data #22 belongs to file #A with an offset position of 30M and a size of 10M. The offset position of data #1 plus the size of data #1 (20M + 10M) is equal to the offset position of data #22 (30M), so deletion record #22 is the target deletion record. After the update, deletion record #22 becomes: file #A, 20M, 20M, and it records the storage locations of both data #1 and data #22 before deletion.
  • Therefore, by updating the target deletion record so that it also records the storage location of the first data before deletion, system space is further saved.
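  • The merge rule for contiguous deletions can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the application: the `DeletionRecord` class, the `record_deletion` function, and the use of integer megabytes for offsets and sizes are all hypothetical. It checks the two adjacency conditions described above and otherwise appends a new record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeletionRecord:
    file_id: str   # identifier of the file the deleted data belonged to
    offset: int    # offset of the deleted data in the file (here: MB)
    size: int      # size of the deleted data (here: MB)


def record_deletion(log: List[DeletionRecord], file_id: str, offset: int, size: int) -> None:
    """Merge the new deletion into a contiguous target record, or append a new record."""
    for rec in log:
        if rec.file_id != file_id:
            continue
        if rec.offset + rec.size == offset:
            # Existing record ends exactly where the new data starts: extend it.
            rec.size += size
            return
        if offset + size == rec.offset:
            # New data ends exactly where the existing record starts: extend backwards.
            rec.offset = offset
            rec.size += size
            return
    # No target deletion record found: write a new deletion record.
    log.append(DeletionRecord(file_id, offset, size))


# Data #1 is file #A, offset 20M, size 10M, as in the example above.
log_21 = [DeletionRecord("A", 10, 10)]      # deletion record #21
record_deletion(log_21, "A", 20, 10)
print(log_21)                               # record #21 now covers offset 10, size 20

log_22 = [DeletionRecord("A", 30, 10)]      # deletion record #22
record_deletion(log_22, "A", 20, 10)
print(log_22)                               # record #22 now covers offset 20, size 20
```

  • The sketch merges the new deletion into the first matching record only; it does not further coalesce two existing records that become contiguous after the update, which the source does not discuss.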
  • For an incremental backup, the creation time of the previous snapshot is the time of the last backup, and the creation time of the next snapshot is the time of the next backup. The release log therefore only needs to record the deletion records of the data deleted from the file system during the period between the time of the last backup and the time of the next backup (for example, the first time period). If other snapshots need to be created at other moments in the first time period because of other service requirements of the user, no release log is required for those other snapshots.
  • So that the release log can also be used in other scenarios (for example, rolling back a snapshot or performing multiple incremental backups), the embodiment of the present application further provides an optional implementation:
  • The release log includes N sub-release logs, and the N sub-release logs correspond to N snapshots, where the i-th sub-release log corresponds to the i-th snapshot. The N snapshots include the historical snapshot and the N-1 snapshots between the historical snapshot and the current snapshot, and the N snapshots are created sequentially in chronological order. N is an integer greater than or equal to 1.
  • The deletion records included in the i-th sub-release log record the storage location, before deletion, of the data deleted from the file system in an intermediate period, where the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, i ∈ [1, N]; and,
  • determining the release log whose creation time is within the first time period includes:
  • determining the N sub-release logs according to the N snapshots and the correspondence.
  • Each time the storage device creates a snapshot, it allocates a corresponding release log to the snapshot. That is, one snapshot corresponds to one sub-release log, and each sub-release log records the data deleted from the file system during the period between the creation times of two adjacent snapshots.
  • The sub-release log corresponding to each snapshot may be determined according to the correspondence and the N snapshots, and the N sub-release logs (i.e., the release log) are thereby determined.
  • For example, N is 3, the period between t11 and t19 is the first time period, snapshot #1 is the historical snapshot, and snapshot #4 is the current snapshot. The creation times of snapshot #2 and snapshot #3 are after the creation time of snapshot #1 and before the creation time of snapshot #4, so the N-1 snapshots between snapshot #1 and snapshot #4 are snapshot #2 and snapshot #3. Snapshot #1 to snapshot #3 correspond to sub-release log #1 to sub-release log #3, respectively.
  • Snapshot #1 corresponds to sub-release log #1, which records the data deleted from the file system in the period between the creation time of snapshot #1 and the creation time of the next snapshot, snapshot #2 (for example, the first intermediate period);
  • snapshot #2 corresponds to sub-release log #2, which records the data deleted from the file system in the period between the creation time of snapshot #2 and the creation time of the next snapshot, snapshot #3 (for example, the second intermediate period);
  • snapshot #3 corresponds to sub-release log #3, which records the data deleted from the file system in the period between the creation time of snapshot #3 and the creation time of the next snapshot, snapshot #4 (for example, the third intermediate period).
  • Taking snapshot #1 and sub-release log #1 as an example, the storage device creates the corresponding sub-release log #1 when it creates snapshot #1. After the creation time t11 of snapshot #1, deletion records of the data deleted from the file system are written to sub-release log #1, and recording stops once snapshot #2 is created. Sub-release log #1 thus holds the deletion records of the data deleted from the file system in the first intermediate period.
  • The release log need not record a deletion record for data that is both written and deleted after the creation time of the corresponding snapshot (or the creation time of the release log) and before the creation time of the next snapshot. Such deleted data does not exist in the data protected by either the preceding or the following snapshot, so it is not necessary to record a deletion record for it.
  • Taking snapshot #1 and sub-release log #1 in FIG. 7 as an example, if the file system creates file #1 at time t12 and subsequently deletes file #1 at time t12, that is, after snapshot #1 is created and before snapshot #2 is created, then file #1 is protected by neither snapshot #1 nor snapshot #2, so it is not necessary to record the deletion record corresponding to file #1 in sub-release log #1.
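  • The per-snapshot sub-release logs and the rule for skipping unprotected data can be sketched as follows. This is a minimal Python sketch; the `SubReleaseLogs` class, its method names, and the integer times are hypothetical, and a deletion record is reduced here to a file name and deletion time for brevity.

```python
from typing import Dict, List, Tuple


class SubReleaseLogs:
    """Hypothetical bookkeeping: one sub-release log per snapshot, in creation order."""

    def __init__(self) -> None:
        self.snapshot_times: List[int] = []                 # creation times, ascending
        self.logs: Dict[int, List[Tuple[str, int]]] = {}    # snapshot number -> records

    def create_snapshot(self, time: int) -> int:
        """Create snapshot #k and allocate its (initially empty) sub-release log #k."""
        self.snapshot_times.append(time)
        number = len(self.snapshot_times)
        self.logs[number] = []
        return number

    def delete_file(self, name: str, created_at: int, deleted_at: int) -> bool:
        """Record the deletion in the latest sub-release log unless the file is unprotected."""
        if not self.snapshot_times:
            return False                                    # no snapshot, no release log yet
        if created_at > self.snapshot_times[-1]:
            # Created after the latest snapshot and deleted before the next one:
            # neither adjacent snapshot protects it, so no deletion record is needed.
            return False
        self.logs[len(self.snapshot_times)].append((name, deleted_at))
        return True


logs = SubReleaseLogs()
logs.create_snapshot(11)                                    # snapshot #1 at t11
recorded = logs.delete_file("file#0", created_at=5, deleted_at=12)
skipped = logs.delete_file("file#1", created_at=12, deleted_at=12)
logs.create_snapshot(14)                                    # snapshot #2 (time assumed)
logs.delete_file("file#2", created_at=13, deleted_at=15)
print(logs.logs)
```

  • In this example file #1 is created and deleted at t12, after snapshot #1, so its deletion is not recorded, while file #2, which exists when snapshot #2 is created, is recorded in sub-release log #2.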
  • The correspondence includes a first correspondence and a second correspondence, where the first correspondence includes a correspondence between the N snapshots and N index numbers, the second correspondence includes a correspondence between the N index numbers and the N sub-release logs, and the N index numbers are allocated based on a preset rule; and
  • the N index numbers include the index number of the historical snapshot, the index number of the N-th snapshot, and the N-2 index numbers between them.
  • For ease of distinction and understanding, the correspondence between snapshots and index numbers is called correspondence #1 (an example of the first correspondence), in which one snapshot corresponds to one index number; the correspondence between sub-release logs and index numbers is called correspondence #2 (an example of the second correspondence), in which one index number corresponds to one sub-release log. Correspondence #1 may be a correspondence between the identifier of the snapshot and the index number, and correspondence #2 may be a correspondence between the index number and the identifier of the sub-release log.
  • The index number acts as a bridge from a snapshot to its sub-release log: to determine a sub-release log, the corresponding index number is first determined from the snapshot and correspondence #1, and the corresponding sub-release log is then determined from the index number and correspondence #2.
  • Each index number may correspond to one piece of index data, which includes the identifier of the sub-release log corresponding to the index number. When the sub-release log is determined from the index number and correspondence #2, the index data is first determined based on the index number, and the corresponding sub-release log is then determined based on the identifier of the sub-release log in the index data.
  • The N index numbers are allocated based on a preset rule, so that other index numbers can be determined from a known index number and the preset rule.
  • For example, the preset rule may be that the difference between any two adjacent index numbers is a fixed value. If the difference is "1", the index numbers are assigned in the order 1, 2, 3, 4, ...;
  • as another example, the preset rule may be that the quotient of any two adjacent index numbers is a fixed value. If the quotient is "2", the index numbers are assigned in the order 1, 2, 4, 8, ..., and so on.
  • Two index numbers are adjacent when the creation times of the snapshots or sub-release logs corresponding to them are adjacent. The N-th snapshot is the snapshot immediately preceding the current snapshot.
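  • How a preset rule allows the intermediate index numbers to be derived from two known index numbers can be sketched as follows. This is a minimal Python sketch; the function name `indexes_between` and the rule names "difference" and "quotient" are hypothetical labels for the two example rules above.

```python
from typing import List


def indexes_between(first: int, last: int, rule: str = "difference", step: int = 1) -> List[int]:
    """Return the index numbers strictly between first and last under the preset rule."""
    if rule == "difference":
        if step <= 0:
            raise ValueError("difference step must be positive")
        advance = lambda x: x + step      # adjacent index numbers differ by a fixed value
    elif rule == "quotient":
        if step <= 1 or first <= 0:
            raise ValueError("quotient rule needs step > 1 and a positive first index")
        advance = lambda x: x * step      # adjacent index numbers have a fixed quotient
    else:
        raise ValueError(f"unknown rule: {rule}")

    result = []
    current = advance(first)
    while current < last:
        result.append(current)
        current = advance(current)
    return result


print(indexes_between(1, 4))                          # rule 1, 2, 3, 4, ...
print(indexes_between(1, 8, rule="quotient", step=2)) # rule 1, 2, 4, 8, ...
```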
  • Determining the release log whose creation time is within the first time period includes:
  • using a third correspondence, which includes a correspondence between N+1 snapshots and N+1 index numbers, where the N+1 snapshots include the N snapshots and the current snapshot, and the N+1 index numbers are allocated based on a preset rule; and
  • using a fourth correspondence, which includes a correspondence between N index numbers and the N sub-release logs, where the N index numbers include the index number of the historical snapshot and the N-1 index numbers.
  • Mode B differs from mode A in that mode B determines, based on the historical snapshot, the current snapshot, and correspondence #3 (an example of the third correspondence), the index number corresponding to the historical snapshot (denoted index number #A for ease of distinction and understanding) and the index number corresponding to the current snapshot (denoted index number #B). Then, according to index number #A, index number #B, and the preset rule, the N-1 index numbers between index number #A and index number #B are determined, and the release log is determined based on the N index numbers (index number #A and the N-1 index numbers) and correspondence #4 (an example of the fourth correspondence).
  • For example, the preset rule is that the difference between any two adjacent index numbers is "1", that is, the index numbers are assigned in the order 1, 2, 3, 4, .... With N = 3 as in FIG. 7, index number #A (sequence number 1) is determined from snapshot #1 and correspondence #3, and index number #B (sequence number 4) is determined from snapshot #4 and correspondence #3. Based on index number #A, index number #B, and the preset rule, the index numbers between index number #A and index number #B (i.e., the N-1 index numbers) are determined to be index number #C (sequence number 2) and index number #D (sequence number 3). The corresponding 3 sub-release logs are then determined based on index number #A, index number #C, index number #D, and correspondence #4.
  • The N-1 snapshots created in the period between the creation time of the historical snapshot and the creation time of the current snapshot (i.e., the first time period) may also first be determined from the historical snapshot and the current snapshot. The sub-release log of each snapshot is then determined based on the correspondence between snapshots and index numbers and the N snapshots (the historical snapshot and the N-1 snapshots). That is, the sub-release logs whose creation time is within the first time period are determined.
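  • The mode B lookup can be sketched as follows, assuming the preset rule that adjacent index numbers differ by 1. This is a minimal Python sketch; the dictionaries standing in for correspondence #3 and correspondence #4, the string identifiers, and the function name are hypothetical.

```python
from typing import Dict, List


def sub_release_logs_mode_b(
    historical: str,
    current: str,
    snapshot_to_index: Dict[str, int],   # stands in for correspondence #3
    index_to_log: Dict[int, str],        # stands in for correspondence #4
) -> List[str]:
    """Look up the sub-release logs created within the first time period."""
    index_a = snapshot_to_index[historical]   # index number #A
    index_b = snapshot_to_index[current]      # index number #B
    # With a fixed difference of 1, index number #A plus the N-1 index numbers
    # between #A and #B are simply index_a, index_a + 1, ..., index_b - 1.
    return [index_to_log[i] for i in range(index_a, index_b)]


snapshot_to_index = {"snapshot#1": 1, "snapshot#2": 2, "snapshot#3": 3, "snapshot#4": 4}
index_to_log = {1: "sub-release log #1", 2: "sub-release log #2", 3: "sub-release log #3"}

found = sub_release_logs_mode_b("snapshot#1", "snapshot#4", snapshot_to_index, index_to_log)
print(found)
```

  • Only the historical and current snapshots need to be looked up in the snapshot table here; the intermediate index numbers follow from the preset rule.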
  • Index management information is used to record information related to the index numbers. It is mainly used to allocate the corresponding index number to a snapshot or release log and, subsequently, to delete snapshots and the corresponding release logs. Whenever any item of the index management information changes, the information needs to be updated in time.
  • The index management information includes at least the following items:
  • Tindex: the total index number;
  • Count: the total number of snapshots that support snapshot comparison;
  • Sindex: the starting index number;
  • Findex: the index number of the first snapshot.
  • Tindex can be used as the index number of the current snapshot or release log. Specifically, when a snapshot is created or a release log is allocated, the current Tindex is used as the index number of the current snapshot or the corresponding release log, and the relationship among the snapshot, the index number, and the release log is recorded (i.e., correspondence #1 and correspondence #2 above). Tindex is then updated, so that the index number assigned to the next snapshot or release log is the updated Tindex.
  • Count is used to determine whether a release log is to be allocated for the currently created snapshot (denoted the current snapshot). Specifically, if the current snapshot supports snapshot comparison, a release log is allocated directly, regardless of the value of Count. If the current snapshot does not support snapshot comparison, it is checked whether Count is greater than 0, and a release log is allocated only if it is. The reason is that if no snapshot before the current snapshot supports snapshot comparison and the current snapshot does not support it either, creating a release log for the current snapshot is unnecessary and wastes system space.
  • Whether a snapshot supports snapshot comparison may be determined by the type of service delivered by the user and received by the storage system. For example, if the service delivered by the user does not need to be synchronized, the service may be tagged to inform the storage device that it does not require snapshot comparison.
  • release log is allocated for all snapshots when Count is greater than zero.
  • When a snapshot that supports snapshot comparison is created, the Count in the index management information is updated, that is, the current Count in the index management information is incremented by one.
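  • The allocation decisions driven by Tindex and Count can be sketched as follows. This is a minimal Python sketch; the `IndexManagement` class and its field names mirror the items listed above but are otherwise hypothetical. In particular, the sketch assumes that a snapshot for which no release log is allocated receives no index number at all, which the source does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IndexManagement:
    tindex: int = 0   # total index number: the next index number to hand out
    count: int = 0    # number of snapshots that support snapshot comparison
    sindex: int = 0   # starting index number (first remaining sub-release log)
    findex: int = 0   # index number of the first snapshot
    snapshot_index: Dict[str, int] = field(default_factory=dict)   # correspondence #1
    index_log: Dict[int, str] = field(default_factory=dict)        # correspondence #2

    def create_snapshot(self, name: str, supports_comparison: bool) -> Optional[int]:
        """Allocate an index number and release log for a new snapshot, if warranted."""
        # A release log is allocated when the snapshot supports comparison,
        # or when some earlier snapshot does (Count > 0).
        if not supports_comparison and self.count == 0:
            return None
        index = self.tindex
        self.snapshot_index[name] = index
        self.index_log[index] = f"sub-release log of {name}"
        self.tindex += 1                   # the next snapshot gets the updated Tindex
        if supports_comparison:
            self.count += 1
        return index


mgmt = IndexManagement()
early = mgmt.create_snapshot("snapshot#0", supports_comparison=False)   # no log needed
assigned = [
    mgmt.create_snapshot(f"snapshot#{n}", supports)
    for n, supports in [(1, True), (2, False), (3, True), (4, False), (5, True)]
]
print(early, assigned, mgmt.tindex, mgmt.count)
```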
  • The starting index number (Sindex) and the index number of the first snapshot (Findex) both indicate index numbers of sub-release logs. In the subsequent process of deleting a snapshot and the corresponding sub-release log, these two index numbers are used to reclaim release logs. After the sub-release logs have been deleted, Sindex and Findex should end up the same.
  • The files deleted from the file system in the first time period (i.e., the period between the creation time of the historical snapshot and the creation time of the current snapshot) are determined based on the release log.
  • If the snapshot to be deleted is not the first of the snapshots that support snapshot comparison, only the snapshot is deleted, and the corresponding release log is not deleted.
  • For example, snapshot #1 to snapshot #4 all support snapshot comparison, snapshot #1 is the historical snapshot, and snapshot #2 is an intermediate snapshot. If the sub-release log corresponding to snapshot #2 is deleted and, because of other service requirements, a snapshot comparison from snapshot #2 to snapshot #4 is later needed, the sub-release log for the period between the creation time of snapshot #2 and the creation time of snapshot #3 can no longer be determined, so the comparison between snapshot #2 and snapshot #4 cannot be completed.
  • The current snapshot cannot be the first snapshot that supports snapshot comparison, so only the current snapshot needs to be deleted, and the corresponding release log does not need to be deleted.
  • For the historical snapshot, it is necessary to determine whether it is the first snapshot that supports snapshot comparison. If not, only the historical snapshot is deleted, and the corresponding release log is not deleted. If so, the corresponding release log is deleted according to the processing of Case 2.
  • the method further includes:
  • if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, deleting the historical snapshot and the sub-release log corresponding to the historical snapshot.
  • The corresponding sub-release log needs to be deleted for the following reason:
  • after the first snapshot that supports snapshot comparison is deleted, any snapshot used in a later snapshot comparison is created after that first snapshot, so the sub-release log corresponding to the first snapshot that supports snapshot comparison is no longer used and can be deleted. In this way, system space is further saved.
  • For example, snapshot #1 is the historical snapshot, snapshot #5 is the current snapshot, snapshot #2 to snapshot #4 are intermediate snapshots, and the first time period is the period between the creation time of snapshot #1 and the creation time of snapshot #5. Each snapshot corresponds to one sub-release log. The snapshots shown in bold black support snapshot comparison, that is, snapshot #1, snapshot #3, and snapshot #5; the snapshots not shown in bold do not support snapshot comparison, that is, snapshot #2 and snapshot #4.
  • The index number of snapshot #1 or sub-release log #1 is "0", the index number of snapshot #2 or sub-release log #2 is "1", the index number of snapshot #3 or sub-release log #3 is "2", the index number of snapshot #4 or sub-release log #4 is "3", and the index number of snapshot #5 or sub-release log #5 is "4".
  • The table on the right shows the index management information. The total index number is the value obtained after snapshot #5 is created or sub-release log #5 is allocated, that is, the previous total index number "4" plus "1", which is the updated index number. The total number of snapshots that support snapshot comparison is 3 (snapshot #1, snapshot #3, and snapshot #5). The starting index number is the index number of the first sub-release log, sub-release log #1, and the index number of the first snapshot is the index number of snapshot #1.
  • Since snapshot #1 supports snapshot comparison, sub-release log #1 needs to be deleted when snapshot #1 is deleted. As shown in FIG. 9, when snapshot #1 is deleted, the index number of the first snapshot moves to the right, that is, it becomes the index number of snapshot #2. Because sub-release log #1 has not yet been deleted, the starting index number is unchanged. The two index numbers now differ, which in effect marks the sub-release logs to be deleted: when sub-release logs are later reclaimed, the ones to delete are those between the starting index number and the index number of the first snapshot, here sub-release log #1.
  • After sub-release log #1 is reclaimed, the starting index number is updated to be the same as the index number of the first snapshot. It is then necessary to determine whether the snapshot corresponding to the updated index number of the first snapshot (i.e., snapshot #2) supports snapshot comparison. If it does not, the index number of the first snapshot continues to move and reclamation continues, until the snapshot corresponding to the moved index number supports snapshot comparison or all sub-release logs have been reclaimed. Here, snapshot #2 does not support snapshot comparison, so snapshot #2 and the corresponding sub-release log #2 are deleted as described above, until both index numbers are updated to the index number corresponding to snapshot #3. Since snapshot #3 supports snapshot comparison, the deletion of snapshots and sub-release logs stops. As shown in FIG. 10, the updated starting index number and the index number of the first snapshot are the same, both being the index number of snapshot #3 or sub-release log #3.
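  • The reclamation driven by Sindex and Findex can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name and the dictionaries keyed by index number are hypothetical, the two phases shown in FIG. 9 and FIG. 10 are collapsed into one call, and only the sub-release logs and the two index numbers are tracked, not the removal of the snapshots themselves.

```python
from typing import Dict, Tuple


def delete_first_snapshot(
    supports: Dict[int, bool],    # index number -> snapshot supports comparison?
    sub_logs: Dict[int, str],     # index number -> sub-release log
    sindex: int,
    findex: int,
) -> Tuple[int, int]:
    """Delete the first comparison-capable snapshot and reclaim unneeded sub-release logs."""
    # Move Findex to the right, past every snapshot that does not support comparison.
    findex += 1
    while findex in supports and not supports[findex]:
        findex += 1
    # The sub-release logs between Sindex and Findex are the ones to reclaim.
    while sindex < findex:
        sub_logs.pop(sindex, None)
        sindex += 1
    return sindex, findex


# Index numbers 0..4 stand for snapshots #1..#5; #1, #3, and #5 support comparison.
supports = {0: True, 1: False, 2: True, 3: False, 4: True}
sub_logs = {i: f"sub-release log #{i + 1}" for i in range(5)}

sindex, findex = delete_first_snapshot(supports, sub_logs, sindex=0, findex=0)
print(sindex, findex, sorted(sub_logs))
```

  • After the call both index numbers equal 2, the index number of snapshot #3, and only sub-release logs #3 to #5 remain, which matches the end state described for FIG. 10.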
  • the snapshot comparison method records the update times of files in each node of the tree structure corresponding to a snapshot and, based on the creation time of the historical snapshot, accesses each layer of nodes in the time tree corresponding to the current snapshot to find the target leaf nodes whose file update time is greater than the creation time of the historical snapshot. The files modified and/or newly written in the file system in the first time period are then determined from the target leaf nodes.
  • compared with the prior-art process of determining the target file, this effectively saves time in the snapshot comparison process and thus helps improve the efficiency of the entire snapshot comparison process;
  • jump nodes, whose file update time is less than or equal to the creation time of the historical snapshot, may be skipped so that only the remaining nodes are accessed. This reduces the number of nodes accessed, further saving time and improving the efficiency of snapshot comparison;
  • the data deleted from the file system in the first time period can be determined quickly from the release log, which further saves time and improves the efficiency of the entire snapshot comparison process; in addition, the deletion records in the release log occupy little system space;
  • the target deletion record is updated so that it also records the storage location of the first data before deletion, further saving system space.
  • the method for snapshot comparison according to the embodiment of the present application is described in detail above with reference to FIG. 1 to FIG. 10.
  • the following describes the device for snapshot comparison according to the embodiment of the present application.
  • the technical features described in the method embodiments apply equally to the following apparatus embodiments.
  • FIG. 11 shows a schematic block diagram of an apparatus 300 for snapshot comparison in accordance with an embodiment of the present application.
  • the device 300 includes:
  • the obtaining unit 310 is configured to acquire a time tree corresponding to a current snapshot of the file system, where the time tree includes a root node layer, an intermediate node layer, and a leaf node layer; the root node layer includes one root node; the intermediate node layer includes at least one layer of intermediate nodes; the root node points to the first layer of intermediate nodes in the intermediate node layer; and the last layer of intermediate nodes in the intermediate node layer points to the leaf nodes in the leaf node layer,
  • where each leaf node in the leaf node layer includes the update time of one file of the file system; a first intermediate node in the intermediate node layer includes a first intermediate time; the first intermediate time includes, for each next-level node that the first intermediate node points to, the latest of the update times of the files included in that node; the root node includes the creation time of the current snapshot; the first intermediate node is any intermediate node in the intermediate node layer; and the update time includes the modification time of a modified file and/or the write time of a newly written file in the file system;
  • the searching unit 320 is configured to, according to the creation time of the historical snapshot of the file system, access layer by layer, starting from the root node layer of the time tree acquired by the obtaining unit 310, the times included in at least some of the nodes of each layer, to find a target leaf node, where the update time of the file included in the target leaf node is greater than the creation time of the historical snapshot;
  • the determining unit 330 is configured to determine a target file according to the target leaf node found by the searching unit 320, where the target file includes a file that is modified and/or newly written in the file system in a first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.
  • the apparatus for snapshot comparison records the update times of files in each node of the tree structure corresponding to a snapshot and, based on the creation time of the historical snapshot, accesses each layer of nodes in the time tree corresponding to the current snapshot to find the target leaf nodes whose file update time is greater than the creation time of the historical snapshot. The files modified and/or newly written in the file system in the first time period (that is, the target files) are then determined from the target leaf nodes. Compared with the prior-art process of determining the target file, this effectively saves time in the snapshot comparison process and thus helps improve the efficiency of the entire snapshot comparison process.
  • the at least some nodes include all nodes in the time tree other than jump nodes, where the update time of the files included in a jump node is less than or equal to the creation time of the historical snapshot.
  • when the apparatus for snapshot comparison accesses each layer of nodes in the time tree corresponding to the current snapshot based on the creation time of the historical snapshot, it may skip the jump nodes whose file update time is less than or equal to the creation time of the historical snapshot and access only the remaining nodes. This reduces the number of nodes accessed, further saving time and improving the efficiency of snapshot comparison.
  • the determining unit 330 is specifically configured to:
  • determine a file whose creation time is before the creation time of the historical snapshot to be a modified file in the file system, and/or determine a file whose creation time is after the creation time of the historical snapshot to be a newly written file in the file system.
  • the determining unit 330 is further configured to:
  • the apparatus for snapshot comparison provided by the embodiment of the present application uses a release log whose creation time falls within the period between the creation time of the historical snapshot and the creation time of the current snapshot (for example, the first time period) to quickly determine the data deleted from the file system in the first time period. This further saves time and improves the efficiency of the entire snapshot comparison process; in addition, the deletion records in the release log occupy little system space.
  • the device further includes an update unit 340, where the update unit 340 is configured to:
  • if the deletion records already stored in the release log at the first moment include a target deletion record, update the target deletion record so that it also records the storage location of the first data before deletion;
  • if the deletion records already stored in the release log at the first moment do not include a target deletion record, write a deletion record that records the storage location of the first data before deletion into the release log at the first moment;
  • where the pre-deletion storage location of the data corresponding to the target deletion record is contiguous with the pre-deletion storage location of the first data.
  • in the process of recording deletion records in the release log, for data (for example, the first data) deleted at any moment (for example, the first moment), if the release log contains a target deletion record that satisfies the condition, the snapshot comparison apparatus updates the target deletion record so that it also records the storage location of the first data before deletion, thereby further saving system space.
  • the release log includes N sub-release logs that correspond to N snapshots, where the i-th sub-release log corresponds to the i-th snapshot, and the N snapshots include the historical snapshot and the N-1 snapshots between the historical snapshot and the current snapshot.
  • the N snapshots are created sequentially in chronological order.
  • N is an integer greater than or equal to 1.
  • the deletion records included in the i-th sub-release log record the storage location, before deletion, of data deleted from the file system in an intermediate period, where the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, i ∈ [1, N], and,
  • the determining unit 330 is specifically configured to:
  • determine the N sub-release logs according to the N snapshots and the correspondence.
  • the correspondence includes a first correspondence and a second correspondence, where the first correspondence includes a correspondence between the N snapshots and N index numbers, where the second correspondence includes the N index numbers Correspondence relationship with the N sub-release logs, the N index numbers are allocated based on preset rules, and
  • the determining unit 330 is specifically configured to:
  • the N index numbers include an index number of the historical snapshot, an index number of the Nth snapshot, and the N-2 index numbers.
  • the device further includes an update unit 340, where the update unit 340 is configured to:
  • if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, delete the historical snapshot and the sub-release log corresponding to the historical snapshot.
  • the apparatus 300 for snapshot comparison shown in FIG. 11 may correspond to the processor shown in FIG. 1, and the above and other operations and/or functions of the units in the apparatus 300 respectively implement the corresponding processes of the foregoing method embodiments. For brevity, they are not described again here.
  • the apparatus 300 may also be the storage device shown in FIG. 1.
  • the apparatus 300 further includes a memory and an IO interface. The processor is configured to execute the instructions stored in the memory, and the processor can invoke the program code stored in the memory to control the IO interface to send and receive information or signals, so that the apparatus 300 performs the functions, actions, or processes of the units described in the above embodiments.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

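This application provides a method and an apparatus for snapshot comparison that can improve the efficiency of snapshot comparison. The method includes: obtaining a time tree corresponding to a current snapshot of a file system; according to the creation time of a historical snapshot of the file system, accessing layer by layer, starting from the root node layer, the times included in at least some of the nodes of each layer, to find a target leaf node, where the update time of the file included in the target leaf node is greater than the creation time of the historical snapshot; and determining a target file according to the target leaf node, where the target file includes a file modified and/or newly written in the file system within a first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.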

Description

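Method and Apparatus for Snapshot Comparison
Technical Field
This application relates to the storage field, and more specifically, to a method and an apparatus for snapshot comparison in the storage field.
Background
An incremental backup is a backup of incremental files that is performed after a full backup or a previous incremental backup of the data. That is, each incremental backup only needs to back up the data in the file system that changed between the time of the previous backup and the current time.
The data that changed between two consecutive backups of a file system can be obtained from snapshots of the file system, that is, by snapshot comparison. A snapshot is a technique for protecting the data of a file system; it preserves the state of the file system at a given time (for example, the time at which a data backup starts).
One known snapshot comparison approach requires two full traversals and two lookup passes over the two snapshots being compared, and every traversal and lookup pass must check every file before the data that changed in the file system between the creation times of the two snapshots can be determined. The changed data includes data that was modified and/or deleted and/or newly written in the file system. With this approach, snapshot comparison takes a long time, which lowers the efficiency and therefore the performance of data backup.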
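Summary
This application provides a method and an apparatus for snapshot comparison, which help improve the efficiency of the snapshot comparison process.
According to a first aspect, a method for snapshot comparison is provided. The method includes:
obtaining a time tree corresponding to a current snapshot of a file system, where the time tree includes a root node layer, an intermediate node layer, and a leaf node layer; the root node layer includes one root node; the intermediate node layer includes at least one layer of intermediate nodes; the root node points to the first layer of intermediate nodes in the intermediate node layer; and the last layer of intermediate nodes in the intermediate node layer points to the leaf nodes in the leaf node layer,
where each leaf node in the leaf node layer includes the update time of one file of the file system; a first intermediate node in the intermediate node layer includes a first intermediate time; the first intermediate time includes, for each next-level node that the first intermediate node points to, the latest of the update times of the files included in that node; the root node includes the creation time of the current snapshot; the first intermediate node is any intermediate node in the intermediate node layer; and the update time includes the modification time of a modified file and/or the write time of a newly written file in the file system;
according to the creation time of a historical snapshot of the file system, accessing layer by layer, starting from the root node layer, the times included in at least some of the nodes of each layer, to find a target leaf node, where the update time of the file included in the target leaf node is greater than the creation time of the historical snapshot; and
determining a target file according to the target leaf node, where the target file includes a file modified and/or newly written in the file system within a first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.
Therefore, in the snapshot comparison method provided in the embodiments of this application, the update times of files are recorded in each node of the tree structure corresponding to a snapshot, and each layer of nodes in the time tree corresponding to the current snapshot is accessed based on the creation time of the historical snapshot, to find the target leaf nodes whose file update time is greater than the creation time of the historical snapshot. The files modified and/or newly written in the file system within the first time period are then determined from the target leaf nodes. Compared with the prior-art process of determining the target file, this effectively saves time in snapshot comparison and thus helps improve the efficiency of the entire snapshot comparison process.
Optionally, the at least some nodes include all nodes in the time tree other than jump nodes, where the update time of the files included in a jump node is less than or equal to the creation time of the historical snapshot.
Therefore, in the snapshot comparison method provided in the embodiments of this application, the update times of files are recorded in each node of the tree structure corresponding to a snapshot. When each layer of nodes in the time tree corresponding to the current snapshot is accessed based on the creation time of the historical snapshot, jump nodes whose file update time is less than or equal to the creation time of the historical snapshot can be skipped, and only the remaining nodes are accessed. This reduces the number of nodes accessed, further saving time and improving the efficiency of snapshot comparison.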
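Optionally, the determining a target file according to the target leaf node includes:
determining the creation time of the target file according to the target leaf node; and
based on the creation time of the target file, determining a file whose creation time is before the creation time of the historical snapshot to be a file modified in the file system, and/or determining a file whose creation time is after the creation time of the historical snapshot to be a file newly written in the file system.
Optionally, the method further includes:
determining a release log whose creation time is within the first time period, where the release log includes deletion records, and a deletion record records the storage location, before deletion, of data deleted from the file system within the first time period; and
determining, according to the release log, the data deleted from the file system within the first time period.
Therefore, in the snapshot comparison method provided in the embodiments of this application, a release log whose creation time falls within the period between the creation time of the historical snapshot and the creation time of the current snapshot (for example, the first time period) makes it possible to quickly determine the data deleted from the file system within the first time period. This further saves time and improves the efficiency of the entire snapshot comparison process.
Optionally, before the determining a release log whose creation time is within the first time period, the method further includes:
deleting first data in the file system at a first moment, where the first moment falls within the first time period;
if the deletion records already stored in the release log at the first moment include a target deletion record, updating the target deletion record so that it also records the storage location of the first data before deletion; and
if the deletion records already stored in the release log at the first moment do not include a target deletion record, writing a deletion record that records the storage location of the first data before deletion into the release log at the first moment;
where the pre-deletion storage location of the data corresponding to the target deletion record is contiguous with the pre-deletion storage location of the first data.
Therefore, in the snapshot comparison method provided in the embodiments of this application, in the process of recording deletion records in the release log, for data (for example, the first data) deleted at any moment (for example, the first moment), if the release log contains a target deletion record that satisfies the condition, the target deletion record is updated so that it also records the storage location of the first data before deletion. This further saves system space.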
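Optionally, the release log includes N sub-release logs, and the N sub-release logs correspond to N snapshots, where the i-th sub-release log corresponds to the i-th snapshot; the N snapshots include the historical snapshot and the N-1 snapshots between the historical snapshot and the current snapshot; the N snapshots are created one after another in chronological order; and N is an integer greater than or equal to 1,
the deletion records included in the i-th sub-release log record the storage location, before deletion, of data deleted from the file system within an intermediate period, where the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, i ∈ [1, N], and,
the determining a release log whose creation time is within the first time period includes:
determining the N sub-release logs according to the N snapshots and the correspondence.
Optionally, the correspondence includes a first correspondence and a second correspondence, where the first correspondence is between the N snapshots and N index numbers, the second correspondence is between the N index numbers and the N sub-release logs, and the N index numbers are allocated based on a preset rule, and,
the determining the N sub-release logs according to the N snapshots and the correspondence includes:
determining, according to the historical snapshot, the N-th of the N snapshots, and the first correspondence, the index number corresponding to the historical snapshot and the index number corresponding to the N-th snapshot;
determining, according to the index number of the historical snapshot, the index number of the N-th snapshot, and the preset rule, the N-2 index numbers between the index number of the historical snapshot and the index number of the N-th snapshot; and
determining the release log according to the N index numbers and the second correspondence, where the N index numbers include the index number of the historical snapshot, the index number of the N-th snapshot, and the N-2 index numbers.
Optionally, the method further includes:
if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, deleting the historical snapshot and the sub-release log corresponding to the historical snapshot.
According to a second aspect, an apparatus for snapshot comparison is provided. The apparatus may be configured to perform any method in the first aspect or in any possible implementation of the first aspect.
According to a third aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a program, and the program causes a communication device to perform any method in the first aspect or its possible implementations.
According to a fourth aspect, a computer program is provided. When executed on a computer, the computer program causes the computer to implement any method in the first aspect or its possible implementations.
In some of the foregoing implementations, a first intermediate node in the intermediate node layer includes a first intermediate time. When the first intermediate node is in the last layer of intermediate nodes in the intermediate node layer, the first intermediate time includes the update time of the file included in each leaf node that the first intermediate node points to. When the first intermediate node is an intermediate node other than those in the last layer, the first intermediate time includes, for each next-level intermediate node that the first intermediate node points to, the latest of the update times of the files included in that intermediate node. The first intermediate node is any intermediate node in the intermediate node layer, and the update time includes the modification time of a modified file and/or the write time of a newly written file in the file system.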
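Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a storage device to which the embodiments of this application are applied.
FIG. 2 is a schematic flowchart of a snapshot comparison method according to an embodiment of this application.
FIG. 3 is a schematic structural diagram of a time tree in a snapshot comparison method according to an embodiment of this application.
FIG. 4 is a schematic flowchart of a snapshot comparison method according to another embodiment of this application.
FIG. 5 is a schematic block diagram of the relationship between snapshots and a release log in a snapshot comparison method according to another embodiment of this application.
FIG. 6 is a schematic block diagram of the relationship between data storage locations in a snapshot comparison method according to another embodiment of this application.
FIG. 7 is a schematic block diagram of the relationship between snapshots and release logs in a snapshot comparison method according to another embodiment of this application.
FIG. 8 to FIG. 10 are schematic block diagrams of the relationship between snapshots and release logs in a snapshot comparison method according to still another embodiment of this application.
FIG. 11 is a schematic block diagram of a snapshot comparison apparatus according to an embodiment of this application.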
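Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
It should be understood that the embodiments of this application can be applied to any storage system capable of snapshot comparison, for example, a file system storage system with a tree structure.
First, incremental backup based on snapshot comparison, as used in the embodiments of this application, is briefly described. In incremental backup, after a first snapshot of the file system is created (denoted the first snapshot for ease of understanding and distinction), all data of the file system is first backed up in full. For a later backup, another snapshot is created (denoted the second snapshot), and the second snapshot is compared with the first snapshot to obtain the data that changed between the two snapshots. The backup then only needs to cover the data that changed in the file system between the creation times of the two snapshots. Because only changed data is backed up, incremental backup greatly reduces backup time, improves backup efficiency, and reduces the space occupied on the backup side, so it is widely used for data backup.
When data needs to be restored from incremental backups at some point in time (denoted the data restoration time point), the data is restored from the first full backup before the data restoration time point and all incremental backups made between that full backup and the data restoration time point. For the detailed restoration process, refer to the prior art; it is not described here.
In incremental backup, determining the changed data of the file system by comparing two snapshots is the core step. The prior-art snapshot comparison first traverses all files protected by the second snapshot and then, based on the traversal result, looks them up among the files protected by the first snapshot. If the update time of some files found in the first snapshot is less than the update time of the same files in the second snapshot, or if some files protected by the second snapshot are not found in the first snapshot, those files were modified and/or newly written in the file system between the creation times of the two snapshots. Next, all files in the first snapshot are traversed and, based on the traversal result, looked up in the second snapshot. If some files protected by the first snapshot are not found in the second snapshot, those files were deleted from the file system between the creation times of the two snapshots. The changed data of the file system is determined in this way.
Because this approach requires two full traversals and two lookup passes over the two snapshots, snapshot comparison takes a long time, which lowers the efficiency and therefore the performance of data backup.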
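For contrast with the method proposed later, the prior-art two-pass comparison described above can be sketched as follows. This is an illustrative sketch only: each snapshot is modeled as a hypothetical dictionary from file name to update time, and the function name is an assumption.

```python
from typing import Dict, List, Tuple


def baseline_compare(
    first: Dict[str, int],   # first (older) snapshot: file name -> update time
    second: Dict[str, int],  # second (newer) snapshot: file name -> update time
) -> Tuple[List[str], List[str]]:
    """Return (modified_or_new, deleted) using two full traversals and two lookups."""
    # Pass 1: traverse every file of the second snapshot and look it up in the first.
    modified_or_new = [
        name for name, t in second.items()
        if name not in first or first[name] < t
    ]
    # Pass 2: traverse every file of the first snapshot and look it up in the second.
    deleted = [name for name in first if name not in second]
    return modified_or_new, deleted


first_snapshot = {"a": 1, "b": 2, "c": 3}
second_snapshot = {"a": 1, "b": 5, "d": 6}
changed, removed = baseline_compare(first_snapshot, second_snapshot)
```

Every file in both snapshots is visited regardless of how few files actually changed, which is the cost the time tree is meant to avoid.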
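This application provides a snapshot comparison method that helps solve the foregoing problem.
First, the storage device to which the embodiments of this application are applied is briefly described with reference to FIG. 1.
FIG. 1 is a schematic structural diagram of a storage device 100 to which the embodiments of this application are applied. As shown in FIG. 1, the storage device 100 includes a processor 110, a memory 120, a network adapter 130, and an input/output (I/O) interface 140. The functions of the components are as follows:
Processor 110: the processor 110 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps in the embodiments of this application can be completed by an integrated logic circuit of hardware in the processor 110 or by instructions in the form of software.
Memory 120: configured to store operation data of the processor 110 and data exchanged with external storage devices such as hard disks. The memory 120 may include a read-only memory and a random access memory, and provides instructions and data to the processor 110. A part of the memory 120 may further include a non-volatile random access memory. For example, in the embodiments of this application, the memory 120 may be used to store scan results computed by the processor 110.
Network adapter 130: also called a network card or network interface card, a device that connects a computer to a network and is used to access the network.
I/O interface 140: the processor 110 is connected to the I/O interface 140 through the internal bus of the storage device, and the I/O interface 140 is in turn connected to an external device 150, which enables information transfer between the processor 110 and the external device 150. A user can issue instructions to the processor 110 through the I/O interface. The external device 150 includes, for example, a USB flash drive, a mouse, a keyboard, or a printer, which are not described in detail here. In the embodiments of this application, the external device may be a storage device used to store data, for example, a USB flash drive or a hard disk.
The storage device of the embodiments of this application has been briefly described above with reference to FIG. 1. The snapshot comparison method 200 of the embodiments of this application is described in detail below with reference to FIG. 2 to FIG. 9. For ease of description, the device that performs the embodiments of this application is called the storage device, and specifically is the processor in the storage device.
FIG. 2 is a schematic flowchart of the snapshot comparison method of the embodiments of this application. The steps of the embodiments are described below based on FIG. 2.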
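S210: Obtain the time tree corresponding to the current snapshot of the file system.
Specifically, the time tree includes a root node layer, an intermediate node layer, and a leaf node layer. The root node layer includes one root node, and the intermediate node layer includes at least one layer of intermediate nodes. The root node points to the first layer of intermediate nodes in the intermediate node layer, and the last layer of intermediate nodes in the intermediate node layer points to the leaf nodes in the leaf node layer,
where each leaf node in the leaf node layer includes the update time of one file of the file system; a first intermediate node in the intermediate node layer includes a first intermediate time; the first intermediate time includes, for each next-level node that the first intermediate node points to, the latest of the update times of the files included in that node; the root node includes the creation time of the current snapshot; the first intermediate node is any intermediate node in the intermediate node layer; and the update time includes the modification time of a modified file and/or the write time of a newly written file in the file system.
Specifically, when creating a snapshot, the storage device also generates a time tree corresponding to the snapshot. The time tree can record the update times of files, where an update time includes a file's modification time and/or write time. When a file in the file system is modified and/or newly written, the time information in the time tree is updated accordingly. It should be understood that, like an ordinary tree structure, the time tree can also record index information used to locate files or data, so that files or data can later be found from a snapshot through the index information recorded in the corresponding tree structure.
Here, the types of nodes in a tree structure are first briefly described. In a tree structure, nodes fall into four types: root nodes, leaf nodes, parent nodes, and intermediate nodes. An intermediate node is a next-level node of a parent node. If a node has a node one level above it, that upper node is its parent node; if it has none, it has no parent node. A node that has no intermediate node below it is called a leaf node. A node that has no other node above it is called the root node. It should be understood that an intermediate node in the embodiments of this application may also be called a child node, and that a node (节点) may also be written as 结点.
The time tree in the embodiments of this application can be understood as a tree structure that records time information. The time tree includes multiple layers of nodes, each layer includes at least one node, and any two adjacent layers are connected in a tree, so that each upper-layer node points to lower-layer nodes through the tree connection. In the embodiments of this application, for ease of distinction and understanding, the layers of the time tree are denoted the root node layer, the intermediate node layer, and the leaf node layer. The root node layer includes only the root node. The intermediate node layer includes at least one layer of intermediate nodes. The parent node of the first layer of intermediate nodes is the root node, and the last layer of intermediate nodes are the parent nodes of the leaf nodes in the leaf node layer. When the intermediate node layer has only one layer of intermediate nodes, the first layer is also the last layer. When the intermediate node layer has multiple layers of intermediate nodes, any two adjacent layers are connected in a tree, so that each upper-layer intermediate node points to lower-layer intermediate nodes. For details about the pointing relationships in a tree structure, refer to the prior art; they are not described here.
For any intermediate node in the intermediate node layer, that is, a first intermediate node, the first intermediate node includes a first intermediate time. The first intermediate time includes, for each next-level node that the first intermediate node points to, the latest of the update times of the files included in that node. The next-level nodes may be intermediate nodes or leaf nodes. Specifically, when the first intermediate node is in the last layer of intermediate nodes, its next-level nodes are leaf nodes in the leaf node layer. Because each leaf node includes the update time of only one file, the latest update time for each such node is simply the update time of the file included in that leaf node; that is, the first intermediate time includes the update time of the file included in each leaf node that the first intermediate node points to. When the first intermediate node is an intermediate node other than those in the last layer, its next-level nodes are intermediate nodes, and the first intermediate time includes, for each next-level intermediate node that the first intermediate node points to, the latest of the update times of the files included in that intermediate node.
FIG. 3 is a schematic structural diagram of the time tree in the snapshot comparison method of the embodiments of this application. As shown in FIG. 3, the time tree has four layers, and the intermediate node layer includes two layers of intermediate nodes. The root node layer includes root node A; the first layer of intermediate nodes includes intermediate node B; the second layer of intermediate nodes includes intermediate nodes C1, C2, and C3; and the leaf node layer includes leaf nodes D1, D2, D3, D4, and D5. Root node A points to intermediate node B; intermediate node B points to the next-layer intermediate nodes C1, C2, and C3; intermediate node C1 points to leaf nodes D1 and D2; intermediate node C2 points to leaf node D3; and intermediate node C3 points to leaf nodes D4 and D5. The root node includes the creation time of the current snapshot, t19. Each leaf node includes the update time of one file; for example, leaf node D1 includes the update time t15 of file #A, leaf node D2 includes the update time t6 of file #B, and so on. When the first intermediate node is in the second layer of intermediate nodes (that is, the last layer of intermediate nodes), for example intermediate node C1, the intermediate time in C1 includes the update time t15 of file #A and the update time t6 of file #B, where leaf node D1, which C1 points to, includes the update time t15 of file #A, and leaf node D2, which C1 points to, includes the update time t6 of file #B. When the first intermediate node is in the first layer of intermediate nodes (that is, it is not in the last layer), for example intermediate node B, the intermediate time in B includes the update time t15 of file #A, the update time t18 of file #E, and the update time t8 of file #D. Here, t15 is the latest of the update times (t15 and t6) of all files included in intermediate node C1, which B points to. The update time t18 of file #E is the latest of the update times (t18) of all files included in intermediate node C2, which B points to; in fact, C2 includes only the update time of file #E, so the latest update time in C2 is simply the update time of that file. The update time t8 of file #D is the latest of the update times (t7 and t8) of all files included in intermediate node C3, which B points to.
It should be understood that the time tree shown in FIG. 3 is only illustrative. Each intermediate node may also point to other nodes that are simply not shown in the figure. In addition, the intermediate node layer of a time tree may include only one layer of intermediate nodes, or three or more layers; the embodiments of this application are not limited in this respect.
As an example and not a limitation, the time information in intermediate node B may be added to root node A, so that the next-level nodes of root node A are intermediate nodes C1, C2, and C3.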
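The FIG. 3 time tree described above can be modeled with a minimal Python sketch. The class and method names are hypothetical, times t6 to t19 are represented as the integers 6 to 19, and the root node's snapshot creation time is kept as a separate constant for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    update_time: Optional[int] = None            # leaf only: update time of its one file
    children: List["Node"] = field(default_factory=list)

    def latest(self) -> Optional[int]:
        """Latest update time of any file at or below this node."""
        if not self.children:
            return self.update_time
        return max(child.latest() for child in self.children)

    def intermediate_time(self) -> Dict[str, Optional[int]]:
        """What an intermediate node records: one latest update time per next-level node."""
        return {child.name: child.latest() for child in self.children}


# Leaf nodes D1..D5 with the update times t15, t6, t18, t7, t8 from the example.
d1, d2, d3, d4, d5 = (Node("D1", 15), Node("D2", 6), Node("D3", 18),
                      Node("D4", 7), Node("D5", 8))
c1 = Node("C1", children=[d1, d2])
c2 = Node("C2", children=[d3])
c3 = Node("C3", children=[d4, d5])
b = Node("B", children=[c1, c2, c3])
CURRENT_SNAPSHOT_TIME = 19   # root node A holds the creation time t19 of the current snapshot

c1_times = c1.intermediate_time()   # {'D1': 15, 'D2': 6}
b_times = b.intermediate_time()     # {'C1': 15, 'C2': 18, 'C3': 8}
```

Intermediate node B ends up holding t15, t18, and t8, the latest update times under C1, C2, and C3 respectively, as in the description of FIG. 3.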
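S220: According to the creation time of the historical snapshot of the file system, access layer by layer, starting from the root node layer, the times included in at least some of the nodes of each layer, to find the target leaf nodes, where the update time of the file included in a target leaf node is greater than the creation time of the historical snapshot.
Specifically, the storage device obtains the creation time of the historical snapshot from the historical snapshot (denoted creation time #A for ease of distinction and understanding). Based on creation time #A, it accesses layer by layer, starting from the root node, the times included in at least some of the nodes of each layer, and determines as target leaf nodes all leaf nodes whose file update time is greater than creation time #A. The at least some nodes may include all nodes in the tree structure, or all nodes in the tree structure other than jump nodes. A jump node is at least one of an intermediate node and a leaf node, and the jump nodes include at least one node, where the update time of the files included in each of these nodes is less than or equal to creation time #A.
Continuing with FIG. 3 as an example, assume that creation time #A is t11. The time tree is accessed layer by layer. After the leaf node layer has been accessed, the leaf nodes whose file update time is greater than creation time #A are leaf node D1 and leaf node D3, so leaf nodes D1 and D3 are the target leaf nodes.
S230: Determine the target file according to the target leaf nodes.
The target file includes a file modified and/or newly written in the file system within the first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.
Specifically, each leaf node also includes data information about the metadata of its file. This information is used to determine the metadata of the corresponding file, and the metadata includes content related to the attributes of the file. Taking any target leaf node as an example (denoted the first target leaf node), the metadata of the target file corresponding to the first target leaf node (denoted the first target file) is determined from the metadata information included in the first target leaf node, and the first target file is then determined from its metadata. In the time tree shown in FIG. 3, the target files determined from the target leaf nodes are file #A and file #E.
The implementation of step S230 is described in detail below.
In an optional implementation, the determining a target file according to the target leaf node includes:
determining the creation time of the target file according to the target leaf node; and
based on the creation time of the target file, determining a file whose creation time is before the creation time of the historical snapshot to be a file modified in the file system, or determining a file whose creation time is after the creation time of the historical snapshot to be a file newly written in the file system.
Specifically, as described above, the storage system can obtain the metadata of the corresponding target file (for example, the first target file) from the metadata information included in any target leaf node (for example, the first target leaf node), obtain the creation time of the first target file from that metadata, and compare it with creation time #A. If the creation time of the first target file is before creation time #A, the first target file is a file modified in the file system. If the creation time of the first target file is after creation time #A, the first target file is a file newly written in the file system.
Continuing with FIG. 3 as an example, after the target leaf nodes (leaf node D1 and leaf node D3) are found through the time tree, it is only known that file #A and file #E were updated in the file system. Which file was modified and which file was newly written must be further determined from the creation times of the updated files. The creation time of file #A is obtained through leaf node D1. Assuming that the creation time of file #A is t10, it is less than creation time #A (t10 is less than t11), which means that file #A is a file modified in the file system. The creation time of file #E is obtained through leaf node D3. Assuming that the creation time of file #E is t18, it is greater than creation time #A (t18 is greater than t11), which means that file #E is a file newly written in the file system. This completes the process of determining the target files.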
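The classification step in S230 can be illustrated with a short sketch. The function name and the dictionary of creation times are assumptions for illustration; the source does not say how a creation time exactly equal to creation time #A should be treated, so such a file is left unclassified here.

```python
from typing import Dict, List, Tuple


def classify_targets(
    creation_times: Dict[str, int],  # target file -> creation time read from its metadata
    hist_created: int,               # creation time #A of the historical snapshot
) -> Tuple[List[str], List[str]]:
    """Split target files into (modified, newly_written) by comparing creation times."""
    modified = [f for f, t in creation_times.items() if t < hist_created]
    newly_written = [f for f, t in creation_times.items() if t > hist_created]
    return modified, newly_written


# Values from the FIG. 3 walk-through: file #A created at t10, file #E at t18, #A = t11.
modified_files, new_files = classify_targets({"file#A": 10, "file#E": 18}, hist_created=11)
```

File #A existed before the historical snapshot and is therefore reported as modified, while file #E was created after it and is reported as newly written.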
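As an example and not a limitation, the foregoing process of determining the target file from the target leaf nodes is only illustrative, and the embodiments of this application are not limited to it.
For example, after finding the target leaf nodes, the storage device may, based on the historical snapshot, look up the files that existed in the file system before the creation time of the historical snapshot. If at least some of the target files are found, those files are modified files; if at least some of the target files are not found, those files are newly written files.
It should be understood that the complete snapshot comparison process must determine all files that changed in the file system between the creation times of the two snapshots, including not only the files that were modified and/or newly written but also the files that were deleted. Only the process of determining modified and/or newly written files has been described here. Deleted files may be determined in a way similar to the prior art, or by the process for determining deleted files described below.
In the prior art, determining the target file requires first traversing all files protected by the current snapshot, accessing the storage location of each file itself, and then looking up the files that satisfy the condition among the files protected by the historical snapshot, again accessing the storage location of each file itself. One traversal plus one lookup pass is time-consuming. In the embodiments of this application, the target leaf nodes whose file update time is greater than creation time #A are found directly in the time tree based on the creation time of the historical snapshot, and the target file is then determined from the target leaf nodes. The target file can be determined with a single lookup pass, which effectively saves time.
Therefore, in the snapshot comparison method provided in the embodiments of this application, the update times of files are recorded in each node of the tree structure corresponding to a snapshot, and each layer of nodes in the time tree corresponding to the current snapshot is accessed based on the creation time of the historical snapshot, to find the target leaf nodes whose file update time is greater than the creation time of the historical snapshot. The files modified and/or newly written in the file system within the first time period (that is, the target files) are then determined from the target leaf nodes. Compared with the prior-art process of determining the target file, this effectively saves time in snapshot comparison and thus helps improve the efficiency of the entire snapshot comparison process.
As described above, when at least some of the nodes in the time tree are accessed layer by layer, the at least some nodes may include all nodes in the time tree other than jump nodes. A jump node is at least one of an intermediate node and a leaf node, and the jump nodes include at least one node, where the update time of the files included in each of these nodes is less than or equal to creation time #A. Taking any jump node as an example (denoted the first jump node): if the first jump node is an intermediate node, the update times of all files included in that intermediate node are less than or equal to creation time #A; if the first jump node is a leaf node, the update time of the one file included in that leaf node is less than or equal to creation time #A.
Of course, as an example and not a limitation, the at least some nodes may also include all nodes in the time tree; the embodiments of this application are not limited in this respect.
In step S220, when the at least some nodes include all nodes in the time tree other than jump nodes, the target leaf nodes can be found using Mode 1; when the at least some nodes include all nodes in the time tree, the target leaf nodes can be found using Mode 2. The two modes differ slightly and are described separately below.
Mode 1
Continuing with FIG. 3 as an example, the access proceeds from root node A to intermediate node B. After all times included in intermediate node B are read, it is determined that t8, the latest update time of the files under intermediate node C3, is less than creation time #A (t11). When the next layer of intermediate nodes is accessed, intermediate node C3 and the leaf nodes D4 and D5 that it points to are therefore skipped, and only intermediate nodes C1 and C2 are accessed. The reason is that, when the latest update time t8 of the files under C3 is less than creation time #A (t11), the update times of all files in the leaf nodes that C3 points to are less than or equal to t8. So as long as the update time recorded for C3 is less than creation time #A, there is no need to access C3 or the leaf nodes D4 and D5 that it points to. Similarly, when intermediate node C1 is accessed and the file update times are read, it is determined that the update time t6 of file #B in leaf node D2, which C1 points to, is less than creation time #A (t11), so leaf node D2 can be skipped and only leaf node D1 is accessed. When intermediate node C2 is accessed, it is determined that the update time t18 of file #E in leaf node D3, which C2 points to, is greater than creation time #A (t11), so leaf node D3 is accessed next. This completes the access to the time tree, and the target leaf nodes have been found.
Therefore, the update times of files are recorded in each node of the tree structure corresponding to a snapshot. When each layer of nodes in the time tree corresponding to the current snapshot is accessed based on the creation time of the historical snapshot, jump nodes whose file update time is less than or equal to the creation time of the historical snapshot can be skipped and only the remaining nodes are accessed. This reduces the number of nodes accessed, further saving time and improving the efficiency of snapshot comparison.
Mode 2
Continuing with FIG. 3 as an example, all nodes of each layer are traversed layer by layer starting from the root node. After the leaf nodes have been traversed, the leaf nodes whose file update time is greater than creation time #A are determined as target leaf nodes from the update times included in all leaf nodes; that is, leaf node D1 and leaf node D3 are determined as the target leaf nodes.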
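Mode 1 and Mode 2 can be compared side by side with a small sketch over the FIG. 3 tree. The class and function names are hypothetical; each intermediate node's recorded time is modeled as the latest update time below it, and a depth-first stack is used instead of strict layer-by-layer access, which visits the same set of nodes.

```python
from typing import List, Optional, Sequence, Tuple


class Node:
    def __init__(self, name: str, time: Optional[int] = None,
                 children: Sequence["Node"] = ()):
        self.name = name
        self.children = list(children)
        # Leaf: update time of its file. Otherwise: latest update time below it,
        # unless an explicit time is given (the root holds the snapshot creation time).
        self.time = time if time is not None else max(c.time for c in self.children)


def find_pruned(root: Node, hist_created: int) -> Tuple[List[str], int]:
    """Mode 1: skip jump nodes (recorded time <= hist_created). Returns (targets, visited)."""
    targets, visited, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        visited += 1
        if not node.children:
            if node.time > hist_created:
                targets.append(node.name)
            continue
        # Only descend into children whose recorded time is later than hist_created.
        stack.extend(c for c in reversed(node.children) if c.time > hist_created)
    return targets, visited


def find_full(root: Node, hist_created: int) -> Tuple[List[str], int]:
    """Mode 2: visit every node, then keep leaves with update time > hist_created."""
    targets, visited, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.children:
            stack.extend(reversed(node.children))
        elif node.time > hist_created:
            targets.append(node.name)
    return targets, visited


c1 = Node("C1", children=[Node("D1", 15), Node("D2", 6)])
c2 = Node("C2", children=[Node("D3", 18)])
c3 = Node("C3", children=[Node("D4", 7), Node("D5", 8)])
root = Node("A", time=19, children=[Node("B", children=[c1, c2, c3])])
```

With creation time #A equal to 11, both functions return D1 and D3, but Mode 1 visits 6 nodes (A, B, C1, D1, C2, D3) while Mode 2 visits all 10.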
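The embodiments of this application also provide an optional implementation for determining the files deleted from the file system within the first time period.
FIG. 4 is a schematic flowchart of a snapshot comparison method according to another embodiment of this application. As shown in FIG. 4, optionally, the method further includes:
S240: Determine a release log whose creation time is within the first time period, where the release log includes deletion records, and a deletion record records the storage location, before deletion, of data deleted from the file system within the first time period.
S250: Determine, according to the release log, the data deleted from the file system within the first time period.
It should be emphasized that the sequence numbers of the foregoing processes do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application. In particular, steps S210 to S230 and steps S240 to S250 may be performed at the same time, or in an order determined by their functions and internal logic.
In the embodiments of this application, the storage device creates a release log when it creates a snapshot. After the release log is created, whenever data is deleted, a deletion record is written to the release log. The deletion record records the storage location, before deletion, of data deleted from the file system within the period between the creation times of two snapshots (that is, the first time period). In this way, the data deleted from the file system within the first time period can be determined by looking up the release log whose creation time is within the first time period.
In addition, the data deleted from the file system may include part of the data of each of at least one file in the file system, or all of the data of each of at least one file.
When the data deleted from the file system includes part of the data of a file, the deletion record may include: the identifier of the file to which the partial data belongs (for example, the file ID), the offset of the partial data within the file, and the size of the partial data. The offset of the partial data within the file is the offset of the start of the partial data relative to the start of the file. The unit of the offset may be a capacity unit, for example, M, K, or bit, and the offset can be understood as a logical offset.
For example, if the size of file #A is 256M and the data deleted from file #A is 10M in size, specifically the data between the 10th M and the 20th M, then the offset of the deleted data in file #A is 10M.
When the data deleted from the file system includes all the data of a file (for example, file #A), the deletion record may also include: the identifier of the object to which file #A belongs (for example, the inode number of that object), the offset of file #A within that object, the size of file #A, the identifier of the file, the start position of the file, and the size of the file.
In the embodiments of this application, each time a snapshot is created, a release log can be allocated to the snapshot to record the data deleted from the file system in the period between the creation times of two snapshots. In order of creation time, the two snapshots may be adjacent, or there may be other snapshots between them, depending on the service requirements issued by the user to the storage device. For example, outside the backup time, the storage device may need to make multiple incremental backups of local data, in which case snapshots can be created according to the time requirements of each incremental backup. As another example, the storage device may need to roll back to snapshots created on a schedule.
Therefore, in the snapshot comparison method provided in the embodiments of this application, a release log whose creation time falls within the period between the creation time of the historical snapshot and the creation time of the current snapshot (for example, the first time period) makes it possible to quickly determine the data deleted from the file system within the first time period. This further saves time and improves the efficiency of the entire snapshot comparison process.
In addition, the deletion records in the release log occupy little system space. The embodiments of this application can therefore effectively improve the efficiency of snapshot comparison, and in turn the performance of data backup, without affecting data writes to the file system.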
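FIG. 5 is a schematic block diagram of the relationship between snapshots and the release log in the snapshot comparison method of the embodiments of this application. As shown in FIG. 5, the period between t11 and t19 is the first time period. The storage device created snapshot #1 at time t11; snapshot #1 is the historical snapshot, and t11 is also the time of the storage device's previous data backup. The next backup starts at t19, when snapshot #4 is created; snapshot #4 is the current snapshot. The release log contains the deletion records of data deleted from the file system within the first time period. In addition, snapshot #2 and snapshot #3 were created within the first time period. Of course, there may also be no other snapshots between snapshot #1 and snapshot #4; the embodiments of this application impose no limitation in this respect.
The embodiments of this application provide an optional implementation of the process of recording deletion records in the release log:
Before the determining a release log whose creation time is within the first time period, the method further includes:
deleting first data in the file system at a first moment, where the first moment belongs to the first time period;
if the deletion records already stored in the release log at the first moment include a target deletion record, updating the target deletion record so that it also records the storage location of the first data before deletion; and
if the deletion records already stored in the release log at the first moment do not include a target deletion record, writing a deletion record that records the storage location of the first data before deletion into the release log at the first moment;
where the pre-deletion storage location of the data corresponding to the target deletion record is contiguous with the pre-deletion storage location of the first data.
Specifically, the first moment is any moment within the first time period. The release log at the first moment is the release log that is recording deletion records at the first moment, and it already contains some deletion records.
Taking the first data deleted at the first moment as an example, when a deletion record is to be written to the release log, the first data can be compared with the data corresponding to the deletion records already stored in the release log at the first moment. If the release log at the first moment contains a deletion record (that is, the target deletion record) whose corresponding data (denoted the second data for ease of distinction and understanding) had a pre-deletion storage location contiguous with that of the first data, the deletion record for the first data can be added to the target deletion record. The updated target deletion record then records the pre-deletion storage locations of both the second data and the first data. If the release log at the first moment contains no target deletion record, a new deletion record is added; that is, a deletion record that records the pre-deletion storage location of the first data is written to the release log at the first moment. This operation is performed for every deletion, which completes the recording of deletion records in the release log.
As described above, if a deletion record includes the identifier of the file to which the deleted data belongs, the offset of the deleted data within the file, and the size of the deleted data, these three parameters can be used to identify the target deletion record. For example, a deletion record is the target deletion record if the identifier of the file containing the newly deleted data is the same as the file identifier in the deletion record, and either the offset of the newly deleted data plus its size equals the offset of the data in the deletion record, or the offset of the newly deleted data equals the offset of the data in the deletion record plus the size of that data.
Taking FIG. 6 as an example, deletion records are already stored in the release log at the first moment. The first data (denoted data #1 for ease of distinction and understanding) belongs to file #A; its offset in file #A is 20M and its size is 10M. In one case, deletion record #21 already stored in the release log at the first moment corresponds to data #21 and contains: file #A (file identifier), 10M (offset), 10M (size). This means that data #21 belongs to file #A, has an offset of 10M, and has a size of 10M. The offset of data #21 plus the size of data #21 equals the offset of data #1, that is, 10 + 10 = 20, so deletion record #21 is the target deletion record. After the update, deletion record #21 contains: file #A, 10M, 20M, and records the pre-deletion storage locations of data #1 and data #21. In another case, deletion record #22 already stored in the release log at the first moment corresponds to data #22 and contains: file #A, 30M, 10M. This means that data #22 belongs to file #A, has an offset of 30M, and has a size of 10M. The offset of data #1 plus the size of data #1 equals the offset of data #22, that is, 20 + 10 = 30, so deletion record #22 is the target deletion record. After the update, deletion record #22 contains: file #A, 20M, 20M, and records the pre-deletion storage locations of data #1 and data #22.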
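The merge rule illustrated with FIG. 6 can be sketched in a few lines. The record layout (file identifier, offset, size) follows the description above, but the class and function names are hypothetical, sizes are plain integers in units of M, and only the first contiguous record found is merged, which is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeletionRecord:
    file_id: str
    offset: int   # offset of the deleted range within the file, in M
    size: int     # size of the deleted range, in M


def record_deletion(log: List[DeletionRecord], file_id: str, offset: int, size: int) -> None:
    """Merge the deleted range into a contiguous record, or append a new record."""
    for rec in log:
        if rec.file_id != file_id:
            continue
        if rec.offset + rec.size == offset:   # new range starts where the record ends
            rec.size += size
            return
        if offset + size == rec.offset:       # new range ends where the record starts
            rec.offset = offset
            rec.size += size
            return
    log.append(DeletionRecord(file_id, offset, size))   # no target deletion record exists


# Data #1: file #A, offset 20M, size 10M.
log_with_21 = [DeletionRecord("file#A", 10, 10)]   # deletion record #21
record_deletion(log_with_21, "file#A", 20, 10)

log_with_22 = [DeletionRecord("file#A", 30, 10)]   # deletion record #22
record_deletion(log_with_22, "file#A", 20, 10)
```

Record #21 becomes (file #A, 10M, 20M) and record #22 becomes (file #A, 20M, 20M), matching the FIG. 6 example; in both cases the log still holds a single record.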
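In this way, in the process of recording deletion records in the release log, for data (for example, the first data) deleted at any moment (for example, the first moment), if the release log contains a target deletion record that satisfies the condition, the target deletion record is updated so that it also records the pre-deletion storage location of the first data. This further saves system space.
For incremental backup, the creation time of the earlier snapshot is the time of the previous backup, and the creation time of the later snapshot is the time at which the next backup starts. The release log therefore needs to contain the deletion records of data deleted from the file system in the period between the previous backup and the next backup (for example, the first time period). If snapshots also need to be created at other times within the first time period for other user services, the embodiments of this application do not limit whether release logs are created for those snapshots. However, considering that release logs can also be used in other scenarios (for example, snapshot rollback or multiple incremental backups), the embodiments of this application provide a further optional implementation that reduces implementation complexity:
Optionally, the release log includes N sub-release logs, and the N sub-release logs correspond to N snapshots, where the i-th sub-release log corresponds to the i-th snapshot; the N snapshots include the historical snapshot and the N-1 snapshots between the historical snapshot and the current snapshot; the N snapshots are created one after another in chronological order; and N is an integer greater than or equal to 1,
the deletion records included in the i-th sub-release log record the storage location, before deletion, of data deleted from the file system within an intermediate period, where the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, i ∈ [1, N], and,
the determining a release log whose creation time is within the first time period includes:
determining the N sub-release logs according to the N snapshots and the correspondence.
In other words, each time the storage device creates a snapshot, it allocates a corresponding release log to the snapshot. One snapshot corresponds to one sub-release log, and each sub-release log records the data deleted from the file system in the period between the creation times of two adjacent snapshots. When the release log is determined, the release log corresponding to each snapshot can be determined from the N snapshots and the correspondence, which yields the N sub-release logs (that is, the release log).
As shown in FIG. 7, N is 3, and the period between t11 and t19 is the first time period. Snapshot #1 is the historical snapshot, and snapshot #4 is the current snapshot. Snapshot #2 and snapshot #3 were created after snapshot #1 and before snapshot #4, so the N-1 snapshots between snapshot #1 and snapshot #4 are snapshot #2 and snapshot #3. Snapshots #1 to #3 correspond to sub-release logs #1 to #3 respectively.
Specifically, snapshot #1 corresponds to sub-release log #1, which records the data deleted from the file system in the period (for example, the first intermediate period) between the creation time of snapshot #1 and the creation time of the next snapshot (snapshot #2). Snapshot #2 corresponds to sub-release log #2, which records the data deleted from the file system in the period (for example, the second intermediate period) between the creation time of snapshot #2 and the creation time of the next snapshot (snapshot #3). Snapshot #3 corresponds to sub-release log #3, which records the data deleted from the file system in the period (for example, the third intermediate period) between the creation time of snapshot #3 and the creation time of the next snapshot (snapshot #4). Taking snapshot #1 and sub-release log #1 as an example, the storage device creates sub-release log #1 after creating snapshot #1. From the creation time t11 of snapshot #1 onward, deletion records of data deleted from the file system are written to sub-release log #1, and recording stops once snapshot #2 is created. Sub-release log #1 thus contains the deletion records of data deleted from the file system in the first intermediate period.
It should be pointed out that a release log does not need to record the deletion of data that belongs to a file created after the creation time of the corresponding snapshot (or of the release log) and deleted before the creation time of the next snapshot. Data deleted in this way exists in neither of the two snapshots, so there is no need to record it. For example, taking snapshot #1 and sub-release log #1 in FIG. 7, if the file system created file #1 at time t12 and later deleted file #1 before snapshot #2 was created, neither snapshot #1 nor snapshot #2 protects file #1, so no deletion record for file #1 needs to be written to sub-release log #1.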
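The per-snapshot sub-release logs and the skip rule above can be sketched as follows. This is a hedged illustration: the function name is hypothetical, the snapshot creation times other than t11 and t19 are made-up values, and each deletion is reduced to a (file name, creation time, deletion time) tuple.

```python
import bisect
from typing import Dict, List, Tuple


def route_deletions(
    snapshot_times: List[int],                # creation times of snapshots #1..#(N+1), ascending
    deletions: List[Tuple[str, int, int]],    # (file name, created at, deleted at)
) -> Dict[int, List[str]]:
    """Assign each deletion to sub-release log #i, where i is the interval it falls in."""
    logs: Dict[int, List[str]] = {i: [] for i in range(1, len(snapshot_times))}
    for name, created_at, deleted_at in deletions:
        # i = number of snapshots created at or before the deletion time.
        i = bisect.bisect_right(snapshot_times, deleted_at)
        if i < 1 or i >= len(snapshot_times):
            continue          # deleted outside the first time period
        if created_at > snapshot_times[i - 1]:
            continue          # created and deleted within one interval: no snapshot protects it
        logs[i].append(name)
    return logs


# Snapshot #1 at t11 and snapshot #4 at t19 as in FIG. 7; t13 and t16 are assumed values.
times = [11, 13, 16, 19]
events = [("f1", 5, 12), ("tmp", 12, 12), ("f3", 12, 14), ("f4", 3, 17), ("f5", 3, 20)]
sub_logs = route_deletions(times, events)
```

Here "tmp" is created and deleted between snapshot #1 and snapshot #2, so it is not recorded, and "f5" is deleted after the current snapshot and falls outside the first time period.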
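In the embodiments of this application, the release log whose creation time is within the first time period can be determined in two ways, which are described separately below.
Mode A
In an optional implementation, the correspondence includes a first correspondence and a second correspondence, where the first correspondence is between the N snapshots and N index numbers, the second correspondence is between the N index numbers and the N sub-release logs, and the N index numbers are allocated based on a preset rule, and,
the determining the N sub-release logs according to the N snapshots and the correspondence includes:
determining, according to the historical snapshot, the N-th of the N snapshots, and the first correspondence, the index number corresponding to the historical snapshot and the index number corresponding to the N-th snapshot;
determining, according to the index number of the historical snapshot, the index number of the N-th snapshot, and the preset rule, the N-2 index numbers between the index number of the historical snapshot and the index number of the N-th snapshot; and
determining the release log according to the N index numbers and the second correspondence, where the N index numbers include the index number of the historical snapshot, the index number of the N-th snapshot, and the N-2 index numbers.
In other words, there are correspondences among snapshots, index numbers, and sub-release logs. The correspondence between snapshots and index numbers is called correspondence #1 (an example of the first correspondence): one snapshot corresponds to one index number. The correspondence between sub-release logs and index numbers is called correspondence #2 (an example of the second correspondence): one index number corresponds to one sub-release log. Correspondence #1 may be between snapshot identifiers and index numbers, and correspondence #2 may be between index numbers and sub-release log identifiers. The index number can be understood as the bridge from a snapshot to its sub-release log: to determine a sub-release log, the corresponding index number is determined from the snapshot and correspondence #1, and the corresponding sub-release log is then determined from the index number and correspondence #2.
Here, each index number corresponds to one piece of index data, and the index data includes the identifier of the sub-release log corresponding to the index number. When the sub-release log is determined from the index number and correspondence #2, the index data can be determined from the index number, and the sub-release log can then be determined from the sub-release log identifier in the index data.
In addition, the N index numbers are allocated based on a preset rule, so other index numbers can be determined from known index numbers and the preset rule. For example, the preset rule may be that the difference between the sequence numbers of any two adjacent index numbers is a fixed value; if the difference is "1", index numbers are allocated in the order 1, 2, 3, 4, and so on. As another example, the preset rule may be that the quotient of the sequence numbers of two adjacent index numbers is a fixed value; if the quotient is "2", index numbers are allocated in the order 1, 2, 4, 8, and so on. Many preset rules are possible and are not listed one by one here. Any scheme in which other index numbers can be determined from known index numbers and a preset rule falls within the protection scope of this application.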
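The two example preset rules (fixed difference and fixed quotient) can be sketched as a helper that fills in the index numbers between two known ones. The function name and its parameters are assumptions for illustration only.

```python
from typing import List


def indices_between(first: int, last: int, rule: str = "difference", k: int = 1) -> List[int]:
    """Return every index number from first to last (inclusive) under a preset rule.

    rule="difference": adjacent index numbers differ by k (1, 2, 3, 4, ... for k=1).
    rule="quotient":   adjacent index numbers have quotient k (1, 2, 4, 8, ... for k=2).
    """
    if rule == "difference":
        if k <= 0:
            raise ValueError("difference must be positive")
        step = lambda x: x + k
    elif rule == "quotient":
        if k <= 1 or first <= 0:
            raise ValueError("quotient rule needs k > 1 and a positive first index")
        step = lambda x: x * k
    else:
        raise ValueError(f"unknown rule: {rule}")

    out = [first]
    while out[-1] < last:
        out.append(step(out[-1]))
    if out[-1] != last:
        raise ValueError("last index is not reachable from first under this rule")
    return out


arithmetic = indices_between(1, 4)                  # difference rule with k = 1
geometric = indices_between(1, 8, "quotient", 2)    # quotient rule with k = 2
```

Given only the first and last index numbers, the rule is enough to recover the ones in between, which is what allows the sub-release logs of intermediate snapshots to be located without looking those snapshots up individually.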
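Here, two adjacent index numbers are index numbers whose corresponding snapshots or sub-release logs were created one directly after the other, and the N-th snapshot is the snapshot immediately before the current snapshot.
The process of determining the release log whose creation time is within the first time period is described below, continuing with FIG. 7.
Assume that the preset rule is that the difference between the sequence numbers of any two adjacent index numbers is "1", that is, index numbers are allocated in the order 1, 2, 3, 4, and so on. In FIG. 7, N = 3. The index number corresponding to snapshot #1 is determined from snapshot #1 and correspondence #1 (denoted index number #1, with sequence number 1). The index number corresponding to snapshot #3 (that is, the N-th snapshot) is determined from snapshot #3 and correspondence #1 (denoted index number #3, with sequence number 3). Based on index number #1, index number #3, and the preset rule, the index number between index number #1 and index number #3 (that is, the N-2 index numbers) is determined to be index number #2, with sequence number 2. Then, based on index number #1, index number #2, index number #3, and correspondence #2, the three corresponding sub-release logs are determined; that is, the corresponding sub-release log is determined from the index data of each of index number #1, index number #2, and index number #3.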
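The Mode A lookup for FIG. 7 can be sketched with two dictionaries standing in for correspondence #1 and correspondence #2. All names below are hypothetical, and the preset rule is assumed to be a difference of 1 between adjacent index numbers.

```python
from typing import Dict, List


def sub_release_logs_mode_a(
    historical: str,
    nth: str,                        # the N-th snapshot, i.e. the one just before the current snapshot
    snap_to_index: Dict[str, int],   # correspondence #1: snapshot -> index number
    index_to_log: Dict[int, str],    # correspondence #2: index number -> sub-release log
) -> List[str]:
    """Resolve the N sub-release logs from the two end snapshots and the +1 rule."""
    first = snap_to_index[historical]
    last = snap_to_index[nth]
    # Preset rule (assumed): adjacent index numbers differ by 1, so the N-2
    # index numbers in between are simply first+1 .. last-1.
    indices = range(first, last + 1)
    return [index_to_log[i] for i in indices]


correspondence_1 = {"snapshot#1": 1, "snapshot#2": 2, "snapshot#3": 3}
correspondence_2 = {1: "sub-release-log#1", 2: "sub-release-log#2", 3: "sub-release-log#3"}
resolved = sub_release_logs_mode_a("snapshot#1", "snapshot#3", correspondence_1, correspondence_2)
```

Only snapshot #1 and snapshot #3 are looked up in correspondence #1; index number #2 is derived from the rule, and all three sub-release logs are then read from correspondence #2.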
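Mode B
Optionally, the determining a release log whose creation time is within the first time period includes:
determining, according to the historical snapshot, the current snapshot, and a third correspondence, the index number corresponding to the historical snapshot and the index number corresponding to the current snapshot, where the third correspondence is between N+1 snapshots and N+1 index numbers, the N+1 snapshots include the N snapshots and the current snapshot, and the N+1 index numbers are allocated based on a preset rule;
determining, according to the index number of the historical snapshot, the index number of the current snapshot, and the preset rule, the N-1 index numbers between the index number of the historical snapshot and the index number of the current snapshot; and
determining the release log according to N index numbers and a fourth correspondence, where the fourth correspondence is between the N index numbers and the N sub-release logs, and the N index numbers include the index number of the historical snapshot and the N-1 index numbers.
Mode B differs from Mode A in that the index number of the historical snapshot (denoted index number #A for ease of distinction and understanding) and the index number of the current snapshot (denoted index number #B) are determined from the historical snapshot, the current snapshot, and correspondence #3 (an example of the third correspondence). The N-1 index numbers between index number #A and index number #B are then determined from index number #A, index number #B, and the preset rule, and the release log is determined from index number #A, the N-1 index numbers, and correspondence #4 (an example of the fourth correspondence).
Continuing with FIG. 7 as an example, assume that the preset rule is that the difference between the sequence numbers of any two adjacent index numbers is "1", that is, index numbers are allocated in the order 1, 2, 3, 4, and so on. In FIG. 7, N = 3. Index number #A (with sequence number 1) is determined from snapshot #1 and correspondence #3, and index number #B (with sequence number 4) is determined from snapshot #4 and correspondence #3. Based on index number #A, index number #B, and the preset rule, the index numbers between index number #A and index number #B (that is, the N-1 index numbers) are determined to be index number #C (with sequence number 2) and index number #D (with sequence number 3). Then, based on index number #A, index number #C, index number #D, and correspondence #4, the three corresponding sub-release logs are determined.
As an example and not a limitation, besides the two modes above, the N-1 snapshots created in the period between the creation time of the historical snapshot and the creation time of the current snapshot (that is, the first time period) can first be determined from the historical snapshot and the current snapshot. The sub-release log of each snapshot is then determined from the correspondence between snapshots and index numbers and from the N snapshots consisting of the historical snapshot and the N-1 snapshots. That is, the sub-release logs whose creation time is within the first time period are determined.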
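To facilitate management of index numbers, the embodiments of this application further create index management information, which records information related to index numbers. It is mainly used to allocate index numbers to snapshots or release logs and, later, to delete snapshots and the corresponding release logs. Whenever any item in the index management information changes, the information needs to be updated promptly.
The index management information includes at least the following items: the total index number (Tindex), the total number of indexes of snapshots that support snapshot comparison (Count), the start index number (Sindex), and the index number of the first snapshot (Findex). The role of each item is briefly described below.
Tindex: may serve as the index number of the current snapshot or release log. In implementation, after the current snapshot or its release log is created, the current Tindex is used as the index number of the current snapshot or its release log. After the relationships among the snapshot, the index number, and the release log (that is, correspondence #1 and correspondence #2 above) have been recorded, Tindex is updated so that the index number allocated to the next snapshot or release log is the updated Tindex.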
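Count: used to decide whether a release log needs to be allocated for the snapshot currently being created (the current snapshot for short). Specifically, if the current snapshot supports snapshot comparison, a release log may be allocated directly. If the current snapshot does not support snapshot comparison, Count is checked, and a release log is allocated only when Count is greater than 0. The reason is that if no snapshot supporting snapshot comparison exists before the current snapshot and the current snapshot does not support snapshot comparison either, creating a release log for the current snapshot is unnecessary and only wastes system space. Of course, if the current snapshot supports snapshot comparison, a release log is allocated regardless of the value of Count.
Therefore, in the embodiments of this application, whether to allocate a release log for the current snapshot may be decided as follows: on the one hand, a release log may be allocated whenever Count is greater than 0; on the other hand, regardless of the value of Count, a release log is allocated as long as the current snapshot supports snapshot comparison.
Of course, a release log may also be allocated for every snapshot without regard to Count or to whether the current snapshot supports snapshot comparison. Compared with the approach above, this merely wastes a small amount of space, and the embodiments of this application are not limited in this respect.
It should be noted that, in the embodiments of this application, a snapshot that does not support snapshot comparison may be identified from the type of service delivered by the user and received by the storage system. For example, if a service delivered by the user does not require snapshot comparison, the user may mark the service to inform the storage device that the service does not require snapshot comparison.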
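The reason for allocating release logs for all snapshots when Count is greater than 0 is explained further here.
In practice, if the system creates a mix of snapshots that support snapshot comparison and snapshots that do not, and release logs are created only for the snapshots that support snapshot comparison, then when a rollback targets a snapshot that does not support snapshot comparison, the released data recorded for that snapshot cannot be separated out of the release log of the snapshot that supports snapshot comparison. In other words, the system cannot roll back to a snapshot that does not support snapshot comparison. For example, assume that snapshot #1, snapshot #2, and snapshot #3 are created, where snapshot #1 and snapshot #3 support snapshot comparison and snapshot #2 does not, and that a release log is allocated only for snapshot #1, which keeps recording deletion records until the creation time of snapshot #3. Assume further that data #A is deleted from the file system between the creation times of snapshot #1 and snapshot #2, and data #B is deleted between the creation times of snapshot #2 and snapshot #3. When a rollback to snapshot #2 is initiated, it is difficult to determine from the release log which data was deleted between the creation times of snapshot #2 and snapshot #3, so the file system cannot be restored to its state at the creation time of snapshot #2. In view of this, a release log may be allocated for every snapshot as long as Count is greater than 0. Likewise, as noted above, because other scenarios (for example, snapshot rollback) may also use the deletion records in the release logs, a release log may be allocated for every snapshot.
In implementation, after a snapshot that supports snapshot comparison is created, Count in the index management information is updated, that is, the current Count in the index management information is incremented by 1. For a snapshot that does not support snapshot comparison, Count in the index management information does not need to be updated.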
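The roles of Tindex and Count described above can be sketched as follows. This is a hedged illustration, not the application's implementation: the class `IndexManagementInfo`, the function `create_snapshot`, the difference-of-1 update of Tindex, and the choice to hand out no index number when no release log is allocated are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class IndexManagementInfo:
    tindex: int = 0  # Tindex: index number to assign to the next snapshot/log
    count: int = 0   # Count: number of snapshots that support comparison
    sindex: int = 0  # Sindex: start index number (first remaining sub-release log)
    findex: int = 0  # Findex: index number of the first snapshot


def create_snapshot(info, supports_comparison):
    """Return the index number of the allocated sub-release log, or None.

    A log is allocated if the snapshot supports comparison, or if an earlier
    comparison-capable snapshot exists (Count > 0).
    """
    if not supports_comparison and info.count == 0:
        return None  # no comparison-capable snapshot yet: a log would waste space
    index = info.tindex
    info.tindex += 1  # assumed preset rule: adjacent index numbers differ by 1
    if supports_comparison:
        info.count += 1  # only comparison-capable snapshots change Count
    return index


info = IndexManagementInfo()
first = create_snapshot(info, supports_comparison=False)   # Count is 0: no log
second = create_snapshot(info, supports_comparison=True)   # log with index 0
third = create_snapshot(info, supports_comparison=False)   # Count > 0: log with index 1
```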
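Start index number (Sindex) and index number of the first snapshot (Findex): the start index number denotes the index number of a sub-release log. When snapshots and their corresponding sub-release logs are later deleted, these two index numbers are used to reclaim release logs; once the sub-release logs have been deleted, the end result is that Sindex and Findex are equal.
In other words, these two index numbers effectively mark the sub-release logs that need to be reclaimed. After deleting a snapshot, the system does not have to delete the sub-release log immediately; because the sub-release logs between the two index numbers are marked, they can be reclaimed gradually. Without these two index numbers, the system would have to delete the corresponding release log immediately after deleting a snapshot, and if the deleted files occupied a large amount of space, normal processing of user data services would be affected.
For how these two index numbers are used to delete sub-release logs, refer to the related description below.
In the embodiments of this application, once the files deleted from the file system have been determined based on the release log whose creation time falls within the first time period (that is, the period between the creation time of the historical snapshot and the creation time of the current snapshot), the historical snapshot and the current snapshot may be deleted to save system space, possibly together with release logs. There are two cases for deleting a snapshot or a release log, which are described separately below.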
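Case 1
If the snapshot to be deleted is not the first snapshot that supports snapshot comparison among all snapshots, only the snapshot may be deleted, and the corresponding release log does not need to be deleted.
The reason is as follows. Still taking FIG. 7 as an example, snapshot #1 to snapshot #4 all support snapshot comparison, snapshot #1 is the historical snapshot, and snapshot #2 is an intermediate snapshot. If the sub-release log corresponding to snapshot #2 were deleted and, because of another service requirement, snapshot #2 through snapshot #4 later needed to be compared, the sub-release log for the period between the creation time of snapshot #2 and the creation time of snapshot #3 could no longer be determined, and the comparison between snapshot #2 and snapshot #4 could not be completed.
Therefore, the current snapshot, which is necessarily not the first snapshot that supports snapshot comparison, only needs to be deleted itself, without deleting the corresponding release log. For the historical snapshot, it needs to be determined whether it is the first snapshot that supports snapshot comparison. If it is not, only the historical snapshot is deleted and the corresponding release log is kept; if it is, the corresponding release log is deleted as described in Case 2.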
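Case 2
Optionally, the method further includes:
if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, deleting the historical snapshot and the sub-release log corresponding to the historical snapshot.
Specifically, in this case the corresponding sub-release log needs to be deleted, for the following reason:
No snapshot that supports snapshot comparison exists before the creation time of the first snapshot that supports snapshot comparison. After that first snapshot is deleted, any later comparison between two snapshots uses snapshots created after its creation time and never uses its sub-release log. The sub-release log corresponding to the first snapshot that supports snapshot comparison can therefore be deleted, which further saves system space.
The specific implementation of deleting release logs is described in detail below with reference to FIG. 8 to FIG. 10.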
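As shown in FIG. 8, snapshot #1 is the historical snapshot, snapshot #5 is the current snapshot, and snapshot #2 to snapshot #4 are intermediate snapshots. The first time period is the period between the creation time of snapshot #1 and the creation time of snapshot #5, and each snapshot corresponds to one sub-release log. The snapshots drawn in bold black are snapshots that support snapshot comparison, namely snapshot #1, snapshot #3, and snapshot #5; the snapshots not drawn in bold are snapshots that do not support snapshot comparison, namely snapshot #2 and snapshot #4. The index number of snapshot #1 or sub-release log #1 is "0", that of snapshot #2 or sub-release log #2 is "1", that of snapshot #3 or sub-release log #3 is "2", that of snapshot #4 or sub-release log #4 is "3", and that of snapshot #5 or sub-release log #5 is "4".
The information in the table on the right is the index management information. The total index number is the value obtained by adding "1" to the previous total index number "4" after snapshot #5 is created or sub-release log #5 is allocated, that is, the updated index number. The total number of indexes of snapshots that support snapshot comparison is 3 (snapshot #1, snapshot #3, and snapshot #5). The start index number is the index number of the first sub-release log, that is, sub-release log #1, and the index number of the first snapshot is the index number of snapshot #1.
Because snapshot #1 is the first snapshot that supports snapshot comparison, sub-release log #1 needs to be deleted when snapshot #1 is deleted. As shown in FIG. 9, after snapshot #1 is deleted, the index number of the first snapshot moves to the right, that is, it becomes the index number of snapshot #2. Because sub-release log #1 has not yet been deleted, the start index number remains unchanged. The two index numbers now differ, which effectively marks the sub-release logs: when sub-release logs are deleted later, it is clear that the one to delete is sub-release log #1, which lies between the start index number and the index number of the first snapshot.
After sub-release log #1 is deleted, the start index number equals the index number of the first snapshot. At this point, it further needs to be determined whether the snapshot corresponding to the updated index number of the first snapshot (that is, snapshot #2) supports snapshot comparison. If it does not, the index number of the first snapshot continues to move and reclamation continues, until the snapshot corresponding to the moved index number supports snapshot comparison or all sub-release logs have been reclaimed. Snapshot #2 does not support snapshot comparison, so snapshot #2 and the corresponding sub-release log #2 are both deleted in the manner described above, until both index numbers are updated to the index number corresponding to snapshot #3. Because snapshot #3 supports snapshot comparison, deletion of snapshots and sub-release logs stops. As shown in FIG. 10, the updated start index number and the updated index number of the first snapshot are equal, and both correspond to the index number of snapshot #3 or sub-release log #3.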
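The reclamation walk from FIG. 8 to FIG. 10 can be traced with a small sketch. It assumes the difference-of-1 rule, models the snapshots as a dictionary from index number to a "supports comparison" flag, and reclaims each marked sub-release log right away for brevity (the text allows reclamation to be deferred). The function name and data layout are hypothetical.

```python
def delete_first_snapshot(supports, logs, sindex, findex, tindex):
    """Delete the first snapshot and reclaim sub-release logs.

    supports: dict index number -> True if that snapshot supports comparison
    logs:     set of index numbers whose sub-release logs still exist
    Returns the updated (sindex, findex).
    """
    while True:
        supports.pop(findex, None)  # delete the snapshot at Findex
        findex += 1                 # Findex moves right; logs in [sindex, findex) are marked
        while sindex < findex:      # reclaim the marked sub-release logs
            logs.discard(sindex)
            sindex += 1
        # Stop once the new first snapshot supports comparison or nothing is left.
        if findex >= tindex or supports[findex]:
            return sindex, findex


# FIG. 8: index numbers 0..4 for snapshots #1..#5; #1, #3, #5 support comparison.
supports = {0: True, 1: False, 2: True, 3: False, 4: True}
logs = {0, 1, 2, 3, 4}
sindex, findex = delete_first_snapshot(supports, logs, sindex=0, findex=0, tindex=5)
```

After the call, both index numbers point at index 2 (snapshot #3), matching the end state described for FIG. 10.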
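Therefore, in the snapshot comparison method provided in the embodiments of this application, in one aspect, the update time of a file is recorded in each node of the tree structure corresponding to a snapshot, and the nodes of each layer of the time tree corresponding to the current snapshot are accessed based on the creation time of the historical snapshot to find the target leaf nodes, that is, leaf nodes whose file update time is later than the creation time of the historical snapshot. The files modified and/or newly written in the file system within the first time period are then determined from the target leaf nodes. Compared with the prior-art process of determining the target files, this effectively saves time during snapshot comparison and improves the efficiency of the entire snapshot comparison process.
In another aspect, when the nodes of each layer of the time tree corresponding to the current snapshot are accessed based on the creation time of the historical snapshot, jump nodes, that is, nodes whose file update time is earlier than or equal to the creation time of the historical snapshot, can be skipped, and only the remaining nodes are accessed. This reduces the number of nodes accessed, further saves time, and improves the efficiency of snapshot comparison.
In another aspect, the data deleted from the file system within the first time period can be determined quickly from the release log whose creation time falls within the period between the creation time of the historical snapshot and the creation time of the current snapshot (for example, the first time period). This further saves time during snapshot comparison and improves the efficiency of the entire snapshot comparison process. In addition, the deletion records in the release log occupy little system space, which saves system space.
In yet another aspect, when deletion records are written to the release log, for data (for example, first data) deleted at any moment (for example, a first moment), if a target deletion record satisfying the condition exists in the release log, the target deletion record is updated so that it also records the storage location of the first data before deletion, which further saves system space.
The snapshot comparison method according to the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 10. The snapshot comparison apparatus according to the embodiments of this application is described below with reference to FIG. 11. The technical features described in the method embodiments also apply to the following apparatus embodiments.
FIG. 11 is a schematic block diagram of a snapshot comparison apparatus 300 according to an embodiment of this application. As shown in FIG. 11, the apparatus 300 includes: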
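an obtaining unit 310, configured to obtain a time tree corresponding to a current snapshot of a file system, where the time tree includes a root node layer, an intermediate node layer, and a leaf node layer, the root node layer includes one root node, the intermediate node layer includes at least one layer of intermediate nodes, the root node points to the first layer of intermediate nodes in the intermediate node layer, and the last layer of intermediate nodes in the intermediate node layer points to the leaf nodes in the leaf node layer,
where each leaf node in the leaf node layer includes the update time of one file of the file system, a first intermediate node in the intermediate node layer includes a first intermediate time, the first intermediate time includes the latest update time among the update times of the files included in the next-level nodes to which the first intermediate node points, the root node includes the creation time of the current snapshot, the first intermediate node is any intermediate node in the intermediate node layer, and the update time includes the modification time of a modified file and/or the write time of a newly written file in the file system;
a searching unit 320, configured to access, according to the creation time of a historical snapshot of the file system and layer by layer starting from the root node layer of the time tree obtained by the obtaining unit 310, the times included in at least some of the nodes of each layer, to find a target leaf node, where the update time of the file included in the target leaf node is later than the creation time of the historical snapshot; and
a determining unit 330, configured to determine a target file according to the target leaf node found by the searching unit 320, where the target file includes a file modified and/or newly written in the file system within a first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.
Therefore, in the snapshot comparison apparatus provided in the embodiments of this application, the update time of a file is recorded in each node of the tree structure corresponding to a snapshot, and the nodes of each layer of the time tree corresponding to the current snapshot are accessed based on the creation time of the historical snapshot to find the target leaf nodes whose file update time is later than the creation time of the historical snapshot. The files modified and/or newly written in the file system within the first time period (that is, the target files) are then determined from the target leaf nodes. Compared with the prior-art process of determining the target files, this effectively saves time during snapshot comparison and improves the efficiency of the entire snapshot comparison process.
Optionally, the at least some nodes include all nodes in the time tree other than jump nodes, where the update time of the files included in a jump node is earlier than or equal to the creation time of the historical snapshot.
Therefore, when the snapshot comparison apparatus provided in the embodiments of this application accesses the nodes of each layer of the time tree corresponding to the current snapshot based on the creation time of the historical snapshot, it can skip the jump nodes, whose file update time is earlier than or equal to the creation time of the historical snapshot, and access only the remaining nodes. This reduces the number of nodes accessed, further saves time, and improves the efficiency of snapshot comparison.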
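The pruned traversal of the time tree can be sketched as follows. This is a minimal illustration under stated assumptions: timestamps are plain integers, the `Node` class and `find_target_leaves` function are hypothetical names, and each inner node stores the latest update time found beneath it, as the text describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Leaf: update time of one file. Intermediate node: latest update time
    # among the nodes below it. Root: creation time of the current snapshot.
    time: int
    children: List["Node"] = field(default_factory=list)
    file: Optional[str] = None  # set on leaf nodes only


def find_target_leaves(node, history_time):
    """Collect leaves whose update time is later than `history_time`."""
    if node.time <= history_time:
        return []  # jump node: nothing below it changed after the historical snapshot
    if not node.children:
        return [node]  # target leaf node
    found = []
    for child in node.children:
        found.extend(find_target_leaves(child, history_time))
    return found


# Two intermediate nodes with two leaves each; historical snapshot created at time 10.
left = Node(12, [Node(5, file="a.txt"), Node(12, file="b.txt")])
right = Node(4, [Node(3, file="c.txt"), Node(4, file="d.txt")])
root = Node(20, [left, right])
targets = [leaf.file for leaf in find_target_leaves(root, history_time=10)]
```

In this example the whole `right` subtree is skipped after a single comparison, which is where the saving over visiting every leaf comes from.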
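Optionally, the determining unit 330 is specifically configured to:
determine the creation time of the target file according to the target leaf node; and
among the target files, determine a file whose creation time is before the creation time of the historical snapshot as a file modified in the file system, and/or determine a file whose creation time is after the creation time of the historical snapshot as a file newly written in the file system.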
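The split between modified and newly written files can be sketched as a comparison of each target file's creation time with the creation time of the historical snapshot. The function name and the dictionary layout are hypothetical; a file created exactly at the snapshot's creation time is left unclassified here because the text does not specify that case.

```python
def classify_target_files(creation_times, history_time):
    """Split target files into (modified, newly_written).

    creation_times: dict file name -> creation time of that target file
    """
    # Created before the historical snapshot: the file already existed, so it was modified.
    modified = [name for name, created in creation_times.items()
                if created < history_time]
    # Created after the historical snapshot: the file was newly written.
    newly_written = [name for name, created in creation_times.items()
                     if created > history_time]
    return modified, newly_written


modified, newly_written = classify_target_files(
    {"report.txt": 3, "notes.txt": 15}, history_time=10)
```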
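Optionally, the determining unit 330 is further configured to:
determine a release log whose creation time falls within the first time period, where the release log includes deletion records, and the deletion records record the storage locations, before deletion, of the data deleted from the file system within the first time period; and
determine, according to the release log, the data deleted from the file system within the first time period.
Therefore, the snapshot comparison apparatus provided in the embodiments of this application can quickly determine the data deleted from the file system within the first time period from the release log whose creation time falls within the period between the creation time of the historical snapshot and the creation time of the current snapshot (for example, the first time period). This further saves time during snapshot comparison and improves the efficiency of the entire snapshot comparison process. In addition, the deletion records in the release log occupy little system space, which saves system space.
Optionally, the apparatus further includes an updating unit 340, configured to:
delete first data in the file system at a first moment, where the first moment falls within the first time period;
if the deletion records already stored in the release log at the first moment include a target deletion record, update the target deletion record so that the target deletion record also records the storage location of the first data before deletion; and
if the deletion records already stored in the release log at the first moment do not include a target deletion record, write a deletion record that records the storage location of the first data before deletion into the release log at the first moment;
where the storage location, before deletion, of the data corresponding to the target deletion record is contiguous with the storage location of the first data before deletion.
Therefore, when the snapshot comparison apparatus provided in the embodiments of this application writes deletion records to the release log, for data (for example, the first data) deleted at any moment (for example, the first moment), if a target deletion record satisfying the condition exists in the release log, the target deletion record is updated so that it also records the storage location of the first data before deletion, which further saves system space.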
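Merging a new deletion into an existing contiguous deletion record can be sketched as follows. The representation of a deletion record as an `(offset, length)` extent is an assumption made for illustration; the application only requires that the record capture the storage location before deletion.

```python
def record_deletion(release_log, offset, length):
    """Record a deleted extent, extending a contiguous record when one exists.

    release_log: list of (offset, length) deletion records, updated in place.
    """
    for i, (rec_offset, rec_length) in enumerate(release_log):
        if rec_offset + rec_length == offset:
            # The new extent starts where the target record ends: grow it forward.
            release_log[i] = (rec_offset, rec_length + length)
            return
        if offset + length == rec_offset:
            # The new extent ends where the target record starts: grow it backward.
            release_log[i] = (offset, rec_length + length)
            return
    # No contiguous target deletion record: write a new record.
    release_log.append((offset, length))


log = []
record_deletion(log, 0, 4)   # first record
record_deletion(log, 4, 4)   # contiguous with (0, 4): merged
record_deletion(log, 20, 2)  # not contiguous: new record
record_deletion(log, 18, 2)  # immediately precedes (20, 2): merged
```

Four deletions end up as two records, which is the space saving the paragraph above refers to.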
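Optionally, the release log includes N sub-release logs, a correspondence exists between the N sub-release logs and N snapshots, the i-th sub-release log corresponds to the i-th snapshot, the N snapshots include the historical snapshot and N-1 snapshots between the historical snapshot and the current snapshot, the N snapshots are created successively in chronological order, and N is an integer greater than or equal to 1,
the deletion records included in the i-th sub-release log record the storage locations, before deletion, of the data deleted from the file system within an intermediate period, the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, and i∈[1,N]; and
the determining unit 330 is specifically configured to:
determine the N sub-release logs according to the N snapshots and the correspondence.
Optionally, the correspondence includes a first correspondence and a second correspondence, the first correspondence includes a correspondence between the N snapshots and N index numbers, the second correspondence includes a correspondence between the N index numbers and the N sub-release logs, and the N index numbers are allocated based on a preset rule; and
the determining unit 330 is specifically configured to:
determine, according to the historical snapshot, the Nth snapshot of the N snapshots, and the first correspondence, the index number corresponding to the historical snapshot and the index number corresponding to the Nth snapshot;
determine, according to the index number of the historical snapshot, the index number of the Nth snapshot, and the preset rule, the N-2 index numbers between the index number of the historical snapshot and the index number of the Nth snapshot; and
determine the release log according to N index numbers and the second correspondence, where the N index numbers include the index number of the historical snapshot, the index number of the Nth snapshot, and the N-2 index numbers.
Optionally, the apparatus further includes an updating unit 340, configured to:
if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, delete the historical snapshot and the sub-release log corresponding to the historical snapshot.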
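It should be understood that the snapshot comparison apparatus 300 shown in FIG. 11 may correspond to the processor shown in FIG. 1, and that the foregoing and other operations and/or functions of the units in the apparatus 300 are respectively intended to implement the corresponding procedures of the foregoing method embodiments. For brevity, details are not repeated here.
It should also be understood that the apparatus 300 may alternatively be the storage device shown in FIG. 1. In this case, the apparatus 300 further includes a memory and an I/O interface. The processor is configured to execute instructions stored in the memory, and the processor may invoke program code stored in the memory to control the I/O interface to send and receive information or signals, so that the apparatus 300 performs the functions of the units and the actions or processing described in the foregoing method embodiments.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.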

Claims (17)

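  1. A method for snapshot comparison, wherein the method comprises:
    obtaining a time tree corresponding to a current snapshot of a file system, wherein the time tree comprises a root node layer, an intermediate node layer, and a leaf node layer, the root node layer comprises one root node, the intermediate node layer comprises at least one layer of intermediate nodes, the root node points to the first layer of intermediate nodes in the intermediate node layer, and the last layer of intermediate nodes in the intermediate node layer points to the leaf nodes in the leaf node layer,
    wherein each leaf node in the leaf node layer comprises the update time of one file of the file system, a first intermediate node in the intermediate node layer comprises a first intermediate time, the first intermediate time comprises the latest update time among the update times of the files comprised in the next-level nodes to which the first intermediate node points, the root node comprises the creation time of the current snapshot, the first intermediate node is any intermediate node in the intermediate node layer, and the update time comprises the modification time of a modified file and/or the write time of a newly written file in the file system;
    accessing, according to the creation time of a historical snapshot of the file system and layer by layer starting from the root node layer, the times comprised in at least some of the nodes of each layer, to find a target leaf node, wherein the update time of the file comprised in the target leaf node is later than the creation time of the historical snapshot; and
    determining a target file according to the target leaf node, wherein the target file comprises a file modified and/or newly written in the file system within a first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.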
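  2. The method according to claim 1, wherein the at least some nodes comprise all nodes in the time tree other than jump nodes, and the update time of the files comprised in a jump node is earlier than or equal to the creation time of the historical snapshot.
  3. The method according to claim 1 or 2, wherein the determining a target file according to the target leaf node comprises:
    determining the creation time of the target file according to the target leaf node; and
    among the target files, determining a file whose creation time is before the creation time of the historical snapshot as a file modified in the file system, and/or determining a file whose creation time is after the creation time of the historical snapshot as a file newly written in the file system.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    determining a release log whose creation time falls within the first time period, wherein the release log comprises deletion records, and the deletion records record the storage locations, before deletion, of the data deleted from the file system within the first time period; and
    determining, according to the release log, the data deleted from the file system within the first time period.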
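  5. The method according to claim 4, wherein before the determining a release log whose creation time falls within the first time period, the method further comprises:
    deleting first data in the file system at a first moment, wherein the first moment falls within the first time period;
    if the deletion records already stored in the release log at the first moment comprise a target deletion record, updating the target deletion record so that the target deletion record also records the storage location of the first data before deletion; and
    if the deletion records already stored in the release log at the first moment do not comprise a target deletion record, writing a deletion record that records the storage location of the first data before deletion into the release log at the first moment;
    wherein the storage location, before deletion, of the data corresponding to the target deletion record is contiguous with the storage location of the first data before deletion.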
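  6. The method according to claim 4 or 5, wherein the release log comprises N sub-release logs, a correspondence exists between the N sub-release logs and N snapshots, the i-th sub-release log corresponds to the i-th snapshot, the N snapshots comprise the historical snapshot and N-1 snapshots between the historical snapshot and the current snapshot, the N snapshots are created successively in chronological order, and N is an integer greater than or equal to 1,
    the deletion records comprised in the i-th sub-release log record the storage locations, before deletion, of the data deleted from the file system within an intermediate period, the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, and i∈[1,N]; and
    the determining a release log whose creation time falls within the first time period comprises:
    determining the N sub-release logs according to the N snapshots and the correspondence.
  7. The method according to claim 6, wherein the correspondence comprises a first correspondence and a second correspondence, the first correspondence comprises a correspondence between the N snapshots and N index numbers, the second correspondence comprises a correspondence between the N index numbers and the N sub-release logs, and the N index numbers are allocated based on a preset rule; and
    the determining the N sub-release logs according to the N snapshots and the correspondence comprises:
    determining, according to the historical snapshot, the Nth snapshot of the N snapshots, and the first correspondence, the index number corresponding to the historical snapshot and the index number corresponding to the Nth snapshot;
    determining, according to the index number of the historical snapshot, the index number of the Nth snapshot, and the preset rule, the N-2 index numbers between the index number of the historical snapshot and the index number of the Nth snapshot; and
    determining the release log according to N index numbers and the second correspondence, wherein the N index numbers comprise the index number of the historical snapshot, the index number of the Nth snapshot, and the N-2 index numbers.
  8. The method according to claim 6 or 7, wherein the method further comprises:
    if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, deleting the historical snapshot and the sub-release log corresponding to the historical snapshot.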
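  9. An apparatus for snapshot comparison, wherein the apparatus comprises:
    an obtaining unit, configured to obtain a time tree corresponding to a current snapshot of a file system, wherein the time tree comprises a root node layer, an intermediate node layer, and a leaf node layer, the root node layer comprises one root node, the intermediate node layer comprises at least one layer of intermediate nodes, the root node points to the first layer of intermediate nodes in the intermediate node layer, and the last layer of intermediate nodes in the intermediate node layer points to the leaf nodes in the leaf node layer,
    wherein each leaf node in the leaf node layer comprises the update time of one file of the file system, a first intermediate node in the intermediate node layer comprises a first intermediate time, the first intermediate time comprises the latest update time among the update times of the files comprised in the next-level nodes to which the first intermediate node points, the root node comprises the creation time of the current snapshot, the first intermediate node is any intermediate node in the intermediate node layer, and the update time comprises the modification time of a modified file and/or the write time of a newly written file in the file system;
    a searching unit, configured to access, according to the creation time of a historical snapshot of the file system and layer by layer starting from the root node layer of the time tree obtained by the obtaining unit, the times comprised in at least some of the nodes of each layer, to find a target leaf node, wherein the update time of the file comprised in the target leaf node is later than the creation time of the historical snapshot; and
    a determining unit, configured to determine a target file according to the target leaf node found by the searching unit, wherein the target file comprises a file modified and/or newly written in the file system within a first time period, and the first time period lies between the creation time of the historical snapshot and the creation time of the current snapshot.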
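  10. The apparatus according to claim 9, wherein the at least some nodes comprise all nodes in the time tree other than jump nodes, and the update time of the files comprised in a jump node is earlier than or equal to the creation time of the historical snapshot.
  11. The apparatus according to claim 9 or 10, wherein the determining unit is specifically configured to:
    determine the creation time of the target file according to the target leaf node; and
    among the target files, determine a file whose creation time is before the creation time of the historical snapshot as a file modified in the file system, and/or determine a file whose creation time is after the creation time of the historical snapshot as a file newly written in the file system.
  12. The apparatus according to any one of claims 9 to 11, wherein the determining unit is further configured to:
    determine a release log whose creation time falls within the first time period, wherein the release log comprises deletion records, and the deletion records record the storage locations, before deletion, of the data deleted from the file system within the first time period; and
    determine, according to the release log, the data deleted from the file system within the first time period.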
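  13. The apparatus according to claim 12, wherein the apparatus further comprises an updating unit, configured to:
    delete first data in the file system at a first moment, wherein the first moment falls within the first time period;
    if the deletion records already stored in the release log at the first moment comprise a target deletion record, update the target deletion record so that the target deletion record also records the storage location of the first data before deletion; and
    if the deletion records already stored in the release log at the first moment do not comprise a target deletion record, write a deletion record that records the storage location of the first data before deletion into the release log at the first moment;
    wherein the storage location, before deletion, of the data corresponding to the target deletion record is contiguous with the storage location of the first data before deletion.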
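  14. The apparatus according to claim 12 or 13, wherein the release log comprises N sub-release logs, a correspondence exists between the N sub-release logs and N snapshots, the i-th sub-release log corresponds to the i-th snapshot, the N snapshots comprise the historical snapshot and N-1 snapshots between the historical snapshot and the current snapshot, the N snapshots are created successively in chronological order, and N is an integer greater than or equal to 1,
    the deletion records comprised in the i-th sub-release log record the storage locations, before deletion, of the data deleted from the file system within an intermediate period, the intermediate period lies between the creation time of the i-th snapshot and the creation time of the (i+1)-th snapshot, and i∈[1,N]; and
    the determining unit is specifically configured to:
    determine the N sub-release logs according to the N snapshots and the correspondence.
  15. The apparatus according to claim 14, wherein the correspondence comprises a first correspondence and a second correspondence, the first correspondence comprises a correspondence between the N snapshots and N index numbers, the second correspondence comprises a correspondence between the N index numbers and the N sub-release logs, and the N index numbers are allocated based on a preset rule; and
    the determining unit is specifically configured to:
    determine, according to the historical snapshot, the Nth snapshot of the N snapshots, and the first correspondence, the index number corresponding to the historical snapshot and the index number corresponding to the Nth snapshot;
    determine, according to the index number of the historical snapshot, the index number of the Nth snapshot, and the preset rule, the N-2 index numbers between the index number of the historical snapshot and the index number of the Nth snapshot; and
    determine the release log according to N index numbers and the second correspondence, wherein the N index numbers comprise the index number of the historical snapshot, the index number of the Nth snapshot, and the N-2 index numbers.
  16. The apparatus according to claim 14 or 15, wherein the apparatus further comprises an updating unit, configured to:
    if the historical snapshot is the first snapshot in the file system that supports snapshot comparison, delete the historical snapshot and the sub-release log corresponding to the historical snapshot.
  17. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 8.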
PCT/CN2018/087771 2017-11-13 2018-05-22 Method and apparatus for snapshot comparison WO2019091085A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711112972.X 2017-11-13
CN201711112972.XA CN110018989B (zh) 2017-11-13 2017-11-13 Method and apparatus for snapshot comparison

Publications (1)

Publication Number Publication Date
WO2019091085A1 true WO2019091085A1 (zh) 2019-05-16

Family

ID=66437564

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/087771 WO2019091085A1 (zh) 2017-11-13 2018-05-22 Method and apparatus for snapshot comparison

Country Status (2)

Country Link
CN (1) CN110018989B (zh)
WO (1) WO2019091085A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
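CN111078629A (zh) * 2019-12-13 2020-04-28 西安科技大学 Adaptive snapshot adjustment method for efficient historical queries on evolving graphs
CN111767284A (zh) * 2020-06-23 2020-10-13 Oppo(重庆)智能科技有限公司 Data processing method and apparatus, storage medium, and server
WO2024078029A1 (zh) * 2022-10-11 2024-04-18 华为技术有限公司 File system management method and apparatus, and storage medium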

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
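CN110362431B (zh) * 2019-07-23 2022-07-05 中国工商银行股份有限公司 Data backup method and apparatus
CN110912979B (zh) * 2019-11-16 2022-06-10 杭州安恒信息技术股份有限公司 Method for resolving multi-server resource synchronization conflicts
CN111159109A (zh) * 2019-11-26 2020-05-15 陶壮壮 Method and system for detecting files occupying disk space
CN114996224B (zh) * 2022-07-01 2022-10-25 浙江大华技术股份有限公司 File information statistics method and apparatus, and electronic device
CN115185891B (zh) * 2022-09-14 2023-01-17 联想凌拓科技有限公司 Data management method and apparatus for a file system, electronic device, and storage medium
CN117251434A (zh) * 2023-11-20 2023-12-19 深圳万物安全科技有限公司 Data comparison method, server, and readable storage medium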

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050010592A1 (en) * 2003-07-08 2005-01-13 John Guthrie Method and system for taking a data snapshot
CN101178677A (zh) * 2007-11-09 2008-05-14 中国科学院计算技术研究所 Snapshot method for a computer file system
US20110161381A1 (en) * 2009-12-28 2011-06-30 Wenguang Wang Methods and apparatuses to optimize updates in a file system based on birth time
CN102955808A (zh) * 2011-08-26 2013-03-06 腾讯科技(深圳)有限公司 Data acquisition method and distributed file system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180092B (zh) * 2017-05-15 2020-10-23 中国科学院上海微系统与信息技术研究所 Control method, apparatus, and terminal for a file system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
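CN111078629A (zh) * 2019-12-13 2020-04-28 西安科技大学 Adaptive snapshot adjustment method for efficient historical queries on evolving graphs
CN111078629B (zh) * 2019-12-13 2023-02-07 西安科技大学 Adaptive snapshot adjustment method for efficient historical queries on evolving graphs
CN111767284A (zh) * 2020-06-23 2020-10-13 Oppo(重庆)智能科技有限公司 Data processing method and apparatus, storage medium, and server
CN111767284B (zh) * 2020-06-23 2023-11-21 Oppo(重庆)智能科技有限公司 Data processing method and apparatus, storage medium, and server
WO2024078029A1 (zh) * 2022-10-11 2024-04-18 华为技术有限公司 File system management method and apparatus, and storage medium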

Also Published As

Publication number Publication date
CN110018989A (zh) 2019-07-16
CN110018989B (zh) 2021-05-18

Similar Documents

Publication Publication Date Title
WO2019091085A1 (zh) 一种快照比对的方法和装置
US11853549B2 (en) Index storage in shingled magnetic recording (SMR) storage system with non-shingled region
US11301379B2 (en) Access request processing method and apparatus, and computer device
US10120869B2 (en) Method and apparatus for fault-tolerant memory management
US9268804B2 (en) Managing a multi-version database
CN110018998B (zh) 一种文件管理方法、系统及电子设备和存储介质
US11580162B2 (en) Key value append
WO2017041654A1 (zh) 用于分布式存储系统的写入数据、获取数据的方法和设备
US20110093437A1 (en) Method and system for generating a space-efficient snapshot or snapclone of logical disks
US11030092B2 (en) Access request processing method and apparatus, and computer system
CN113568582B (zh) 数据管理方法、装置和存储设备
CN110019130B (zh) 一种数据库更新的方法及装置
CN112463058B (zh) 一种碎片数据整理方法、装置及存储节点
JP4159506B2 (ja) 階層記憶装置、その復旧方法、及び復旧プログラム
US20200226060A1 (en) In-place garbage collection of a sharded, replicated distributed state machine based on mergeable operations
US10452496B2 (en) System and method for managing storage transaction requests
CN116257531B (zh) 一种数据库空间回收方法
CN113821476B (zh) 数据处理方法及装置
WO2022140918A1 (en) Method and system for in-memory metadata reduction in cloud storage system
CN113342751B (zh) 元数据处理方法、装置、设备和可读存储介质
US10740015B2 (en) Optimized management of file system metadata within solid state storage devices (SSDs)
CN117389909A (zh) 内存数据淘汰方法和装置
CN114297196A (zh) 元数据存储方法、装置、电子设备及存储介质
CN117951094A (zh) 存储空间的回收方法、文件系统、介质和计算设备
CN111090614A (zh) Rom快照的读取方法、装置和存储介质

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18876676

Country of ref document: EP

Kind code of ref document: A1