CN115543918A - File snapshot method, system, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115543918A
Authority
CN
China
Prior art keywords
file
data
snapshot
metadata
backup
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211272039.XA
Other languages
Chinese (zh)
Inventor
陈勇
王瀚
鲍苏宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eisoo Information Technology Co Ltd
Original Assignee
Shanghai Eisoo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shanghai Eisoo Information Technology Co Ltd filed Critical Shanghai Eisoo Information Technology Co Ltd
Priority to CN202211272039.XA priority Critical patent/CN115543918A/en
Publication of CN115543918A publication Critical patent/CN115543918A/en
Priority to PCT/CN2023/080695 priority patent/WO2024082525A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/128 Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file snapshot method, a system, an electronic device and a storage medium, wherein the method comprises the following steps: acquiring a data description file, wherein the data description file comprises metadata and raw data files; backing up the data description file so that the data in the data description file is classified and backed up into a backup file based on a file tree structure; and snapshotting the backup file according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file, thereby realizing classified management of the data and effectively improving indexing efficiency.

Description

File snapshot method, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and a system for snapshotting a file, an electronic device, and a storage medium.
Background
File system technology is common and mature, so building a snapshot storage service on the features of a file system is highly feasible. Both a local file system and a network file system can serve as the storage mode, which gives snapshot storage more options, and an inexpensive network file system can be used efficiently to implement the snapshot storage service.
However, the prior art cannot realize classified storage and management of the snapshot files.
Disclosure of Invention
The invention provides a file snapshot method, a system, electronic equipment and a storage medium, which are used for solving the problem that the prior art can not realize snapshot classified storage and management.
In a first aspect, an embodiment of the present disclosure provides a file snapshot method, including:
acquiring a data description file, wherein the data description file comprises: metadata and raw data files;
backing up the data description file to enable data in the data description file to be classified and backed up into a backup file based on a file tree structure;
and snapshotting the backup file according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file.
In a second aspect, an embodiment of the present disclosure provides a file snapshot system, including:
the file acquisition module is used for acquiring a data description file, and the data description file comprises: metadata and raw data files;
the file backup module is used for backing up the data description file so that the data in the data description file is classified and backed up into the backup file based on the file tree structure;
and the file snapshot module is used for carrying out snapshot on the backup file according to the directory of the backup file so as to move the data in the backup file to the snapshot file.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the file snapshot method provided by the embodiments of the first aspect described above.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and the computer instructions are configured to, when executed, enable a processor to implement the file snapshot method provided in the embodiment of the first aspect.
The embodiment of the invention provides a file snapshot method, a device, equipment and a storage medium, wherein a data description file is obtained, and the data description file comprises the following components: metadata and raw data files; backing up the data description file to enable data in the data description file to be classified and backed up into a backup file based on a file tree structure; and carrying out snapshot on the backup file according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file. In the technical scheme, the files are classified and backed up by adopting the file tree structure, so that the data can be classified and managed, and the indexing efficiency is effectively improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of a file snapshot method according to an embodiment of the present invention;
fig. 2 is a flowchart of a file snapshot method according to a second embodiment of the present invention;
fig. 3 is a schematic diagram of a file tree structure of a file snapshot method according to a second embodiment of the present invention;
fig. 4 is a flowchart of file backup in a file snapshot method according to a second embodiment of the present invention;
fig. 5 is a schematic diagram of a data block structure involved in a file snapshot method according to a second embodiment of the present invention;
fig. 6 is a schematic structural diagram of a file snapshot system according to a third embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and "target" and the like in the description and claims of the invention and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a file snapshot method according to an embodiment of the present invention, where this embodiment is applicable to a case where a file snapshot is processed according to a file tree structure, and the method may be executed by a file snapshot system, where the file snapshot system may be implemented in a form of hardware and/or software.
As shown in fig. 1, the method includes:
S101, acquiring a data description file.
In this embodiment, a snapshot of a file is processed, and a data description file needs to be acquired first. The data description file includes: metadata and raw data files.
Metadata is data that describes the attributes of data. It is descriptive information about data and information resources, or structured data that provides information about a resource. It supports functions such as indicating storage locations, recording historical data, searching for resources, and recording files, so that information resources can be effectively discovered, searched, organized, and managed.
The original data file may be a business data file that needs to be snapshot, for example, a data file generated by a business system or a data file waiting for snapshot stored in a database.
S102, backing up the data description file, and classifying and backing up the data in the data description file into a backup file based on a file tree structure.
In this embodiment, a backup file for the data description file is obtained, where the backup file may be the data description file stored based on the file tree structure, may be a copy of the data description file, and has the same content as the data description file.
The file tree structure may be a tree structure composed of a plurality of files or a plurality of data, and may be a directory tree structure in the form of a directory. The backup files may include metadata and raw data files.
For example, in the file tree structure, a folder named "backup file" may be created when the backup processing service is executed, and the metadata and the original data files are collected in that folder, which serves as the basic backup file. In a specific backup service, there may be a large number of original data files.
S103, snapshotting the backup file according to the directory of the backup file, and moving the data in the backup file to the snapshot file.
In this embodiment, a snapshot may be a fully available copy of a backed-up file that includes an image of the corresponding data at some point in time (the point in time at which the copy began). The snapshot may be a copy of the data it represents or may be a replica of the data. For a backup file, a file snapshot is an instant copy of the backup file, contains all information of the backup file at the time of snapshot generation, and is a complete and usable copy.
The backup file is snapshotted according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file. The snapshot is based on the backup data: snapshot processing and storage are carried out on the backup file according to the file tree structure of the backup data. When the backup file is snapshotted, its data is moved to the snapshot file, leaving the backup file empty. After file snapshot processing of the data description file, only the original data and the snapshot data are kept; the backup data no longer exists and can be regarded as transitional data in the file snapshot process. It should be noted that the backup data may be tampered with or modified, but the snapshot data cannot be modified.
Illustratively, a folder named "snapshot file" may be created when the snapshot handling service is executed. When a snapshot is taken, the associated metadata and data files in the underlying backup storage are moved to the folder.
In this embodiment, a data description file is obtained, the data description file is backed up, data in the data description file is classified and backed up into a backup file based on a file tree structure, the backup file is snapshot according to a directory of the backup file, and the data in the backup file is moved into a snapshot file. In the technical scheme, the files are classified and backed up by adopting the file tree structure, so that the data can be classified and managed, and the indexing efficiency is effectively improved.
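The three steps S101 to S103 can be illustrated with a minimal sketch. It assumes that the data description file is represented by an ordinary directory of files, that backup is a plain copy, and that the snapshot is a plain move; the function names and the sample file names are hypothetical and are not taken from the embodiment.

```python
import shutil
import tempfile
from pathlib import Path


def backup(source_dir: Path, root: Path) -> Path:
    """S102 (sketch): copy every file of the source into <root>/backup file."""
    backup_dir = root / "backup file"
    backup_dir.mkdir(parents=True, exist_ok=True)
    for item in source_dir.iterdir():
        if item.is_file():
            shutil.copy2(item, backup_dir / item.name)
    return backup_dir


def snapshot(backup_dir: Path, root: Path) -> Path:
    """S103 (sketch): move the backed-up files into <root>/snapshot file."""
    snapshot_dir = root / "snapshot file"
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    for item in list(backup_dir.iterdir()):
        shutil.move(str(item), str(snapshot_dir / item.name))
    return snapshot_dir


def demo():
    """Run backup then snapshot and list the contents of each directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "source"  # hypothetical stand-in for S101
        source.mkdir()
        (source / "data.bin").write_bytes(b"payload")
        (source / "meta.json").write_text("{}")
        backup_dir = backup(source, root)
        snapshot_dir = snapshot(backup_dir, root)

        def names(directory: Path):
            return sorted(p.name for p in directory.iterdir())

        return names(source), names(backup_dir), names(snapshot_dir)
```

In this sketch the backup directory is empty after the snapshot, which matches the description of the backup data as transitional data, while the source and the snapshot hold the same file names.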
As a first optional embodiment of the embodiment, on the basis of the above embodiment, the first optional embodiment further explains metadata in the data description file, wherein the metadata includes:
1) Index metadata, used for describing index information of the original data file.
In this alternative embodiment, the index metadata may be a type of descriptive information to better describe which data the snapshot store owns. In the database, the target data is obtained according to the index value of the index metadata, and the index metadata can be used for storing the index value.
Further, in order to perform snapshot storage more quickly, smaller index data is needed, and therefore the index metadata is subjected to two levels of design processing, including first level index metadata and second level index metadata.
The primary index metadata may be used to describe the secondary index metadata.
In this alternative embodiment, the primary index metadata may simply use data bits to mark which secondary index metadata it owns. In particular, the primary index metadata may store secondary index metadata.
Illustratively, the primary index metadata may include entries 0 and 1, but is not limited to these. Each primary index entry covers a fixed number of secondary index metadata entries, for example 100: primary index entry 0 may cover secondary index metadata 0 to 99, and primary index entry 1 may cover secondary index metadata 100 to 199.
Secondary index metadata, which may be used to describe the index of the raw data in the raw data file.
In this optional embodiment, the secondary index metadata may actually index into the data file, determine specific data registered by the secondary index metadata, and determine a location where the data is stored through the secondary index. In the data backup and snapshot, the data to be backed up or snapshot processed can be directly determined by the secondary index metadata in the index metadata.
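The two-level index described above can be sketched as follows. The group size of 100 comes from the illustrative example in the text; the dictionary layout, the function names, and the sample location string are assumptions made only for illustration.

```python
from typing import Dict, Optional

# Group size taken from the example in the text (0 to 99, 100 to 199).
SECONDARY_PER_PRIMARY = 100


def primary_index_for(secondary_index: int) -> int:
    """Return the primary index entry that covers a secondary index."""
    return secondary_index // SECONDARY_PER_PRIMARY


def secondary_range(primary_index: int) -> range:
    """Return the secondary indexes covered by one primary index entry."""
    start = primary_index * SECONDARY_PER_PRIMARY
    return range(start, start + SECONDARY_PER_PRIMARY)


def lookup(
    primary: Dict[int, Dict[int, str]], secondary_index: int
) -> Optional[str]:
    """Resolve a secondary index to a storage location via the primary level.

    `primary` maps a primary index to its (hypothetical) secondary index
    table, which maps a secondary index to a storage location.
    """
    group = primary.get(primary_index_for(secondary_index))
    if group is None:
        return None
    return group.get(secondary_index)
```

Only the small primary level has to be consulted to decide whether a secondary table needs to be read at all, which is one way to keep the index data small.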
2) Data reference count metadata, used for describing the number of references to the original data in the original data file.
In this alternative embodiment, the data reference count metadata may be understood as data capable of recording the number of times of reference to the original data in the original data file, and determining whether the original data is used or not by recording the number of times of reference thereto. When the data reference count metadata count is 0, indicating that the corresponding content is no longer used, the content may be cleaned up.
For example, each time data stored in the database is used, the number of references to that data, that is, the data reference count metadata, is incremented by 1; use of the data includes backing it up or snapshotting it. A reference count of 0 may indicate that the data is not used and therefore not needed, so it may optionally be left out when a file backup or snapshot is performed. Similarly, data is stored in the form of files, and a file may contain multiple pieces of data. A file may also be backed up or snapshotted: when the file is copied or backed up, its reference count is incremented by 1, and a file whose reference count is 0 may be regarded as unused and optionally left out of the backup or snapshot.
Specifically, a file backup or snapshot may be a partial backup: rather than backing up all original data files and data, only part of the content is backed up, namely the data whose data reference count metadata is not 0. Data whose count is 0 does not need to be backed up.
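The reference counting rule can be sketched as a small in-memory counter. The class name, the method names, and the block identifiers are hypothetical; the embodiment only states that the count is incremented on use and that content with a count of 0 may be skipped or cleaned up.

```python
from collections import Counter
from typing import Iterable, List


class ReferenceCounts:
    """Sketch of data reference count metadata kept in memory."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add_reference(self, block_id: str) -> None:
        """Record one more use (for example a backup or a snapshot)."""
        self._counts[block_id] += 1

    def release(self, block_id: str) -> None:
        """Drop one reference; the count never goes below zero."""
        if self._counts[block_id] > 0:
            self._counts[block_id] -= 1

    def count(self, block_id: str) -> int:
        """Return the current count; unknown blocks count as 0."""
        return self._counts[block_id]

    def select_for_backup(self, block_ids: Iterable[str]) -> List[str]:
        """Keep only the blocks whose reference count is not 0."""
        return [b for b in block_ids if self._counts[b] > 0]
```

A caller would pass the candidate blocks through `select_for_backup` before copying, so that content with a count of 0 is left out of the backup.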
3) File description metadata, used for describing file storage information of the original data file.
In this embodiment, in the file description metadata, the file storage information includes: storage path of original file data, write once read many WORM protection information, and reference times of the original data file.
The storage path of the original file data may be the storage directory information of the data file, such as /backup file/original data file/0/100.
WORM protection information is information about WORM technology. Write Once Read Many (WORM), also called immutable storage, ensures that written data is kept in a read-only state: an authorized user can read data saved under WORM but cannot modify, delete, or overwrite it, which satisfies requirements on data retention and security. Users can create a shared folder in a WORM volume and store all core confidential data in that folder so that it is protected centrally.
Illustratively, the WORM protection information may include a flag indicating whether WORM protection is employed, or a protection mechanism identification number. For the flag, a value of 0 may indicate that WORM protection is not employed and a value of 1 may indicate that it is employed. The protection mechanism identification number identifies the protection mechanism adopted, which can be obtained by querying with that number. It should be noted that the WORM protection information is part of the file description metadata of the original data file. When a folder is created based on the original data file, every data file under the folder's subdirectories carries the WORM protection information and is covered by the WORM protection mechanism. In a file backup or a file snapshot, the WORM protection information moves along with the original file.
For example, when performing a file backup or snapshot, a shared folder may be created in the WORM volume according to the file tree structure, and the files and data in the folder are then WORM protected. If a folder has WORM protection information, its lower-level folders, that is, its subdirectory folders, also have WORM protection information. Because the WORM protection mechanism is tamper-proof, it can effectively prevent data loss caused by virus damage, misoperation, or system crashes, and prevent metadata loss caused by hacking, so that snapshot storage is kept immutable and the security of the file data is greatly improved.
The number of references to an original data file may be understood as the number of times the original data file was used, copied, backed up, or snapshotted. The number of references is 0, which may indicate that the original data file has never been referenced, proving that it is not needed, and may be optionally skipped when performing a backup or snapshot.
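The file description metadata and the inheritance of WORM protection by subdirectories can be sketched as follows. The field names, the use of a set of protected folder paths, and the helper name are assumptions for illustration; real WORM enforcement is provided by the storage volume, not by application code like this.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional, Set


@dataclass(frozen=True)
class FileDescription:
    """Sketch of file description metadata for one original data file."""

    storage_path: str                        # e.g. /backup file/original data file/0/100
    worm_flag: int                           # 0 = not WORM protected, 1 = WORM protected
    protection_mechanism_id: Optional[int]   # hypothetical identifier, may be absent
    reference_count: int                     # number of times the file was referenced


def is_worm_protected(path: str, protected_folders: Set[str]) -> bool:
    """A path counts as protected if it or any ancestor folder is protected."""
    p = PurePosixPath(path)
    return any(str(c) in protected_folders for c in (p, *p.parents))
```

For instance, marking only "/backup file" as protected makes every file below it, such as "/backup file/original data file/0/100", protected as well, which mirrors the statement that subdirectory folders inherit the WORM protection information.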
Example two
Fig. 2 is a flowchart of a file snapshot method according to a second embodiment of the present invention, and fig. 3 is a schematic diagram of a file tree structure of the file snapshot method according to the second embodiment of the present invention. The embodiment is applicable to the case of processing the file snapshot according to the file tree structure, and the method may be executed by a file snapshot system, which may be implemented in the form of hardware and/or software.
In this embodiment, the data description file is backed up in a file tree structure, and the backup file is snapshot according to the directory of the backup file.
As shown in fig. 2, the method includes:
S201, acquiring a data description file.
In this embodiment, the data description file includes metadata and an original data file. The metadata includes index metadata, data reference count metadata, and file description metadata. The index metadata includes primary index metadata and secondary index metadata. The file description metadata is used to describe file storage information including a storage path of original file data, write-once-read-many WORM protection information, and a reference number of the original data file.
S202, creating a backup file subdirectory.
In this embodiment, the backup file subdirectory is created according to a file tree structure. The backup file subdirectory may be named autonomously as needed, which is named "backup file" in this embodiment. The "backup files" subdirectory may include the entire backup content for the data description file.
S203, creating a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory.
In the present embodiment, a backup metadata subdirectory and a backup file data subdirectory are created under the backup file subdirectory based on the file tree structure. The subdirectory naming rule may be determined according to the requirement, and this embodiment does not limit this, for example, the backup metadata subdirectory for storing metadata under the backup file subdirectory may be named "backup metadata", and the backup file data subdirectory for storing backup files may be named "backup file data".
S204, backing up the metadata in the data description file to the backup metadata subdirectory.
In this embodiment, the backup metadata subdirectory may include index metadata, data reference count metadata, and file description metadata in the data description file. The index metadata may include primary index metadata and secondary index metadata. The data reference count metadata may include the number of references to the data file, which may be, for example, 0, 1, 2, or n times. The more times of reference, the higher the use requirement of the data.
Illustratively, the backup path for metadata a may appear as: backup file/backup metadata/data reference count metadata/a.
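Steps S202 to S204 can be sketched with standard directory operations. The directory names follow the examples in the text; the function names and the dictionary keys are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


def create_backup_layout(root: Path) -> Dict[str, Path]:
    """S202 and S203 (sketch): create the backup file tree under root."""
    backup = root / "backup file"
    metadata = backup / "backup metadata"
    file_data = backup / "backup file data"
    for directory in (metadata, file_data):
        directory.mkdir(parents=True, exist_ok=True)
    return {"backup": backup, "metadata": metadata, "file_data": file_data}


def metadata_backup_path(layout: Dict[str, Path], kind: str, name: str) -> Path:
    """S204 (sketch): path at which one piece of metadata is backed up."""
    return layout["metadata"] / kind / name


def demo() -> str:
    """Build the layout in a temporary directory and return one sample path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        layout = create_backup_layout(root)
        path = metadata_backup_path(layout, "data reference count metadata", "a")
        return path.relative_to(root).as_posix()
```

Calling `demo()` yields the relative path "backup file/backup metadata/data reference count metadata/a", which is the example path given above.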
S205, classifying and backing up the original data contained in the original data file into the backup file data subdirectory according to the metadata in the data description file.
In this embodiment, classifying and backing up the original data contained in the original data file under the backup file data subdirectory according to the metadata in the data description file may include backing up, classified by number, the original data whose reference count in the original data file is not zero. Original data with a non-zero reference count can be regarded as original data that is still needed. Because there may be many original data files, and too many files in one place can cause unpredictable results, the scheme groups the original data files into batches of a fixed size and stores each batch under a numbered data file subdirectory of the backup file data subdirectory, thereby realizing classified backup by number.
Illustratively, the backup file data subdirectory may include at most M data file subdirectories, and each data file subdirectory may include at most N original data files; in this embodiment, M and N are both 1024. For example, if there are 2000 original data files to back up into the backup file data subdirectory, the backup file data subdirectory can include two data file subdirectories: data file subdirectory "0" stores original data files 0 to 1023, and data file subdirectory "1" stores original data files 1024 to 1999. At most M × N original data files can be backed up in the backup file, which in this embodiment is 1024 × 1024.
For example, the backup path of the original data file may be represented as: /backup file data/0/.
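The classification by number can be sketched with integer division. The values of M and N come from this embodiment; the assumption that original data files are numbered consecutively from 0, and the function names, are made only for illustration.

```python
from typing import Dict

M = 1024  # maximum number of data file subdirectories
N = 1024  # maximum number of original data files per subdirectory


def subdirectory_for(file_index: int) -> int:
    """Return the numbered data file subdirectory for one file index."""
    if not 0 <= file_index < M * N:
        raise ValueError("file index outside the M x N capacity")
    return file_index // N


def group_by_subdirectory(file_count: int) -> Dict[int, range]:
    """Map each used subdirectory number to the file indexes it stores."""
    if not 0 <= file_count <= M * N:
        raise ValueError("file count outside the M x N capacity")
    groups: Dict[int, range] = {}
    for start in range(0, file_count, N):
        groups[start // N] = range(start, min(start + N, file_count))
    return groups
```

With 2000 files, `group_by_subdirectory(2000)` assigns indexes 0 to 1023 to subdirectory 0 and indexes 1024 to 1999 to subdirectory 1, as in the example above.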
Specifically, fig. 4 is a file backup flowchart in a file snapshot method according to a second embodiment of the present invention.
As shown in fig. 4, when backing up a data file, it is first checked whether the data block file exists. If it does not exist, a data block file is created, which can be understood as creating a backup file subdirectory to contain the backup file data information; if the data block file exists, it is modified, which can be understood as adding the newly backed-up data file information to the backup file subdirectory.
Next, it is checked whether a secondary index file exists (secondary index metadata is stored in the secondary index file). If not, a secondary index file is created; if it exists, the secondary index file is modified, which can be understood as adding secondary index metadata to the secondary index file. The secondary index metadata can index into the data file, determine the specific data registered in it, and determine the storage location of the data.
Then it is checked whether a primary index file exists (primary index metadata is stored in the primary index file). If not, a primary index file is created; if it exists, the primary index file is modified, which can be understood as adding new primary index metadata to the primary index file. The primary index metadata may simply use data bits to mark which secondary index metadata it owns. In particular, the primary index metadata may store secondary index metadata.
Then it is checked whether a data block reference file (used for storing data reference count metadata) exists. If not, the data block reference file is created; if it exists, the data block reference file is modified, which can be understood as adding data reference count metadata to the data block reference file. The data reference count metadata can be understood as data that records whether data in a data file is referenced and how many times.
Finally, it is checked whether a file description (used for storing file description metadata) exists. If not, the file description is created; if it exists, the file description is modified, which can be understood as adding file description metadata to the file description. The file description metadata describes the file storage information of the original data file (the storage path of the original file data, the write once read many WORM protection information, and the reference count of the original data file).
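The check-then-create-or-modify flow of fig. 4 can be sketched as one helper applied to each of the five files in order. Storing each file as a JSON list, the file names, and the entry format are assumptions made for illustration and are not specified by the embodiment.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# Hypothetical file names for the five files checked in the fig. 4 flow.
ARTIFACTS = [
    "data_block",
    "secondary_index",
    "primary_index",
    "data_block_reference",
    "file_description",
]


def create_or_modify(path: Path, entry: dict) -> str:
    """Create the file with one entry, or append the entry if it exists."""
    if path.exists():
        entries = json.loads(path.read_text())
        entries.append(entry)
        action = "modified"
    else:
        entries = [entry]
        action = "created"
    path.write_text(json.dumps(entries))
    return action


def backup_data_file(directory: Path, entry: dict) -> List[str]:
    """Apply the check-then-create-or-modify step to each file in order."""
    return [
        create_or_modify(directory / f"{name}.json", entry)
        for name in ARTIFACTS
    ]


def demo():
    """Back up two data files and report what happened on each pass."""
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        first = backup_data_file(directory, {"file": 0})
        second = backup_data_file(directory, {"file": 1})
        stored = json.loads((directory / "primary_index.json").read_text())
        return first, second, len(stored)
```

On the first pass every file is created; on the second pass every file already exists and is modified by appending the new entry.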
S206, creating a snapshot file subdirectory when the backup file is snapshotted, and generating a globally unique identifier.
The snapshot file subdirectory can be regarded as a subdirectory for storing the snapshot file corresponding to the original data file.
In this embodiment, a snapshot file subdirectory is created when the backup file is snapshotted. The subdirectory naming rule may be determined as needed and is not limited in this embodiment; for example, the snapshot file subdirectory may be named "snapshot file". The "snapshot file" subdirectory may include the entire contents of the backup file. The globally unique identifier, a Universally Unique Identifier (UUID), gives each folder unique identification information. It is treated as unique across all space and time, so name collisions at creation time do not need to be considered.
S207, creating a snapshot metadata subdirectory named with the globally unique identifier under the snapshot file subdirectory.
The snapshot metadata subdirectory can be regarded as a subdirectory for storing the snapshot metadata corresponding to the backup metadata in the backup metadata subdirectory, where the snapshot metadata refers to the data obtained by snapshotting the backup metadata.
In this embodiment, when taking a snapshot of a backup file, a snapshot metadata subdirectory is created under the snapshot file subdirectory for storing snapshot metadata. The snapshot metadata subdirectory is named using the generated globally unique identifier. It should be noted that the backup data file may take multiple snapshots, that is, the backup data file will generate multiple snapshot data, and therefore, this part of data will be moved to a different snapshot subdirectory named by using the globally unique identifier.
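Steps S206 and S207 can be sketched with the Python standard library. The choice of a random version 4 UUID is an assumption, since the embodiment only requires a globally unique identifier; the function names are hypothetical.

```python
import tempfile
import uuid
from pathlib import Path
from typing import List


def create_snapshot_metadata_dir(root: Path) -> Path:
    """Create <root>/snapshot file/<uuid> and return the new directory."""
    snapshot_root = root / "snapshot file"
    snapshot_root.mkdir(parents=True, exist_ok=True)
    target = snapshot_root / str(uuid.uuid4())
    target.mkdir()  # raises FileExistsError in the very unlikely case of a collision
    return target


def demo() -> List[str]:
    """Take three snapshots of the same backup and return the directory names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        return [create_snapshot_metadata_dir(root).name for _ in range(3)]
```

Each call produces a separate directory, so repeated snapshots of the same backup file do not overwrite one another.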
S208, creating a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file.
The snapshot file data subdirectory can be regarded as the subdirectory that stores the snapshot file data corresponding to the backup file data in the backup file data subdirectory; snapshot file data refers to the file data obtained by taking a snapshot of the backup file data.
In this embodiment, when a snapshot is taken of the backup file, a snapshot file data subdirectory is created under the snapshot file subdirectory for storing the snapshot file data. The name of the snapshot file data subdirectory may be the same as the name of the backup file data subdirectory.
S209, moving the index metadata and the file description metadata of the backup file into the snapshot metadata subdirectory.
In this embodiment, the index metadata may include primary index metadata and secondary index metadata, so the snapshot metadata subdirectory may contain the primary index metadata, the secondary index metadata, and the file description metadata. Once the index metadata and the file description metadata of the backup file have been moved into the snapshot metadata subdirectory, they no longer exist in the backup file.
For example, the path of a given piece of metadata may appear as: snapshot file/snapshot metadata/secondary index metadata.
S210, moving the backed-up original data file into the snapshot file data subdirectory according to the metadata backed up in the backup file.
In this embodiment, moving the backed-up original data file into the snapshot file data subdirectory according to the metadata backed up in the backup file includes moving each original data file whose reference count is non-zero into the snapshot file data subdirectory; after the move, that original data file no longer exists in the backup file.
It should be noted that, because the backup file data is classified by number when it is backed up into the backup file data subdirectory, it is still stored classified by number in the snapshot file data subdirectory after the snapshot is taken. The metadata of the data description file places the files under WORM protection, and the backed-up original data files remain WORM-protected after being moved into the snapshot file data subdirectory. In this way data can be managed by category, and snapshot storage is effectively protected as immutable data.
Illustratively, a snapshot file data subdirectory may include at most N data files, and each data file may include at most M items of original data; in this embodiment, N and M are both 1024. For example, if there are 2000 items of original data, taking a snapshot of them into the snapshot file data subdirectory yields two data files: data file "0" stores original data items 0-1023, and data file "1" stores original data items 1024-1999. A snapshot file can therefore hold at most N × M items of original data, which in this embodiment is 1024 × 1024 items.
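The numbering rule in this example reduces to integer division. The sketch below reproduces the 2000-item case; the function names `locate` and `group_by_data_file` are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, Tuple

N_DATA_FILES = 1024      # N: maximum data files per snapshot file data subdirectory
M_ITEMS_PER_FILE = 1024  # M: maximum items of original data per data file


def locate(item_index: int) -> Tuple[int, int]:
    """Return (data file number, position inside that data file) for one item."""
    if not 0 <= item_index < N_DATA_FILES * M_ITEMS_PER_FILE:
        raise ValueError("item index exceeds the N x M capacity")
    return divmod(item_index, M_ITEMS_PER_FILE)


def group_by_data_file(total_items: int) -> Dict[str, Tuple[int, int]]:
    """Map each data file name to the inclusive range of item indexes it stores."""
    if not 0 <= total_items <= N_DATA_FILES * M_ITEMS_PER_FILE:
        raise ValueError("too many items for one snapshot file")
    groups = {}
    for start in range(0, total_items, M_ITEMS_PER_FILE):
        end = min(start + M_ITEMS_PER_FILE, total_items) - 1
        groups[str(start // M_ITEMS_PER_FILE)] = (start, end)
    return groups


groups = group_by_data_file(2000)
capacity = N_DATA_FILES * M_ITEMS_PER_FILE
print(groups, capacity)
```

With 2000 items the mapping contains exactly the two data files "0" and "1" described above.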
Illustratively, the snapshot path of the backed-up original data file may be presented as: /snapshot file data/globally unique identifier/0/.
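Steps S206 to S210 can be sketched as a small directory operation. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the directory names `metadata` and `file_data`, the metadata file names in `MOVED_METADATA`, the function `take_snapshot`, and the representation of reference counts as a dictionary keyed by data file name are all hypothetical.

```python
import shutil
import tempfile
import uuid
from pathlib import Path

# Hypothetical file names for the metadata that S209 moves.
MOVED_METADATA = ("primary_index", "secondary_index", "file_description")


def take_snapshot(backup_dir: Path, snapshot_root: Path, ref_counts: dict) -> str:
    """Move index/description metadata and referenced data files into a new snapshot."""
    snapshot_root.mkdir(parents=True, exist_ok=True)          # S206: snapshot file subdirectory
    snapshot_id = str(uuid.uuid4())                           # S206: globally unique identifier
    meta_dir = snapshot_root / snapshot_id                    # S207: metadata subdirectory named by it
    meta_dir.mkdir()
    data_dir = snapshot_root / "file_data" / snapshot_id      # S208: snapshot file data subdirectory
    data_dir.mkdir(parents=True)
    for name in MOVED_METADATA:                               # S209: move the metadata
        src = backup_dir / "metadata" / name
        if src.exists():
            shutil.move(str(src), str(meta_dir / name))
    for src in sorted((backup_dir / "file_data").iterdir()):  # S210: move referenced data files
        if ref_counts.get(src.name, 0) != 0:
            shutil.move(str(src), str(data_dir / src.name))
    return snapshot_id


with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "backup_file"
    (backup / "metadata").mkdir(parents=True)
    (backup / "file_data").mkdir()
    for name in MOVED_METADATA + ("reference_count",):
        (backup / "metadata" / name).write_text("meta")
    for name in ("0", "1"):
        (backup / "file_data" / name).write_bytes(b"data")
    snap_root = Path(tmp) / "snapshot_file"
    sid = take_snapshot(backup, snap_root, {"0": 2, "1": 0})
    moved_meta = sorted(p.name for p in (snap_root / sid).iterdir())
    moved_data = sorted(p.name for p in (snap_root / "file_data" / sid).iterdir())
    left_meta = sorted(p.name for p in (backup / "metadata").iterdir())
    left_data = sorted(p.name for p in (backup / "file_data").iterdir())
```

Because each snapshot gets its own identifier, calling `take_snapshot` again on a later state of the backup would write into different subdirectories rather than overwrite the earlier snapshot.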
In this embodiment, a data description file is obtained; a backup file subdirectory is created, with a backup metadata subdirectory and a backup file data subdirectory under it; the metadata in the data description file is backed up into the backup metadata subdirectory; and the original data contained in the original data file is classified and backed up into the backup file data subdirectory according to that metadata. When a snapshot is taken of the backup file, a snapshot file subdirectory is created and a globally unique identifier is generated; a snapshot metadata subdirectory named with the globally unique identifier is created under the snapshot file subdirectory; a snapshot file data subdirectory is created in the snapshot file according to the backup file data subdirectory of the backup file; the index metadata and the file description metadata of the backup file are moved into the snapshot metadata subdirectory; and the backed-up original data file is moved into the snapshot file data subdirectory according to the metadata backed up in the backup file. This technical scheme uses a dedicated directory tree structure, so that data can be managed by category and snapshot storage can be effectively protected as immutable data. In both file backup and snapshot, the original data is classified and stored by number, which reduces the amount of data stored in any one folder, effectively avoids having to store the data again from the beginning when a storage error occurs during backup, and meets the data storage requirements of the backup and disaster recovery field. Classified data management is thereby achieved, and the security of data storage is effectively improved.
As a first optional embodiment of this embodiment, and on the basis of the foregoing embodiment, the original data file is further divided. Fig. 5 is a schematic diagram of the data block structure involved in the file snapshot method according to the second embodiment of the present invention. The original data file stores the specific data of a service, and its internal data structure is specially designed so that this data can be accessed and checked quickly. In this optional embodiment, the original data file is logically divided into areas and may include:
1) A header description area, used for describing the version of the file system, the number of data blocks, and the check length information.
In this embodiment, the version information of the file system is recorded in every specific file, including all of the metadata files and data files described above. Recording the version allows data of different versions to be handled differently. This information is preferably stored as a character tag in the first 32 bytes of the header of each file.
2) A check area, used for recording check values determined according to the check length information and the number of data blocks.
In this embodiment, the length of the check area may be calculated as the unit length of a check value (e.g. 4 bytes) multiplied by the number of data blocks (e.g. 1024) and by the number of check segments per block, that is, the length of each block (e.g. 1 MB) divided by the check length (e.g. 64 KB): 4 × 1024 × (1 × 1024 × 1024) / (64 × 1024) = 65536 bytes (64 KB). The check area records a check value for each piece of data to be checked, according to the check length and the number of data blocks given in the header description area. The check value may be a unique value obtained by a hash algorithm, or may be determined in another way. With the check area, data can be verified according to its specific importance so as to ensure its correctness.
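The check area length calculation can be written out directly. The sketch below uses the example values from the text as defaults; the function name `check_area_length` and its parameter names are hypothetical.

```python
def check_area_length(
    check_value_size: int = 4,        # bytes per check value
    block_count: int = 1024,          # number of data blocks
    block_size: int = 1024 * 1024,    # 1 MB per block
    check_length: int = 64 * 1024,    # 64 KB of data covered by one check value
) -> int:
    """Return the check area length in bytes."""
    if block_size % check_length != 0:
        raise ValueError("block size must be a multiple of the check length")
    segments_per_block = block_size // check_length
    return check_value_size * block_count * segments_per_block


length = check_area_length()
print(length)
```

With the default values each block is covered by 16 check values, so the check area occupies 65536 bytes.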
Illustratively, a hash is computed over each fixed-length segment of the data to be stored to obtain its check value, and the check value is stored in the check area. The hash algorithm may be, for example, the MurmurHash2 algorithm, which is not limited in this embodiment of the present invention.
Depending on the importance of the data, a check value can be recalculated with the same algorithm, when required, from the data read from the data area of the data file, and compared with the value in the check area so as to ensure the validity of the data.
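Computing and verifying check values can be sketched as follows. Note the substitution: the text names MurmurHash2 as one possible algorithm, but it is not part of the Python standard library, so this sketch uses `zlib.crc32` as a stand-in that also produces a 4-byte value. The function names and the little-endian packing are assumptions made for the example.

```python
import struct
import zlib

CHECK_LENGTH = 64 * 1024  # bytes of data covered by one check value


def compute_check_values(block: bytes, check_length: int = CHECK_LENGTH) -> bytes:
    """Return the packed 4-byte check values for one data block (CRC32 stand-in)."""
    values = [
        zlib.crc32(block[i:i + check_length])
        for i in range(0, len(block), check_length)
    ]
    return struct.pack(f"<{len(values)}I", *values)


def verify_block(block: bytes, stored: bytes, check_length: int = CHECK_LENGTH) -> bool:
    """Recompute the check values from the data and compare with the stored ones."""
    return compute_check_values(block, check_length) == stored


block = bytes(range(256)) * 4096                  # one 1 MB data block
stored = compute_check_values(block)              # what would be written to the check area
corrupted = block[:100] + b"\x00" + block[101:]   # flip a single byte

print(len(stored), verify_block(block, stored), verify_block(corrupted, stored))
```

A 1 MB block checked in 64 KB segments yields 16 values of 4 bytes each, which matches the per-block share of the check area.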
3) An offset area, used for recording the offsets of the data block identifiers.
In this embodiment, the offset area records the offsets used to locate the data block identifiers; a specific data block is located through its data block identifier offset.
4) A data area, used for storing the original data on a data-block basis.
In this embodiment, data is appended to the data area in blocks of a fixed size (1 MB); the number of data blocks may be the number given in the header description information, for example 1024 blocks by default.
In this embodiment, the size of the data file can be controlled by limiting the number of data blocks, so that the data file achieves the best balance of space and efficiency in a specific snapshot service.
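The four areas can be pictured as consecutive regions of one file. The sketch below is purely illustrative and rests on several assumptions that the embodiment does not state: that the areas are laid out contiguously in the order listed, that the header description area is exactly 32 bytes, that each offset entry is 8 bytes, and that blocks are stored densely in order so a block position can also be computed directly. The class `DataFileLayout` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileLayout:
    header_size: int = 32              # assumption: header area is exactly 32 bytes
    block_count: int = 1024            # default number of data blocks
    block_size: int = 1024 * 1024      # fixed 1 MB per block
    check_length: int = 64 * 1024      # data covered by one check value
    check_value_size: int = 4          # bytes per check value
    offset_entry_size: int = 8         # assumption: one 8-byte offset per block

    @property
    def check_area_size(self) -> int:
        return self.check_value_size * self.block_count * (self.block_size // self.check_length)

    @property
    def offset_area_start(self) -> int:
        return self.header_size + self.check_area_size

    @property
    def data_area_start(self) -> int:
        return self.offset_area_start + self.offset_entry_size * self.block_count

    def block_position(self, block_no: int) -> int:
        """Byte position of a block, assuming blocks are appended densely in order."""
        if not 0 <= block_no < self.block_count:
            raise IndexError("block number out of range")
        return self.data_area_start + block_no * self.block_size

    @property
    def max_file_size(self) -> int:
        return self.data_area_start + self.block_count * self.block_size


layout = DataFileLayout()
print(layout.offset_area_start, layout.data_area_start, layout.max_file_size)
```

Under these assumptions the fixed overhead is small compared with the data area, and capping `block_count` caps the size of each data file at slightly over 1 GB.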
Example three
Fig. 6 is a schematic structural diagram of a file snapshot system according to a third embodiment of the present invention. As shown in fig. 6, the system includes:
a file obtaining module 61, configured to obtain a data description file, where the data description file includes: metadata and raw data files;
a file backup module 62, configured to back up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;
and a file snapshot module 63, configured to snapshot the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.
With this file snapshot system, files are classified and backed up using a file tree structure, so that data can be managed by category and indexing efficiency is effectively improved.
Optionally, the metadata includes:
index metadata, used for describing index information of the original data file;
data reference count metadata, used for describing the number of references to the original data in the original data file;
and file description metadata, used for describing the file storage information of the original data file.
Optionally, the index metadata includes:
primary index metadata, used for describing the secondary index metadata;
and secondary index metadata, used for describing the index of the original data in the original data file.
Optionally, the file storage information includes: the storage path of the original file data, the write-once-read-many (WORM) protection information, and the reference count of the original data file.
Optionally, the file backup module 62 may be specifically configured to:
creating a backup file subdirectory;
creating a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory;
backing up the metadata in the data description file under the backup metadata subdirectory;
and classifying and backing up the original data contained in the original data file into the backup file data subdirectory according to the metadata in the data description file.
Optionally, the file snapshot module 63 may be specifically configured to:
creating a snapshot file subdirectory when a snapshot is taken of the backup file, and generating a globally unique identifier;
creating a snapshot metadata subdirectory named with the globally unique identifier under the snapshot file subdirectory;
creating a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file;
moving the index metadata and the file description metadata of the backup file into the snapshot metadata subdirectory;
and moving the backed-up original data file into the snapshot file data subdirectory according to the metadata backed up in the backup file.
Optionally, the original data file includes: a header description area, a check area, an offset area, and a data area;
the header description area is used for describing the version, the number of data blocks and the check length information of the file system;
the check area is used for recording a check value determined according to the check length information and the data block number;
the offset area is used for recording the offset of the data block identifier;
the data area is used for storing original data based on data blocks.
The file snapshot system provided by the embodiment of the invention can execute the file snapshot method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
FIG. 7 illustrates a schematic diagram of an electronic device 70 that may be used to implement an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 70 includes at least one processor 71, and a memory communicatively connected to the at least one processor 71, such as a Read Only Memory (ROM) 72, a Random Access Memory (RAM) 73, and the like, wherein the memory stores computer programs executable by the at least one processor, and the processor 71 may perform various appropriate actions and processes according to the computer programs stored in the Read Only Memory (ROM) 72 or the computer programs loaded from the storage unit 78 into the Random Access Memory (RAM) 73. In the RAM 73, various programs and data necessary for the operation of the electronic apparatus 70 can also be stored. The processor 71, the ROM 72, and the RAM 73 are connected to each other by a bus 74. An input/output (I/O) interface 75 is also connected to bus 74.
A number of components in the electronic device 70 are connected to the I/O interface 75, including: an input unit 76 such as a keyboard, a mouse, etc.; an output unit 77 such as various types of displays, speakers, and the like; a storage unit 78, such as a magnetic disk, optical disk, or the like; and a communication unit 79 such as a network card, modem, wireless communication transceiver, etc. The communication unit 79 allows the electronic device 70 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Processor 71 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of processor 71 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. Processor 71 performs the various methods and processes described above, such as the file snapshot method.
In some embodiments, the file snapshot method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 78. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 70 via the ROM 72 and/or the communication unit 79. When the computer program is loaded into RAM 73 and executed by processor 71, one or more steps of the file snapshot method described above may be performed. Alternatively, in other embodiments, processor 71 may be configured to perform the file snapshot method by any other suitable means (e.g., by way of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for snapshot of a file, comprising:
acquiring a data description file, wherein the data description file comprises: metadata and raw data files;
backing up the data description file to enable data in the data description file to be classified and backed up into a backup file based on a file tree structure;
and taking a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.
2. The method of claim 1, wherein the metadata comprises:
index metadata used for describing index information of the original data file;
data reference count metadata for describing the number of references to the original data in the original data file;
and the file description metadata is used for describing the file storage information of the original data file.
3. The method of claim 2, wherein the index metadata comprises:
primary index metadata, used for describing the secondary index metadata;
and secondary index metadata, used for describing the index of the original data in the original data file.
4. The method of claim 2, wherein the file storage information comprises: the storage path of the original file data, the write-once-read-many (WORM) protection information, and the reference count of the original data file.
5. The method of claim 1, wherein the backing up the data description file to make the data in the data description file be classified and backed up into a backup file based on a directory tree structure comprises:
creating a backup file subdirectory;
creating a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory;
backing up the metadata in the data description file under the backup metadata subdirectory;
and classifying and backing up the original data contained in the original data file into the backup file data subdirectory according to the metadata in the data description file.
6. The method according to claim 5, wherein taking a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file, comprises:
creating a snapshot file subdirectory when a snapshot is taken of the backup file, and generating a globally unique identifier;
creating a snapshot metadata subdirectory named with the globally unique identifier under the snapshot file subdirectory;
creating a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file;
moving the index metadata and the file description metadata of the backup file into the snapshot metadata subdirectory;
and moving the backed-up original data file into the snapshot file data subdirectory according to the metadata backed up in the backup file.
7. The method of claim 1, wherein the raw data file comprises: a header description area, a check area, an offset area, and a data area;
the header description area is used for describing the version, the number of data blocks and the check length information of the file system;
the check area is used for recording check values determined according to the check length information and the data block number;
the offset area is used for recording the offset of the data block identifier;
the data area is used for storing original data based on data blocks.
8. A file snapshot system, comprising:
a file obtaining module, configured to obtain a data description file, where the data description file includes: metadata and raw data files;
a file backup module, configured to back up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;
and a file snapshot module, configured to take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the file snapshot method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to perform the file snapshot method of any one of claims 1-7 when executed.
CN202211272039.XA 2022-10-18 2022-10-18 File snapshot method, system, electronic equipment and storage medium Pending CN115543918A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211272039.XA CN115543918A (en) 2022-10-18 2022-10-18 File snapshot method, system, electronic equipment and storage medium
PCT/CN2023/080695 WO2024082525A1 (en) 2022-10-18 2023-03-10 File snapshot method and system, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211272039.XA CN115543918A (en) 2022-10-18 2022-10-18 File snapshot method, system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115543918A true CN115543918A (en) 2022-12-30

Family

ID=84734798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211272039.XA Pending CN115543918A (en) 2022-10-18 2022-10-18 File snapshot method, system, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115543918A (en)
WO (1) WO2024082525A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024082525A1 (en) * 2022-10-18 2024-04-25 上海爱数信息技术股份有限公司 File snapshot method and system, electronic device, and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9183208B1 (en) * 2010-12-24 2015-11-10 Netapp, Inc. Fileshot management
US8639665B2 (en) * 2012-04-04 2014-01-28 International Business Machines Corporation Hybrid backup and restore of very large file system using metadata image backup and traditional backup
CN103593262B (en) * 2013-11-15 2016-05-25 上海爱数信息技术股份有限公司 A kind of virtual machine backup method based on classification
CN107526840A (en) * 2017-09-14 2017-12-29 郑州云海信息技术有限公司 File system snapshot querying method, device and computer-readable recording medium
CN112685223A (en) * 2019-10-17 2021-04-20 伊姆西Ip控股有限责任公司 File type based file backup
CN115543918A (en) * 2022-10-18 2022-12-30 上海爱数信息技术股份有限公司 File snapshot method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2024082525A1 (en) 2024-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination