WO2024082525A1 - File snapshot method and system, electronic device, and storage medium - Google Patents

File snapshot method and system, electronic device, and storage medium

Info

Publication number
WO2024082525A1
WO2024082525A1 PCT/CN2023/080695 CN2023080695W WO2024082525A1 WO 2024082525 A1 WO2024082525 A1 WO 2024082525A1 CN 2023080695 W CN2023080695 W CN 2023080695W WO 2024082525 A1 WO2024082525 A1 WO 2024082525A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
data
snapshot
backup
metadata
Prior art date
Application number
PCT/CN2023/080695
Other languages
French (fr)
Chinese (zh)
Inventor
陈勇
王瀚
鲍苏宁
Original Assignee
上海爱数信息技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海爱数信息技术股份有限公司 filed Critical 上海爱数信息技术股份有限公司
Publication of WO2024082525A1 publication Critical patent/WO2024082525A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/128Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices

Definitions

  • the present application relates to the field of computer technology, and in particular to a file snapshot method, system, electronic device, and storage medium.
  • File system technology is common and mature, so using file system features to implement snapshot storage is highly feasible. Both file system and network file system storage can be supported, which gives snapshot storage more options, and an inexpensive network file system can be used efficiently to implement the snapshot storage service.
  • The present application provides a file snapshot method, system, electronic device and storage medium that address the inability of the related art to store and manage snapshots by classification. File snapshots are performed based on a directory tree structure so that snapshot files can be stored and managed by classification.
  • an embodiment of the present disclosure provides a file snapshot method, including:
  • an embodiment of the present disclosure provides a file snapshot system, including:
  • a file acquisition module configured to acquire a data description file, the data description file includes: metadata and original data files;
  • a file backup module configured to back up the data description file so that the data in the data description file is classified and backed up into the backup file based on the file tree structure;
  • the file snapshot module is configured to take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file.
  • an electronic device including:
  • the memory stores a computer program that can be executed by at least one processor, and the computer program is executed by at least one processor so that the at least one processor can execute the file snapshot method provided by the above-mentioned first aspect embodiment.
  • an embodiment of the present disclosure provides a computer-readable storage medium storing computer instructions that, when executed, cause a processor to implement the file snapshot method provided in the first aspect embodiment.
  • FIG1 is a flow chart of a file snapshot method provided by an embodiment of the present application.
  • FIG2 is a flowchart of a file snapshot method provided by another embodiment of the present application.
  • FIG3 is a schematic diagram of a file tree structure of a file snapshot method provided by an embodiment of the present application.
  • FIG4 is a flowchart of a file backup method in a file snapshot method provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of a data block structure involved in a file snapshot method provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of the structure of a file snapshot system provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • FIG1 is a flowchart of a file snapshot method provided by an embodiment of the present application.
  • the present embodiment is applicable to situations where file snapshot processing is performed according to a file tree structure.
  • the method can be executed by a file snapshot system, which can be implemented in the form of hardware and/or software.
  • the method includes:
  • a data description file which includes metadata and original data files.
  • Metadata is data used to describe data attributes. It is descriptive information about data and information resources, or it can be structural data used to provide information about a certain resource. It is used to support functions such as indicating storage locations, historical data, resource search, and file records, and to achieve effective discovery, search, integrated organization of information resources, and effective management of used resources.
  • the original data file may be a business data file that needs to be snapshotted, for example, a data file generated by a business system or a data file stored in a database and waiting for a snapshot.
  • a backup file of the data description file is obtained.
  • the backup file may be a data description file stored based on a file tree structure, or may be a copy of the data description file, and the content of the backup file is the same as that of the data description file.
  • the file tree structure can be a tree structure composed of multiple files or multiple data, or a directory tree structure in the form of a directory.
  • the backup file can include metadata and original data files.
  • For example, when performing a backup operation, a folder named "backup file" can be created, and the metadata and original data files can be placed in the "backup file" folder to represent the basic backup file.
  • There may be a large number of metadata items and original data files.
  • S103 Take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file.
  • a snapshot can be a completely usable copy of a backup file, which includes an image of the corresponding data at a certain point in time (the time when the copy starts).
  • a snapshot can be a copy of the data it represents, or a replica of the data.
  • a file snapshot is an instant copy of the backup file, which contains all the information of the backup file at the time the snapshot is generated, and is also a completely usable copy.
  • the backup file is snapshotted according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file.
  • The snapshot is taken of the backup data, and the backup file is snapshotted and stored based on the file tree structure of the backup data.
  • After the backup file is snapshotted, the data is moved to the snapshot file, and the backup data in the backup file is cleared.
  • The backup data can be transitional data in the file snapshot process. It should be noted that the backup data can be tampered with or modified, but the snapshot data cannot be modified.
  • a folder named "snapshot file” may be created.
  • the metadata and data files related to the basic backup storage are moved to the folder.
  • a data description file is obtained, the data description file is backed up, the data in the data description file is classified and backed up in the backup file based on the file tree structure, a snapshot is taken of the backup file according to the directory of the backup file, and the data in the backup file is moved to the snapshot file.
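  • The backup-then-move flow summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `backup` and `snapshot`, the sample directory layout, and the use of a temporary directory are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def backup(description_dir: Path, root: Path) -> Path:
    """Copy the data description file (metadata + original data) into a 'backup file' tree."""
    backup_dir = root / "backup file"
    shutil.copytree(description_dir, backup_dir)
    return backup_dir


def snapshot(backup_dir: Path, root: Path) -> Path:
    """Move everything from the backup tree into a 'snapshot file' tree, emptying the backup."""
    snapshot_dir = root / "snapshot file"
    snapshot_dir.mkdir()
    for entry in list(backup_dir.iterdir()):
        shutil.move(str(entry), str(snapshot_dir / entry.name))
    return snapshot_dir


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical data description file: one metadata entry and one original data file.
    src = root / "data description file"
    (src / "metadata").mkdir(parents=True)
    (src / "original data").mkdir()
    (src / "metadata" / "index").write_text("0")
    (src / "original data" / "0").write_text("payload")

    backup_dir = backup(src, root)
    snapshot_dir = snapshot(backup_dir, root)
    backup_is_empty = not any(backup_dir.iterdir())
    snapshot_names = sorted(p.name for p in snapshot_dir.iterdir())
```

  • Moving rather than copying mirrors the description: once the snapshot is taken, the backup directory no longer holds the data.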
  • the file tree structure is used to classify and back up files, which can realize the classification management of data and effectively improve the indexing efficiency.
  • this exemplary embodiment interprets the metadata in the data description file, wherein the metadata includes:
  • Index metadata which is used to describe the index information of the original data file.
  • the index metadata may be a kind of description information for better describing what data the snapshot storage has.
  • the target data is obtained according to the index value of the index metadata, and the index metadata may be used to store the index value.
  • index metadata is designed and processed at two levels, including primary index metadata and secondary index metadata.
  • the primary index metadata can be used to describe the secondary index metadata.
  • the primary index metadata may only use data bits to mark which secondary index description metadata it has.
  • the primary index metadata may store secondary index metadata.
  • The primary index metadata may include entries 0 and 1, but is not limited to 0 and 1.
  • Each of entries 0 and 1 may include secondary index metadata.
  • Each primary entry may cover a certain amount of secondary index metadata, such as 100 entries.
  • For example, under primary index metadata 0, secondary index metadata 0-99 may be included, and under primary index metadata 1, secondary index metadata 100-199 may be included. This embodiment does not limit the specific forms of the primary index metadata and the secondary index metadata.
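  • As a small worked example of the two-level numbering, the mapping between secondary and primary index entries can be written as integer division. The constant of 100 entries per primary index is only the example value from the text, and the function names are hypothetical.

```python
SECONDARY_PER_PRIMARY = 100  # example value from the text; the embodiment does not fix it


def primary_for(secondary_id: int) -> int:
    """Return the primary index entry that covers a given secondary index entry."""
    return secondary_id // SECONDARY_PER_PRIMARY


def secondary_range(primary_id: int) -> range:
    """Return the secondary index entries covered by a primary index entry."""
    start = primary_id * SECONDARY_PER_PRIMARY
    return range(start, start + SECONDARY_PER_PRIMARY)
```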
  • Secondary index metadata can be used to describe the index of the original data in the original data file.
  • the secondary index metadata can be truly indexed into the data file to determine the specific data stored therein, and the location of the data storage can be determined through the secondary index.
  • During data backup and snapshot, the data to be backed up or snapshotted can be determined directly through the secondary index metadata in the index metadata.
  • Data reference count metadata which is used to describe the number of references to the original data in the original data file.
  • the data reference count metadata can be understood as data that can record the number of times the original data in the original data file is referenced, and whether the original data has been used is determined by recording its number of references.
  • When the data reference count is 0, the corresponding content is no longer used and can be cleaned up.
  • Each time the data is used, its number of references increases by 1, that is, the data reference count increases by 1. Use of the data may include backup or snapshot.
  • If the number of data references is 0, the data has not been used, which indicates that it is not needed. In this case, the data can optionally be left out when performing file backup or snapshot.
  • data is stored in the form of a file, which can include multiple pieces of data, and the file may also be backed up or snapshotted. When the file is copied and backed up, the number of file references increases by 1. If the number of file references is 0, it can be said that the file has not been used. In this case, the file can be selectively not stored when performing backup or snapshot processing.
  • Differential backup can be performed: rather than backing up all original data files and data, only part of them is backed up, namely the data whose data reference count is non-zero. Data with a zero count does not need to be backed up.
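  • The reference-count rule can be illustrated with an in-memory counter. This is a sketch under stated assumptions: the block identifiers, the `Counter`-based store, and the function names are hypothetical, and the source does not say how counts are persisted.

```python
from collections import Counter

# Hypothetical in-memory store: block identifier -> number of references.
ref_counts: Counter = Counter()


def reference(block_id: str) -> None:
    """Record one more use (for example a backup or snapshot) of a block."""
    ref_counts[block_id] += 1


def release(block_id: str) -> None:
    """Drop one reference, never going below zero."""
    if ref_counts[block_id] > 0:
        ref_counts[block_id] -= 1


def select_for_backup(block_ids: list) -> list:
    """Differential selection: keep only blocks whose reference count is non-zero."""
    return [b for b in block_ids if ref_counts[b] > 0]


reference("a")
reference("a")
reference("b")
release("b")
selected = select_for_backup(["a", "b", "c"])
```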
  • File description metadata which is used to describe the file storage information of the original data file.
  • the file storage information includes: the storage path of the original file data, the write-once-read-many WORM protection information and the number of references to the original data file.
  • the storage path of the original file data may be storage directory information of the data file, such as /backup file/original data file/0/100 and the like.
  • WORM protection information is about WORM technology.
  • WORM technology is Write Once Read Many (WORM) or immutable storage technology, which ensures that written data remains in a read-only state.
  • Authorized users can read data stored in WORM, but cannot modify, delete, or overwrite the data, thus meeting the requirements of data preservation and security. Users can create a shared folder on a WORM volume, and all core confidential data is stored in this folder for centralized protection.
  • the WORM protection information may include whether WORM protection is adopted or a protection mechanism identification number. Whether WORM protection is adopted may include that when the data is represented as 0, it can indicate that WORM protection is not adopted, and when the data is represented as 1, it can indicate that a WORM protection mechanism is adopted.
  • the protection mechanism identification number can be used to indicate the identification number of the adopted protection mechanism, and the adopted protection mechanism can be obtained by querying the protection mechanism identification number.
  • The WORM protection information is stored in the file description metadata of the original data file. A folder is created based on the original data file and may contain subfolders; every data file under that directory carries WORM protection information and is covered by the WORM protection mechanism. During file backup or file snapshot, the WORM protection information moves along with the original file.
  • When performing file backup or snapshot, a shared folder can be created in a WORM volume according to the file tree structure, and the files and data in the folder can be protected by WORM. If a folder has WORM protection information, its subordinate folders, that is, the sub-directory folders, also have WORM protection information. Because the WORM protection mechanism is tamper-proof, it can prevent viruses from destroying data, prevent data loss caused by misoperation or system crashes, and prevent metadata loss caused by hacker attacks, providing effective immutable data protection for snapshot storage and helping ensure the security of file data.
  • the number of times an original data file is referenced can be understood as the number of times the original data file is used, copied, backed up, or snapshotted. If the number of references is 0, it means that the original data file has never been referenced, proving that it is not needed and can be selectively skipped when backing up or taking snapshots.
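  • One way to picture the file description metadata is as a small record with the three fields named above. The field names, the default values, and the helper functions below are assumptions for illustration; the source only states what information is recorded, not how it is encoded.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileDescription:
    storage_path: str                 # e.g. "/backup file/original data file/0/100"
    worm_protected: int = 0           # 0 = WORM not adopted, 1 = WORM adopted
    protection_mechanism_id: int = 0  # hypothetical identifier of the mechanism in use
    reference_count: int = 0          # times the file was used, copied, backed up or snapshotted


def moved_to(desc: FileDescription, new_path: str) -> FileDescription:
    """Return the description after a move; the WORM information travels with the file."""
    return replace(desc, storage_path=new_path)


def can_skip(desc: FileDescription) -> bool:
    """A file that was never referenced may be skipped during backup or snapshot."""
    return desc.reference_count == 0


original = FileDescription("/backup file/original data file/0/100", worm_protected=1, reference_count=2)
moved = moved_to(original, "/snapshot file/original data file/0/100")
```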
  • Fig. 2 is a flow chart of a file snapshot method provided by another embodiment of the present application.
  • Fig. 3 is a schematic diagram of a file tree structure of a file snapshot method provided by an embodiment of the present application. This embodiment is applicable to the case where file snapshot processing is performed according to the file tree structure.
  • the method can be executed by a file snapshot system, which can be implemented in the form of hardware and/or software.
  • This embodiment further describes backing up the data description file in a file tree structure and taking a snapshot of the backup file according to the directory of the backup file.
  • the method includes:
  • the data description file includes metadata and the original data file.
  • the metadata includes index metadata, data reference count metadata, and file description metadata.
  • the index metadata includes primary index metadata and secondary index metadata.
  • the file description metadata is used to describe the file storage information including the storage path of the original file data, write-once-read-many WORM protection information, and the number of references to the original data file.
  • a backup file subdirectory is created according to the file tree structure.
  • the backup file subdirectory can be named according to the needs, and in this embodiment, it is named "backup file”.
  • the "backup file” subdirectory can include all backup contents of the data description file.
  • S203 Create a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory.
  • a backup metadata subdirectory and a backup file data subdirectory are created under the backup file subdirectory.
  • the subdirectory naming rule can be determined according to the requirements, and this embodiment does not limit this.
  • the backup metadata subdirectory for storing metadata under the backup file subdirectory can be named "backup metadata”
  • the backup file data subdirectory for storing backup files can be named "backup file data”.
  • the backup metadata subdirectory may include index metadata, data reference count metadata, and file description metadata in the data description file.
  • the index metadata may include primary index metadata and secondary index metadata.
  • the data reference count metadata may include the number of references to the data file, for example, 0, 1, 2, or n times. The more references there are, the higher the demand for the data to be used.
  • the backup path of metadata A may be presented as: /backup file/backup metadata/data reference count metadata/A.
  • the original data contained in the original data file is classified by quantity and backed up in the backup file data subdirectory.
  • Classifying the original data contained in the original data file by quantity and backing it up to the backup file data subdirectory may include backing up, by quantity, only the original data whose number of references is not zero.
  • the original data whose number of references is not zero can be considered as the original data that is required to be used.
  • This solution groups the original data files into batches of a fixed size and stores each batch under a numbered data file subdirectory in the backup file data subdirectory, so as to achieve classified backup by quantity.
  • the backup file data subdirectory may include at most M data file subdirectories, and each data file subdirectory may include at most N original data files.
  • For example, M and N are both 1024.
  • Suppose there are 2000 original data files to be backed up to the backup file data subdirectory.
  • The backup file data subdirectory may then include two data file subdirectories: data file subdirectory "0" stores original data files 0 to 1023, and data file subdirectory "1" stores original data files 1024 to 1999.
  • At most M*N original data files can be backed up, which is 1024*1024 in this embodiment.
  • the backup path of the original data file may be presented as: /backup file/backup file data/0/*.
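  • The classification by quantity reduces to integer division on the file number. The sketch below assumes that files are numbered from 0 and that each file is stored under its own number; the source only shows the pattern /backup file/backup file data/0/*, so the final path component is hypothetical.

```python
from pathlib import PurePosixPath

M = 1024  # maximum number of data file subdirectories (example value from the text)
N = 1024  # maximum number of original data files per subdirectory (example value)

BACKUP_DATA_ROOT = PurePosixPath("/backup file/backup file data")


def backup_path(file_index: int) -> PurePosixPath:
    """Map a file number to its subdirectory: files 0..N-1 go to '0', N..2N-1 to '1', and so on."""
    if not 0 <= file_index < M * N:
        raise ValueError("file index outside the M*N capacity")
    return BACKUP_DATA_ROOT / str(file_index // N) / str(file_index)
```

  • With 2000 files, indexes 0 to 1023 land in subdirectory "0" and indexes 1024 to 1999 in subdirectory "1", matching the example above.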
  • FIG4 is a flowchart of a file backup in a file snapshot method provided in an embodiment of the present application.
  • the secondary index metadata is stored in the secondary index file. If the secondary index file does not exist, it is created; if the secondary index file exists, it is modified, which can be understood as adding secondary index metadata to the secondary index file.
  • the secondary index metadata can be indexed into the data file to determine the specific data stored therein, and the location of data storage can be determined through the secondary index.
  • The primary index metadata is stored in the primary index file. If there is no primary index file, one is created; if there is a primary index file, it is modified, which can be understood as adding primary index metadata to the primary index file.
  • The primary index metadata may only use data bits to mark which secondary index description metadata it has.
  • The primary index metadata can store the secondary index metadata.
  • The data block reference file is used to store data reference count metadata. If there is no data block reference file, one is created; if there is a data block reference file, it is modified, which can be understood as adding data reference count metadata to the data block reference file.
  • data reference count metadata can be understood as data that can record whether the data in the data file is referenced and the number of references.
  • the file description metadata can be used to describe the file storage information of the original data file (the storage path of the original file data, the write-once-read-many WORM protection information and the number of references to the original data file).
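  • The "create if missing, otherwise add to it" rule that recurs in the steps above maps directly onto opening a file in append mode. The JSON-lines record format, the file name, and the `add_record` helper are assumptions for illustration; the source does not specify the on-disk encoding of metadata files.

```python
import json
import tempfile
from pathlib import Path


def add_record(path: Path, record: dict) -> None:
    """Append one metadata record; mode 'a' creates the file when it does not exist yet."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "secondary index"
    existed_before = index_file.exists()
    add_record(index_file, {"id": 0, "offset": 0})        # creates the file
    add_record(index_file, {"id": 1, "offset": 1048576})  # modifies the existing file
    records = [json.loads(line) for line in index_file.read_text(encoding="utf-8").splitlines()]
```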
  • the snapshot file subdirectory may be considered as a subdirectory for storing snapshot files corresponding to the original data files.
  • a snapshot file subdirectory is created when a snapshot of a backup file is taken.
  • the subdirectory naming rule can be determined according to the needs, and this embodiment does not limit this.
  • the snapshot file subdirectory can be named "snapshot file”.
  • the "snapshot file” subdirectory can include all the contents of the backup file.
  • The globally unique identifier may be a Universally Unique Identifier (UUID).
  • S207 Create a snapshot metadata subdirectory named with a globally unique identifier under the snapshot file subdirectory.
  • the snapshot metadata subdirectory may be considered as a subdirectory for storing snapshot metadata corresponding to the backup metadata under the backup file data subdirectory.
  • the snapshot metadata refers to data obtained by taking a snapshot of the backup metadata.
  • a snapshot metadata subdirectory is created under the snapshot file subdirectory to store the snapshot metadata.
  • the snapshot metadata subdirectory is named using the generated globally unique identifier. It should be noted that the backup data file may be snapshotted multiple times, that is, the backup data file will generate multiple snapshot data, and therefore, this part of the data will be moved to different snapshot subdirectories named using globally unique identifiers.
  • the snapshot file data subdirectory may be considered as a subdirectory for storing snapshot file data corresponding to the backup file data under the backup file data subdirectory, and the snapshot file data refers to file data obtained by taking a snapshot of the backup file data.
  • a snapshot file data subdirectory is created under the snapshot file subdirectory for storing snapshot file data.
  • the snapshot file data subdirectory may be named the same as the backup file data subdirectory.
  • S209 Move the index metadata and file description metadata of the backup file to the snapshot metadata subdirectory.
  • the index metadata may include primary index metadata and secondary index metadata
  • the snapshot metadata subdirectory may include primary index metadata, secondary index metadata and file description metadata.
  • the index metadata and file description metadata of the backup file are moved to the snapshot metadata subdirectory, and the index metadata and file description metadata in the backup file disappear.
  • After the snapshot, the path of a given metadata item may be presented as: /snapshot file/snapshot metadata/secondary index metadata.
  • the backed up original data files are moved to the snapshot file data subdirectory, including moving the original data files whose reference counts are not zero to the snapshot file data subdirectory. At this time, the backed up original data files in the backup file disappear.
  • Because the backup file data in the backup file data subdirectory was classified by quantity when it was backed up, it is still classified by quantity when stored in the snapshot file data subdirectory after the snapshot. If the file is set to be WORM-protected in the metadata of the data description file, the original data files in the snapshot file data subdirectory remain WORM-protected after the snapshot, thereby achieving both classified management of data and effective immutable data protection for snapshot storage.
  • the snapshot file data subdirectory may include at most N data files, and each data file may include at most M original data.
  • For example, N and M are both 1024.
  • Suppose there are 2000 original data files to be snapshotted to the snapshot file data subdirectory.
  • The snapshot file data subdirectory then includes two data files: data file "0" stores original data files 0 to 1023, and data file "1" stores original data files 1024 to 1999.
  • At most N*M original data items can be snapshotted, which is 1024*1024 in this embodiment.
  • the snapshot path of the backed-up original data file may be presented as: /snapshot file/snapshot file data/globally unique identifier/0/*.
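  • The snapshot directory naming can be sketched with Python's standard `uuid` module. The layout below follows one reading of the text: a metadata subdirectory named with the identifier directly under "snapshot file" (S207), and data files under /snapshot file/snapshot file data/<identifier>/<group>/. The helper names and the per-file path component are hypothetical.

```python
import uuid
from pathlib import PurePosixPath

SNAPSHOT_ROOT = PurePosixPath("/snapshot file")
FILES_PER_DIR = 1024  # example value from the text


def new_snapshot_id() -> str:
    """Generate a fresh identifier; each snapshot of the same backup gets its own."""
    return str(uuid.uuid4())


def snapshot_metadata_dir(snapshot_id: str) -> PurePosixPath:
    """Metadata subdirectory named with the globally unique identifier."""
    return SNAPSHOT_ROOT / snapshot_id


def snapshot_data_path(snapshot_id: str, file_index: int) -> PurePosixPath:
    """Data files keep their by-quantity grouping inside the per-snapshot directory."""
    group = str(file_index // FILES_PER_DIR)
    return SNAPSHOT_ROOT / "snapshot file data" / snapshot_id / group / str(file_index)


first = new_snapshot_id()
second = new_snapshot_id()
```

  • Because every snapshot gets a distinct identifier, repeated snapshots of the same backup never collide on disk.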
  • A unique directory tree structure design is adopted, which can not only classify and manage data, but also provide effective immutable data protection for snapshot storage.
  • During file backup and snapshot, the original data is classified and stored by quantity, which reduces the amount of data stored in any one folder, avoids having to start over when a storage error occurs during file backup, and meets the data storage requirements of the data backup and disaster recovery field. It also enables classified management of data and improves the security of data storage.
  • FIG5 is a schematic diagram of a data block structure involved in a file snapshot method provided in an embodiment of the present application.
  • the original data file stores specific data in the business.
  • a special data structure is designed inside the original data file.
  • the original data file is logically divided into regions.
  • the original data file may include:
  • Header description area used to describe the file system version, number of data blocks and checksum length information.
  • the version of the file system is recorded in each specific file, and the file includes all the metadata files and data files mentioned above.
  • a check area used to record a check value determined based on the check length information and the number of data blocks.
  • The length of the check area can be calculated as the check value unit length (for example, 4 bytes) multiplied by the number of data blocks (for example, 1024), multiplied by the length of each block (for example, 1 MB) and divided by the check length (for example, 64 KB). With these example values the calculation is 4*1024*(1*1024*1024)/(64*1024) = 65536 bytes.
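  • The check area length can be worked through in code. This sketch reads the example figures as a 4-byte check value, 1024 data blocks, 1 MB per block, and a 64 KB check length; the function and parameter names are hypothetical.

```python
def check_area_length(
    unit_len: int = 4,             # bytes per check value
    num_blocks: int = 1024,        # number of data blocks in the file
    block_len: int = 1024 * 1024,  # bytes per data block (1 MB)
    check_len: int = 64 * 1024,    # bytes covered by one check value (64 KB)
) -> int:
    """Return the size in bytes of the check area for the given parameters."""
    checks_per_block, remainder = divmod(block_len, check_len)
    if remainder:
        raise ValueError("block length must be a multiple of the check length")
    return unit_len * num_blocks * checks_per_block
```

  • With the default values each block needs 16 check values, so the check area occupies 4 * 1024 * 16 = 65536 bytes.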
  • the checksum area will record the checksum value of each data to be checked according to the checksum length of the header description area and the number of data blocks.
  • the checksum value can be a unique value obtained by the hash algorithm, or it can be confirmed by other methods.
  • the design of the checksum area can verify the data according to the specific importance of the data to ensure the correctness of the data.
  • the data to be stored is subjected to a hash algorithm calculation according to a certain length to obtain a check value of the data, and the check value is stored in the check area.
  • the hash algorithm may be, for example, the MurmurHash2 algorithm, which is not limited in the embodiment of the present application.
  • the same algorithm can be used to calculate the check value obtained from the data area of the data file. This value is compared with the value in the check area to ensure the validity of the data.
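  • The write-then-verify cycle can be sketched as follows. Note the substitution: the text names MurmurHash2 as one possible algorithm, but it is not in the Python standard library, so this example uses `zlib.crc32` as a stand-in that also yields a 4-byte value. The function names and the 64 KB check length are assumptions.

```python
import zlib
from typing import List

CHECK_LEN = 64 * 1024  # bytes covered by one check value (example value)


def compute_checks(data: bytes, check_len: int = CHECK_LEN) -> List[int]:
    """Compute one check value per check_len-sized chunk (CRC32 stands in for the hash)."""
    return [zlib.crc32(data[i:i + check_len]) for i in range(0, len(data), check_len)]


def verify(data: bytes, stored: List[int], check_len: int = CHECK_LEN) -> bool:
    """Recompute the check values from the data area and compare them with the stored ones."""
    return compute_checks(data, check_len) == stored


payload = bytes(range(256)) * 1024   # 256 KB of sample data, i.e. four 64 KB chunks
stored_checks = compute_checks(payload)
corrupted = b"\xff" + payload[1:]    # flip the first byte
```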
  • Offset area used to record the offset of the positioning data block identifier.
  • the offset area records the offset of the positioning data block identifier, and the specific data block is positioned through the data block identifier offset.
  • Data area used to store original data based on data blocks.
  • The data area appends data in data blocks of a fixed size (1 MB each).
  • the specific number of data blocks may be the number of data blocks mentioned in the header description information, for example, it may be 1024 blocks by default.
  • the size of the data file can be controlled so that the best space and efficiency performance can be obtained in a specific snapshot service.
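  • To make the four-region layout concrete, the sketch below packs a header and computes where each region would start. Everything about the encoding is an assumption: the three little-endian 32-bit header fields, the 8-byte offset entry per data block, and the order of the regions are chosen only for illustration and are not stated in the source.

```python
import struct

# Hypothetical header encoding: version, number of data blocks, check length.
HEADER = struct.Struct("<III")
CHECK_UNIT = 4    # bytes per check value (example value from the text)
OFFSET_UNIT = 8   # bytes per block offset entry (assumption)


def build_layout(version: int = 1, num_blocks: int = 1024,
                 block_len: int = 1 << 20, check_len: int = 64 << 10):
    """Return the packed header and the start offset of each region in the file."""
    header = HEADER.pack(version, num_blocks, check_len)
    check_area = CHECK_UNIT * num_blocks * (block_len // check_len)
    offset_area = OFFSET_UNIT * num_blocks
    check_start = HEADER.size
    offset_start = check_start + check_area
    data_start = offset_start + offset_area
    regions = {"header": 0, "check": check_start, "offset": offset_start, "data": data_start}
    return header, regions


header_bytes, regions = build_layout()
```

  • Fixing the block size and block count up front means every region boundary can be computed from the header alone, which is one way the file size stays bounded and predictable.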
  • FIG6 is a schematic diagram of the structure of a file snapshot system provided in an embodiment of the present application. As shown in FIG6 , the system includes:
  • a file acquisition module 61 is configured to acquire a data description file, wherein the data description file includes metadata and an original data file;
  • a file backup module 62 configured to back up the data description file so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;
  • the file snapshot module 63 is configured to take a snapshot of the backup file according to the directory of the backup file, so as to move the data in the backup file to the snapshot file.
  • the file snapshot system adopted in this technical solution uses a file tree structure to classify and back up files, which can realize classified management of data and effectively improve indexing efficiency.
  • the metadata includes:
  • Index metadata configured to describe index information of the original data file
  • Data reference count metadata which is set to describe the number of references to the original data in the original data file
  • the file description metadata is configured to describe the file storage information of the original data file.
  • the index metadata includes:
  • Primary index metadata set to describe the secondary index metadata
  • the secondary index metadata is set to describe the index of the original data in the original data file.
  • the file storage information includes: the storage path of the original file data, the write-once-read-many (WORM) protection information, and the number of references to the original data file.
  • the file backup module 62 may be configured as follows:
  • the original data contained in the original data file is classified by quantity and backed up in the backup file data subdirectory.
  • the file snapshot module 63 may be configured as follows:
  • a snapshot file subdirectory is created, and a globally unique identifier is generated;
  • the backed up original data file is moved to the snapshot file data subdirectory.
  • the original data file includes: a header description area, a check area, an offset area and a data area;
  • the header description area is used to describe the version of the file system, the number of data blocks, and the check length information;
  • the check area is used to record a check value determined according to the check length information and the number of data blocks;
  • the offset area is used to record the offset used to locate a data block identifier;
  • the data area is used to store original data based on data blocks.
  • the file snapshot system provided in the embodiments of the present application can execute the file snapshot method provided in any embodiment of the present application, and has the corresponding functional modules and beneficial effects of the execution method.
  • Fig. 7 shows a block diagram of an electronic device 70 that can be used to implement an embodiment of the present application.
  • the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • the electronic device can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (such as helmets, glasses, watches, etc.) and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present application described and/or claimed herein.
  • the electronic device 70 includes at least one processor 71, and a memory connected to the at least one processor 71 in communication, such as a read-only memory (ROM) 72, a random access memory (RAM) 73, etc., wherein the memory stores a computer program that can be executed by at least one processor, and the processor 71 can perform a variety of appropriate actions and processes according to the computer program stored in the read-only memory (ROM) 72 or the computer program loaded from the storage unit 78 to the random access memory (RAM) 73. In the RAM 73, a variety of programs and data required for the operation of the electronic device 70 can also be stored.
  • the processor 71, ROM 72, and RAM 73 are connected to each other via a bus 74.
  • An input/output (I/O) interface 75 is also connected to the bus 74.
  • a number of components in the electronic device 70 are connected to the I/O interface 75, including: an input unit 76, such as a keyboard or a mouse; an output unit 77, such as various types of displays and speakers; a storage unit 78, such as a magnetic disk or an optical disk; and a communication unit 79, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 79 allows the electronic device 70 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • the processor 71 may be a variety of general and/or special processing components with processing and computing capabilities. Some examples of the processor 71 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a variety of special artificial intelligence (AI) computing chips, a variety of processors running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the processor 71 performs the multiple methods and processes described above, such as the file snapshot method.
  • the file snapshot method may be implemented as a computer program, which is tangibly contained in a computer-readable storage medium, such as a storage unit 78.
  • part or all of the computer program may be loaded and/or installed on the electronic device 70 via the ROM 72 and/or the communication unit 79.
  • the processor 71 may be configured to perform the file snapshot method in any other suitable manner (e.g., by means of firmware).
  • Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments can include: being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which can be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that when the computer programs are executed by the processor, the functions/operations specified in the flow charts and/or block diagrams are implemented.
  • the computer programs may be executed entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.
  • a computer readable storage medium may be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, device, or apparatus.
  • a computer readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or equipment, or any suitable combination of the foregoing.
  • a computer readable storage medium may be a machine readable signal medium.
  • machine readable storage media may include electrical connections based on one or more lines, portable computer disks, hard disks, random access memories (RAM), read-only memories (ROM), erasable programmable read-only memories (EPROM or flash memory), optical fibers, portable compact disk read-only memories (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer readable storage medium may be a non-transitory computer readable medium.
  • the systems and techniques described herein may be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the electronic device.
  • Other types of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes backend components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes frontend components (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend components, middleware components, or frontend components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.
  • a computing system may include a client and a server.
  • the client and the server are generally remote from each other and usually interact through a communication network.
  • the client and server relationship is generated by computer programs running on the corresponding computers and having a client-server relationship with each other.
  • the server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the management difficulty and weak business scalability of physical hosts and VPS services in the related art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present application are a file snapshot method and system, an electronic device, and a storage medium. The method comprises: acquiring a data description file, the data description file comprising metadata and a raw data file; backing up the data description file, so that data in the data description file is classified and backed up into a backup file on the basis of a file tree structure; and taking a snapshot of the backup file according to the directory of the backup file, so as to move data in the backup file into a snapshot file.

Description

File snapshot method, system, electronic device and storage medium
This application claims priority to Chinese patent application No. 202211272039.X, filed with the China Patent Office on October 18, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, for example to a file snapshot method, system, electronic device, and storage medium.
Background
File system technology is ubiquitous and mature, so using the features of a file system to implement a snapshot storage service is highly feasible. Doing so supports both local file systems and network file systems as storage options, which gives snapshot storage more choices, and it allows inexpensive network file systems to be used efficiently for snapshot storage.
However, the related art cannot store and manage snapshot files by category.
Summary
The present application provides a file snapshot method, system, electronic device, and storage medium that address the inability of the related art to store and manage snapshots by category. File snapshots are taken based on a directory tree structure, which enables classified storage and management of snapshot files.
In a first aspect, an embodiment of the present disclosure provides a file snapshot method, including:
acquiring a data description file, the data description file including metadata and an original data file;
backing up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;
taking a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.
In a second aspect, an embodiment of the present disclosure provides a file snapshot system, including:
a file acquisition module, configured to acquire a data description file, the data description file including metadata and an original data file;
a file backup module, configured to back up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;
a file snapshot module, configured to take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can perform the file snapshot method provided by the embodiment of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the file snapshot method provided by the embodiment of the first aspect.
Brief Description of the Drawings
FIG. 1 is a flowchart of a file snapshot method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a file snapshot method provided by another embodiment of the present application;
FIG. 3 is a schematic diagram of the file tree structure of a file snapshot method provided by an embodiment of the present application;
FIG. 4 is a flowchart of file backup in a file snapshot method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the data block structure involved in a file snapshot method provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a file snapshot system provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
It should be noted that the terms "first", "second", "target" and the like in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
FIG. 1 is a flowchart of a file snapshot method provided by an embodiment of the present application. This embodiment is applicable to file snapshot processing based on a file tree structure. The method can be executed by a file snapshot system, which can be implemented in hardware and/or software.
As shown in FIG. 1, the method includes:
S101. Acquire a data description file.
In this embodiment, to perform snapshot processing on a file, the data description file must first be acquired. The data description file includes metadata and an original data file.
Metadata is data that describes the attributes of data. It is descriptive information about data and information resources, or it can be structured data that provides information about a resource. It supports functions such as indicating storage locations, recording historical data, locating resources, and keeping file records, and it enables effective discovery, lookup, and integrated organization of information resources as well as effective management of the resources in use.
The original data file may be a business data file that needs to be snapshotted, for example a data file generated by a business system, or a data file stored in a database and waiting to be snapshotted.
S102. Back up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a file tree structure.
In this embodiment, a backup file of the data description file is obtained. The backup file may be the data description file stored based on the file tree structure, or it may be a copy of the data description file with the same content as the data description file.
The file tree structure may be a tree composed of multiple files or multiple pieces of data, or it may be a directory tree that exists in the form of directories. The backup file may include the metadata and the original data file.
Exemplarily, under the file tree structure, when a backup job is executed, a folder named "backup file" can be created, and the metadata and the original data files are collected into this "backup file" folder, which represents the basic backup file. In a specific backup job, the metadata and original data files involved may comprise a large number of original data files.
S103. Take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.
In this embodiment, a snapshot can be a fully usable copy of the backup file that includes an image of the corresponding data at a certain point in time (the time at which the copy starts). A snapshot can be a copy of the data it represents, or a replica of that data. For a backup file, a file snapshot is a point-in-time copy of the backup file that contains all the information of the backup file at the moment the snapshot is generated, and it is itself a complete, usable copy.
The backup file is snapshotted according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file. The snapshot is a snapshot of the backup data: the backup file is snapshotted and stored based on the file tree structure of the backup data. When the backup file is snapshotted, its data is moved to the snapshot file and the backup data in the backup file is cleared. After file snapshot processing, the data description file retains only the original data and the snapshot data; the backup data no longer exists, and it can be regarded as transitional data in the file snapshot process. Note that backup data can be tampered with or modified, whereas snapshot data cannot be modified.
Exemplarily, when a snapshot job is executed, a folder named "snapshot file" can be created. When the snapshot is taken, the related metadata and data files in the basic backup storage are moved to this folder.
In this embodiment, a data description file is acquired; the data description file is backed up, so that the data in it is classified and backed up into a backup file based on a file tree structure; and a snapshot of the backup file is taken according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file. Because this technical solution uses a file tree structure to classify and back up files, it enables classified management of data and effectively improves indexing efficiency.
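The three steps S101 to S103 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the present application: the directory names (`data_description`, `backup_file`, `snapshot_file`), the sample files, and the helper names `back_up`, `take_snapshot`, and `demo` are all hypothetical. The point it shows is that backing up copies the data into a directory tree, while taking the snapshot moves the data, so the backup directory is left empty afterwards.

```python
import shutil
import tempfile
from pathlib import Path


def back_up(description_dir: Path, backup_dir: Path) -> None:
    """S102 (sketch): copy metadata and original data files into the backup tree."""
    shutil.copytree(description_dir, backup_dir, dirs_exist_ok=True)


def take_snapshot(backup_dir: Path, snapshot_dir: Path) -> None:
    """S103 (sketch): move every entry of the backup tree into the snapshot tree.

    Relative paths are preserved, and the backup directory is empty afterwards.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    for entry in sorted(backup_dir.iterdir()):
        shutil.move(str(entry), str(snapshot_dir / entry.name))


def demo():
    """Run the three steps on a throwaway directory and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # S101 (sketch): a hypothetical data description file with metadata and data.
        source = root / "data_description"
        (source / "metadata").mkdir(parents=True)
        (source / "data").mkdir()
        (source / "metadata" / "index").write_text("0")
        (source / "data" / "block0").write_text("payload")

        backup = root / "backup_file"
        snapshot = root / "snapshot_file"
        back_up(source, backup)
        take_snapshot(backup, snapshot)

        remaining = sorted(p.name for p in backup.iterdir())
        moved = sorted(
            p.relative_to(snapshot).as_posix()
            for p in snapshot.rglob("*")
            if p.is_file()
        )
        return remaining, moved


if __name__ == "__main__":
    print(demo())
```

In this sketch the original data under `data_description` is left untouched, which matches the statement that only the original data and the snapshot data remain after snapshot processing. Making the snapshot data unmodifiable is outside the scope of the sketch.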
On the basis of the above embodiment, this exemplary embodiment explains the metadata in the data description file, where the metadata includes:
1) Index metadata, used to describe index information of the original data file.
In this embodiment, the index metadata can be descriptive information that better describes which data the snapshot storage holds. In a database, target data is obtained according to the index value in the index metadata, and the index metadata can be used to store index values.
For example, to make snapshot storage faster, the index data needs to be smaller, so the index metadata is designed with two levels: primary index metadata and secondary index metadata.
The primary index metadata can be used to describe the secondary index metadata.
In this embodiment, the primary index metadata may use only data bits to mark which secondary index description metadata it holds. For example, the primary index metadata may store the secondary index metadata.
Exemplarily, the primary index metadata may include, but is not limited to, 0 and 1. Secondary index metadata may be placed under 0 and 1, each holding a certain number of entries, for example 100: primary index metadata 0 may contain secondary index metadata 0-99, and primary index metadata 1 may contain secondary index metadata 100-199. This embodiment does not limit the specific form of the primary index metadata or the secondary index metadata.
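The 0-99 / 100-199 example can be written as a short Python sketch. It assumes a fixed fan-out of 100 secondary entries per primary entry, taken from the example above, and it assumes that "marking by data bits" can be modelled as an integer bitmap. The names `ENTRIES_PER_PRIMARY`, `primary_slot`, `secondary_range`, `build_primary_bitmap`, and `has_primary` are hypothetical and do not come from the application.

```python
ENTRIES_PER_PRIMARY = 100  # assumed fan-out, taken from the 0-99 / 100-199 example


def primary_slot(secondary_id: int) -> int:
    """Return the primary index entry under which a secondary index entry falls."""
    return secondary_id // ENTRIES_PER_PRIMARY


def secondary_range(primary_id: int) -> range:
    """Return the secondary index entries covered by one primary index entry."""
    start = primary_id * ENTRIES_PER_PRIMARY
    return range(start, start + ENTRIES_PER_PRIMARY)


def build_primary_bitmap(secondary_ids) -> int:
    """Set bit p when at least one secondary entry falls under primary entry p."""
    bitmap = 0
    for secondary_id in secondary_ids:
        bitmap |= 1 << primary_slot(secondary_id)
    return bitmap


def has_primary(bitmap: int, primary_id: int) -> bool:
    """Check whether the bitmap marks the given primary index entry as present."""
    return bool((bitmap >> primary_id) & 1)


if __name__ == "__main__":
    bitmap = build_primary_bitmap([5, 150])
    print(primary_slot(150), has_primary(bitmap, 0), has_primary(bitmap, 2))
```

With this layout a lookup only needs to test one bit before reading any secondary index metadata, which is one way to keep the first-level index small.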
The secondary index metadata can be used to describe the index of the original data in the original data file.
In this embodiment, the secondary index metadata indexes directly into the data file, identifies the specific data stored there, and allows the storage location of the data to be determined. In data backup and snapshot, the data to be backed up or snapshotted can be determined directly from the secondary index metadata in the index metadata.
2) Data reference count metadata, used to describe the number of references to the original data in the original data file.
In this embodiment, the data reference count metadata can be understood as data that records how many times the original data in the original data file has been referenced. Recording the number of references makes it possible to determine whether the original data has been used. When the data reference count is 0, the corresponding content is no longer in use and can be cleaned up.
Exemplarily, once data stored in the database is used, its number of references is increased by 1, that is, the data reference count metadata is incremented by 1. Use of the data may include backup or snapshot. A reference count of 0 indicates that the data has not been used and is not needed, in which case the data can optionally be left out when a file backup or snapshot is performed. Similarly, data is stored in files, a file can contain multiple pieces of data, and the file itself may be backed up or snapshotted. When a file is copied for backup, its reference count is increased by 1. If the reference count of a file is 0, the file has not been used, and it can optionally be left out during backup or snapshot processing.
For example, a file backup or snapshot can be a differential backup: instead of backing up all original data files and data, only part of the content is backed up. That part can include the data whose data reference count metadata is non-zero. Data whose count is 0 does not need to be backed up.
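The reference counting rule can be sketched in Python as follows. This is a hedged illustration rather than the mechanism of the application: the class `ReferenceCounts`, the helpers `select_for_backup` and `reclaimable`, and the string block identifiers are hypothetical names. The sketch shows how a differential backup could keep only data with a non-zero count and how data with a count of 0 could be treated as eligible for cleanup.

```python
from collections import Counter


class ReferenceCounts:
    """Tracks how many times each piece of original data has been referenced."""

    def __init__(self):
        self._counts = Counter()

    def add_reference(self, block_id):
        """Record one use (for example a backup or a snapshot) of the data."""
        self._counts[block_id] += 1

    def release(self, block_id):
        """Drop one reference, never going below zero."""
        if self._counts[block_id] > 0:
            self._counts[block_id] -= 1

    def count(self, block_id):
        """Return the current count; data that was never referenced counts as 0."""
        return self._counts[block_id]


def select_for_backup(block_ids, refs):
    """Differential backup (sketch): keep only data with a non-zero count."""
    return [block_id for block_id in block_ids if refs.count(block_id) > 0]


def reclaimable(block_ids, refs):
    """Data with a count of 0 is no longer used and may be cleaned up."""
    return [block_id for block_id in block_ids if refs.count(block_id) == 0]


if __name__ == "__main__":
    refs = ReferenceCounts()
    refs.add_reference("a")
    refs.add_reference("b")
    refs.release("b")
    print(select_for_backup(["a", "b", "c"], refs), reclaimable(["a", "b", "c"], refs))
```

The same counting scheme could be applied at file granularity, as the text describes for the reference count of an original data file.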
3) File description metadata, used to describe the file storage information of the original data file.
In this embodiment, the file storage information in the file description metadata includes: the storage path of the original file data, the write-once-read-many (WORM) protection information, and the number of references to the original data file.
The storage path of the original file data may be the storage directory information of the data file, for example /backup file/original data file/0/100.
The WORM protection information is information about WORM technology. WORM (Write Once Read Many), also called immutable storage, ensures that written data stays read-only: authorized users can read data stored under WORM but cannot modify, delete, or overwrite it, which satisfies data retention and security requirements. Users can create a shared folder on a WORM volume and place all core confidential data in that folder for centralized protection.
Exemplarily, the WORM protection information may include whether WORM protection is used, or a protection mechanism identification number. For the former, a value of 0 can indicate that WORM protection is not used and a value of 1 can indicate that a WORM protection mechanism is used. The protection mechanism identification number identifies the protection mechanism in use, and the mechanism can be looked up by this number. It should be noted that the WORM protection information resides in the file description metadata of the original data file. A folder is created based on the original data file, and every data file under that folder's subdirectories carries WORM protection information and is subject to the WORM protection mechanism. During file backup or file snapshot, the WORM protection information moves together with the original file.
Exemplarily, when a file backup or snapshot is performed, a shared folder can be created on a WORM volume according to the file tree structure, and the files and data in that folder are WORM-protected. If the folder has WORM protection information, its lower-level folders, that is, its subdirectory folders, have WORM protection information as well. Because the WORM protection mechanism is tamper-proof, it can effectively prevent data destruction by viruses, data loss caused by misoperation or system crashes, and metadata loss caused by hacker attacks. It provides effective immutable data protection for snapshot storage and greatly improves the security of file data.
The number of references to the original data file can be understood as the number of times the original data file has been used, copied, backed up, or snapshotted. A reference count of 0 indicates that the original data file has never been referenced and is therefore not needed, so it can optionally be skipped during backup or snapshot.
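The three fields of the file storage information, together with the rule that subdirectories share the WORM protection information of their parent folder, can be modelled with a small Python sketch. Everything named here is an assumption made for illustration: the classes `WormInfo` and `FileDescription`, the function `effective_worm`, the mechanism number 7, and the example paths. The sketch only models the metadata; it does not enforce immutability on any real storage.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass(frozen=True)
class WormInfo:
    """WORM protection information (sketch)."""

    enabled: int                        # 0 = WORM not used, 1 = WORM used
    mechanism_id: Optional[int] = None  # hypothetical protection mechanism number


@dataclass
class FileDescription:
    """File description metadata of one original data file (sketch)."""

    storage_path: PurePosixPath
    worm: Optional[WormInfo]
    reference_count: int = 0


def effective_worm(
    path: PurePosixPath, protected: Dict[PurePosixPath, WormInfo]
) -> Optional[WormInfo]:
    """Return the WORM info of the path itself or of its nearest protected ancestor."""
    for candidate in (path, *path.parents):
        if candidate in protected:
            return protected[candidate]
    return None


if __name__ == "__main__":
    protected = {PurePosixPath("/snapshot_file"): WormInfo(enabled=1, mechanism_id=7)}
    path = PurePosixPath("/snapshot_file/data/0/100")
    description = FileDescription(path, effective_worm(path, protected), reference_count=1)
    print(description)
```

Looking up the nearest protected ancestor is one simple way to express that a file anywhere below a protected folder is covered by the same protection information.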
图2是本申请另一实施例提供的一种文件快照方法的流程图。图3是本申请实施例提供的一种文件快照方法的文件树结构示意图。本实施例可适用于按文件树结构进行文件快照处理的情形,该方法可以由文件快照系统来执行,该文件快照系统可以采用硬件和/或软件的形式实现。Fig. 2 is a flow chart of a file snapshot method provided by another embodiment of the present application. Fig. 3 is a schematic diagram of a file tree structure of a file snapshot method provided by an embodiment of the present application. This embodiment is applicable to the case where file snapshot processing is performed according to the file tree structure. The method can be executed by a file snapshot system, which can be implemented in the form of hardware and/or software.
在本实施例中,对数据描述文件以文件树结构进行备份,以及按照备份文件的目录对备份文件进行快照做进一步的说明。In this embodiment, the data description file is backed up in a file tree structure, and a snapshot of the backup file is taken according to the directory of the backup file for further description.
如图2所示,该方法包括:As shown in FIG. 2 , the method includes:
S201: Obtain a data description file.
In this embodiment, the data description file includes metadata and an original data file. The metadata includes index metadata, data reference count metadata, and file description metadata. The index metadata includes primary index metadata and secondary index metadata. The file description metadata describes file storage information, which includes the storage path of the original file data, write-once-read-many (WORM) protection information, and the reference count of the original data file.
S202: Create a backup file subdirectory.
In this embodiment, the backup file subdirectory is created according to the file tree structure. The subdirectory can be named freely as required; in this embodiment it is named "backup file". The "backup file" subdirectory may contain the entire backup of the data description file.
S203: Create a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory.
In this embodiment, based on the file tree structure, a backup metadata subdirectory and a backup file data subdirectory are created under the backup file subdirectory. The naming rule for subdirectories can be chosen as required and is not limited by this embodiment. For example, the subdirectory that stores metadata may be named "backup metadata", and the subdirectory that stores backup files may be named "backup file data".
S204: Back up the metadata in the data description file to the backup metadata subdirectory.
In this embodiment, the backup metadata subdirectory may contain the index metadata, data reference count metadata, and file description metadata from the data description file. The index metadata may include primary index metadata and secondary index metadata. The data reference count metadata may include the number of references to a data file, for example 0, 1, 2, or n. The more references there are, the higher the demand for that data.
For example, the backup path of metadata A may be: /backup file/backup metadata/data reference count metadata/A.
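Steps S202 to S204 can be sketched as follows. This is a minimal illustration, assuming an ordinary local file system and the hypothetical directory names `backup_file`, `backup_metadata`, and `backup_file_data`; the embodiment leaves the naming rule open.

```python
from pathlib import Path
from typing import Dict


def create_backup_tree(root: Path) -> Dict[str, Path]:
    """Create the backup file subdirectory and its two children (S202, S203)."""
    backup = root / "backup_file"
    metadata = backup / "backup_metadata"
    data = backup / "backup_file_data"
    for directory in (metadata, data):
        # parents=True also creates the enclosing "backup_file" directory.
        directory.mkdir(parents=True, exist_ok=True)
    return {"backup": backup, "metadata": metadata, "data": data}


def metadata_backup_path(tree: Dict[str, Path], kind: str, name: str) -> Path:
    """Build the backup path of one piece of metadata (S204).

    `kind` is the metadata category, e.g. a reference count category,
    and `name` identifies the metadata item; both are illustrative.
    """
    return tree["metadata"] / kind / name
```

Calling `metadata_backup_path(tree, "data_reference_count_metadata", "A")` yields a path that mirrors the example path given for metadata A.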
S205: According to the metadata in the data description file, back up the original data contained in the original data file into the backup file data subdirectory, grouped by quantity.
In this embodiment, this step may include backing up, grouped by quantity, the original data in the original data file whose reference count is not zero. Original data with a nonzero reference count can be regarded as data that is in demand. Because there may be a large number of original data files, and to avoid unpredictable results caused by too many data files in one place, this solution groups the original data files into sets of a fixed number of files: each numbered data file subdirectory under the backup file data subdirectory holds a limited number of data files, which achieves backup grouped by quantity.
For example, the backup file data subdirectory may contain at most M data file subdirectories, and each data file subdirectory may contain at most N original data files; in this embodiment, M and N are both 1024. Suppose the original data file contains 2000 original data files. When the original data is backed up to the backup file data subdirectory, that subdirectory may contain two data file subdirectories: data file subdirectory "0" stores original data files 0 to 1023, and data file subdirectory "1" stores original data files 1024 to 1999. The backup file can hold at most M*N pieces of original data, which is 1024*1024 in this embodiment.
For example, the backup path of an original data file may be: /backup file/backup file data/0/*.
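The grouping by quantity can be illustrated with a short sketch. The function name `plan_backup` and the rule that a file's position among the kept files determines its subdirectory are assumptions made for illustration; the text only fixes the limits M and N and the skipping of files with a zero reference count.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

M = 1024  # maximum number of data file subdirectories (value from the text)
N = 1024  # maximum number of original data files per subdirectory


def plan_backup(files: Iterable[Tuple[str, int]],
                n: int = N, m: int = M) -> Dict[int, List[str]]:
    """Assign each (name, reference_count) pair to a numbered subdirectory.

    Files with a reference count of zero are skipped. The k-th kept file
    goes to subdirectory k // n (an assumed assignment rule).
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    kept = 0
    for name, ref_count in files:
        if ref_count == 0:
            continue  # never referenced, so not backed up
        if kept >= m * n:
            raise ValueError("more than M*N files cannot be backed up")
        buckets[kept // n].append(name)
        kept += 1
    return dict(buckets)
```

For 2000 referenced files this produces subdirectory 0 with 1024 files and subdirectory 1 with the remaining 976, matching the example in the text.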
For example, Fig. 4 is a flowchart of file backup in a file snapshot method provided by an embodiment of the present application.
As shown in Fig. 4, when backing up a data file, the method first checks whether a data file exists. If not, a data block file is created, which can be understood as creating a backup file subdirectory to hold the backed-up file data. If so, the data block file is modified, which can be understood as adding the newly backed-up data file information to the backup file subdirectory.
The method then checks whether a secondary index file exists (the secondary index metadata is stored in the secondary index file). If not, a secondary index file is created; if so, the secondary index file is modified, which can be understood as adding secondary index metadata to it. The secondary index metadata indexes into the data file, identifies the specific data stored there, and allows the storage location of the data to be determined.
The method checks whether a primary index file exists (the primary index metadata is stored in the primary index file). If not, a primary index file is created; if so, the primary index file is modified, which can be understood as adding primary index metadata to it. The primary index metadata may use only data bits to mark which secondary index description metadata exist. For example, the primary index metadata may store the secondary index metadata.
The method checks whether a data block reference file exists (used to store the data reference count metadata). If not, a data block reference file is created; if so, the data block reference file is modified, which can be understood as adding data reference count metadata to it. The data reference count metadata can be understood as data that records whether the data in a data file is referenced and how many times.
The method checks whether a file description exists (used to store the file description metadata). If not, a file description is created; if so, the file description is modified, which can be understood as adding file description metadata to it. The file description metadata can describe the file storage information of the original data file (the storage path of the original file data, the write-once-read-many (WORM) protection information, and the reference count of the original data file).
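The create-or-modify sequence of Fig. 4 can be sketched as a loop over the five files in the order described. The file names in `STEPS` and the use of an append to stand for "modify" are assumptions for illustration; the real record formats are not given in the text, and the sketch writes the same payload to every file purely to keep it short.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical file names, in the order the flow checks them.
STEPS = [
    "data_block",            # data block file
    "secondary_index",       # holds secondary index metadata
    "primary_index",         # holds primary index metadata
    "data_block_reference",  # holds data reference count metadata
    "file_description",      # holds file description metadata
]


def backup_step(directory: Path, payload: bytes = b"") -> List[Tuple[str, str]]:
    """Run one pass of the flow and report which action each file received."""
    actions = []
    for name in STEPS:
        target = directory / name
        action = "modify" if target.exists() else "create"
        # Append mode creates the file when it is missing and otherwise
        # adds the new record to the existing content.
        with open(target, "ab") as handle:
            handle.write(payload)
        actions.append((name, action))
    return actions
```

On an empty directory the first pass reports "create" for every file, and a second pass reports "modify" for every file.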
S206: When taking a snapshot of the backup file, create a snapshot file subdirectory and generate a globally unique identifier.
The snapshot file subdirectory can be regarded as the subdirectory that stores the snapshot files corresponding to the original data files.
In this embodiment, the snapshot file subdirectory is created when a snapshot of the backup file is taken. The naming rule for the subdirectory can be chosen as required and is not limited by this embodiment; for example, it may be named "snapshot file". The "snapshot file" subdirectory may contain the entire content of the backup file. A universally unique identifier (UUID) gives each folder unique identification information that is treated as unique across all space and time, so name collisions at creation time need not be considered.
S207: Under the snapshot file subdirectory, create a snapshot metadata subdirectory named with the globally unique identifier.
The snapshot metadata subdirectory can be regarded as the subdirectory that stores the snapshot metadata corresponding to the backup metadata under the backup file data subdirectory. Snapshot metadata is the data obtained by taking a snapshot of the backup metadata.
In this embodiment, when a snapshot of the backup file is taken, a snapshot metadata subdirectory is created under the snapshot file subdirectory to store the snapshot metadata. The snapshot metadata subdirectory is named with the generated globally unique identifier. It should be noted that the backup data file may be snapshotted multiple times and thus produce multiple sets of snapshot data; this data is therefore moved into different snapshot subdirectories, each named with its own globally unique identifier.
S208: Create a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file.
The snapshot file data subdirectory can be regarded as the subdirectory that stores the snapshot file data corresponding to the backup file data under the backup file data subdirectory. Snapshot file data is the file data obtained by taking a snapshot of the backup file data.
In this embodiment, when a snapshot of the backup file is taken, a snapshot file data subdirectory is created under the snapshot file subdirectory to store the snapshot file data. The snapshot file data subdirectory may be given the same name as the backup file data subdirectory.
S209: Move the index metadata and file description metadata of the backup file to the snapshot metadata subdirectory.
In this embodiment, the index metadata may include primary index metadata and secondary index metadata, so the snapshot metadata subdirectory may contain primary index metadata, secondary index metadata, and file description metadata. Once the index metadata and file description metadata of the backup file have been moved to the snapshot metadata subdirectory, they no longer exist in the backup file.
For example, the backup path of a given piece of metadata may be: /snapshot file/snapshot metadata/secondary index metadata.
S210: According to the metadata backed up in the backup file, move the backed-up original data files to the snapshot file data subdirectory.
In this embodiment, this step includes moving the original data files whose reference count is not zero to the snapshot file data subdirectory. After the move, the backed-up original data files no longer exist in the backup file.
It should be noted that because backup file data is backed up grouped by quantity under the backup file data subdirectory, it remains grouped by quantity after the snapshot, when it is stored under the snapshot file data subdirectory. If the metadata of the data description file marks a file as WORM-protected, the backed-up original data files remain WORM-protected after being snapshotted into the snapshot file data subdirectory. The data can thus be managed by category while snapshot storage receives effective immutable-data protection.
For example, the snapshot file data subdirectory may contain at most N data files, and each data file may contain at most M pieces of original data; in this embodiment, N and M are both 1024. Suppose the original data file contains 2000 original data files. When the original data is snapshotted into the snapshot file data subdirectory, that subdirectory contains two data files: data file "0" stores original data files 0 to 1023, and data file "1" stores original data files 1024 to 1999. The snapshot file can hold at most N*M pieces of original data, which is 1024*1024 in this embodiment.
For example, the snapshot path of a backed-up original data file may be: /snapshot file/snapshot file data/globally unique identifier/0/*.
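Steps S206 to S210 can be sketched as a move of directories under a freshly generated UUID. This is a sketch under stated assumptions: the directory names are hypothetical, the metadata is placed in `snapshot_file/<uuid>` and the data in `snapshot_file/snapshot_file_data/<uuid>` to follow the example snapshot path above, and the filtering of data files by reference count is omitted for brevity.

```python
import shutil
import uuid
from pathlib import Path

# Metadata categories that S209 moves; the reference count metadata stays.
MOVED_METADATA = ("primary_index", "secondary_index", "file_description")


def take_snapshot(root: Path) -> str:
    """Move backup content into a snapshot named by a new UUID and return it."""
    backup = root / "backup_file"
    snap_id = str(uuid.uuid4())                       # S206: unique identifier
    snap_meta = root / "snapshot_file" / snap_id      # S207
    snap_data = root / "snapshot_file" / "snapshot_file_data" / snap_id  # S208
    snap_meta.mkdir(parents=True)
    snap_data.mkdir(parents=True)

    # S209: move index and file description metadata out of the backup.
    for name in MOVED_METADATA:
        source = backup / "backup_metadata" / name
        if source.exists():
            shutil.move(str(source), str(snap_meta / name))

    # S210: move the numbered data subdirectories, keeping their grouping.
    for bucket in sorted((backup / "backup_file_data").iterdir()):
        shutil.move(str(bucket), str(snap_data / bucket.name))
    return snap_id
```

Because every snapshot receives its own identifier, repeated snapshots of the same backup land in separate subdirectories without any name check.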
In this embodiment, the method proceeds as follows: a data description file is obtained; a backup file subdirectory is created; a backup metadata subdirectory and a backup file data subdirectory are created under the backup file subdirectory; the metadata in the data description file is backed up to the backup metadata subdirectory; according to that metadata, the original data contained in the original data file is backed up, grouped by quantity, to the backup file data subdirectory; when a snapshot of the backup file is taken, a snapshot file subdirectory is created and a globally unique identifier is generated; a snapshot metadata subdirectory named with the globally unique identifier is created under the snapshot file subdirectory; a snapshot file data subdirectory is created in the snapshot file according to the backup file data subdirectory of the backup file; the index metadata and file description metadata of the backup file are moved to the snapshot metadata subdirectory; and, according to the metadata backed up in the backup file, the backed-up original data files are moved to the snapshot file data subdirectory.
The above technical solution adopts a distinctive directory tree design that allows data to be managed by category while providing effective immutable-data protection for snapshot storage. In file backup and snapshot, the original data is stored grouped by quantity, which reduces the amount of data stored in any one folder, effectively avoids having to start over when a storage error occurs during file backup, and meets the data storage requirements of the data backup and disaster recovery field. The solution also achieves categorized management of data and effectively improves the security of data storage.
Building on the above embodiments, this example embodiment partitions the original data file. Fig. 5 is a schematic diagram of the data block structure involved in a file snapshot method provided by an embodiment of the present application. The original data file stores the concrete business data; to allow fast access to and verification of that data, a special internal data structure is designed for the original data file. In this embodiment the original data file is logically divided into regions, and may include:
1) A header description area, used to describe the file system version, the number of data blocks, and the check length information.
In this embodiment, the file system version information is recorded in every individual file, including all of the metadata files and data files described above. Recording the version allows data of different versions to be handled differently. This information is stored first, as a character tag, in the first 32 bytes of each file's header.
2) A check area, used to record the check values determined from the check length information and the number of data blocks.
In this embodiment, the length of the check area may be calculated as the unit length of a check value (for example, 4 bytes) multiplied by the number of data blocks (for example, 1024), multiplied by the length of each block (for example, 1 MB), and divided by the check length (for example, 64 KB), that is: 4*1024*(1*1024*1024)/(64*1024). According to the check length and the number of data blocks given in the header description area, the check area records a check value for each piece of data to be checked. The check value may be a unique value produced by a hash algorithm, or may be determined by other methods. This design allows data to be verified according to its importance, so as to ensure its correctness.
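The check area length formula can be worked through with the example values from the text. The function and parameter names below are illustrative only; the arithmetic is exactly the expression 4*1024*(1*1024*1024)/(64*1024).

```python
def check_area_length(unit_bytes: int = 4,
                      block_count: int = 1024,
                      block_bytes: int = 1024 * 1024,
                      check_bytes: int = 64 * 1024) -> int:
    """Return the check area length in bytes.

    Each data block of `block_bytes` is covered by block_bytes / check_bytes
    check values, and each check value occupies `unit_bytes`.
    """
    if block_bytes % check_bytes != 0:
        raise ValueError("block length must be a multiple of the check length")
    values_per_block = block_bytes // check_bytes  # 16 with the defaults
    return unit_bytes * block_count * values_per_block
```

With the default values, each 1 MB block needs 16 check values, so the check area occupies 4 * 1024 * 16 = 65536 bytes, that is 64 KB.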
For example, the data to be stored is hashed in segments of a given length to obtain its check values, and the check values are stored in the check area. The hash algorithm may be, for example, the MurmurHash2 algorithm; the embodiments of the present application place no limit on this.
Depending on the importance of the data, and where required, the same algorithm can be applied to data read from the data area of the data file to obtain a check value. That value is compared with the value in the check area to confirm that the data is valid.
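The compute-then-compare procedure can be sketched as follows. Note the substitution: the text names MurmurHash2 as one possible algorithm, but that function is not part of the Python standard library, so this sketch uses `zlib.crc32` as a stand-in 4-byte check value. The segment length of 64 KB follows the example check length; the function names are hypothetical.

```python
import zlib
from typing import List

CHECK_LEN = 64 * 1024  # example check length from the text


def check_values(data: bytes, check_len: int = CHECK_LEN) -> List[int]:
    """Compute one 32-bit check value per `check_len` segment of the data.

    zlib.crc32 stands in for the hash algorithm (e.g. MurmurHash2).
    """
    return [zlib.crc32(data[start:start + check_len])
            for start in range(0, len(data), check_len)]


def verify(data: bytes, stored: List[int], check_len: int = CHECK_LEN) -> bool:
    """Recompute the check values and compare them with the stored ones."""
    return check_values(data, check_len) == stored
```

A caller would store the result of `check_values` in the check area at write time and call `verify` on data read back from the data area when the importance of the data warrants it.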
3) An offset area, used to record the offsets that locate data block identifiers.
In this embodiment, the offset area records the offset of each data block identifier, and a specific data block is located through that offset.
4) A data area, used to store the original data in data blocks.
In this embodiment, the data area is written by appending blocks of a fixed size (1 MB each). The number of data blocks may be the number given in the header description information, for example 1024 blocks by default.
In this embodiment, limiting the number of data blocks keeps the size of the data file under control, so that it achieves the best balance of space and efficiency in a given snapshot service.
Fig. 6 is a schematic structural diagram of a file snapshot system provided by an embodiment of the present application. As shown in Fig. 6, the system includes:
a file acquisition module 61, configured to obtain a data description file, the data description file including metadata and an original data file;
a file backup module 62, configured to back up the data description file so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;
a file snapshot module 63, configured to take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.
The file snapshot system of this technical solution uses a file tree structure to classify and back up files, which enables categorized management of data and effectively improves indexing efficiency.
For example, the metadata includes:
index metadata, configured to describe index information of the original data file;
data reference count metadata, configured to describe the number of references to the original data in the original data file;
file description metadata, configured to describe the file storage information of the original data file.
For example, the index metadata includes:
primary index metadata, configured to describe the secondary index metadata;
secondary index metadata, configured to describe the index of the original data in the original data file.
For example, the file storage information includes: the storage path of the original file data, write-once-read-many (WORM) protection information, and the reference count of the original data file.
For example, the file backup module 62 may be configured to:
create a backup file subdirectory;
create a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory;
back up the metadata in the data description file to the backup metadata subdirectory;
according to the metadata in the data description file, back up the original data contained in the original data file, grouped by quantity, to the backup file data subdirectory.
For example, the file snapshot module 63 may be configured to:
create a snapshot file subdirectory when taking a snapshot of the backup file, and generate a globally unique identifier;
create, under the snapshot file subdirectory, a snapshot metadata subdirectory named with the globally unique identifier; create a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file;
move the index metadata and file description metadata of the backup file to the snapshot metadata subdirectory;
according to the metadata backed up in the backup file, move the backed-up original data files to the snapshot file data subdirectory.
For example, the original data file includes a header description area, a check area, an offset area, and a data area;
the header description area is used to describe the file system version, the number of data blocks, and the check length information;
the check area is used to record the check values determined from the check length information and the number of data blocks;
the offset area is used to record the offsets that locate data block identifiers;
the data area is used to store the original data in data blocks.
The file snapshot system provided by the embodiments of the present application can execute the file snapshot method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to that method.
Fig. 7 shows a schematic structural diagram of an electronic device 70 that can be used to implement embodiments of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices (such as helmets, glasses, and watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in Fig. 7, the electronic device 70 includes at least one processor 71 and memory communicatively connected to the at least one processor 71, such as a read-only memory (ROM) 72 and a random access memory (RAM) 73. The memory stores a computer program executable by the at least one processor, and the processor 71 can perform various appropriate actions and processing according to a computer program stored in the ROM 72 or loaded from a storage unit 78 into the RAM 73. The RAM 73 may also store various programs and data required for the operation of the electronic device 70. The processor 71, the ROM 72, and the RAM 73 are connected to one another via a bus 74. An input/output (I/O) interface 75 is also connected to the bus 74.
A number of components of the electronic device 70 are connected to the I/O interface 75, including: an input unit 76, such as a keyboard or mouse; an output unit 77, such as various types of displays and speakers; a storage unit 78, such as a magnetic disk or optical disc; and a communication unit 79, such as a network card, modem, or wireless communication transceiver. The communication unit 79 allows the electronic device 70 to exchange information and data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processor 71 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the processor 71 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The processor 71 performs the methods and processing described above, such as the file snapshot method.
在一些实施例中,文件快照方法可被实现为计算机程序,其被有形地包含于计算机可读存储介质,例如存储单元78。在一些实施例中,计算机程序的部分或者全部可以经由ROM 72和/或通信单元79而被载入和/或安装到电子设备70上。当计算机程序加载到RAM 73并由处理器71执行时,可以执行上文描述的文件快照方法的一个或多个步骤。备选地,在其他实施例中,处理器71可以通过其他任何适当的方式(例如,借助于固件)而被配置为执行文件快照方法。In some embodiments, the file snapshot method may be implemented as a computer program, which is tangibly contained in a computer-readable storage medium, such as a storage unit 78. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 70 via the ROM 72 and/or the communication unit 79. When the computer program is loaded into the RAM 73 and executed by the processor 71, one or more steps of the file snapshot method described above may be performed. Alternatively, in other embodiments, the processor 71 may be configured to perform the file snapshot method in any other suitable manner (e.g., by means of firmware).
本文中以上描述的系统和技术的多种实施方式可以在数字电子电路系统、集成电路系统、场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、芯片上系统的系统(SOC)、负载可编程逻辑设备(CPLD)、计算机硬件、固件、软件、和/或它们的组合中实现。这些多种实施方式可以包括:实施在一个或者多个计算机程序中,该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释,该可编程处理器可以是专用或者通用可编程处理器,可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令,并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments can include: being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which can be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
The computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that when the computer programs are executed by the processor, the functions/operations specified in the flowcharts and/or block diagrams are carried out. A computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by, or in connection with, an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer-readable storage medium may be a non-transitory computer-readable storage medium.
To provide interaction with a user, the systems and techniques described herein may be implemented on an electronic device having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or trackball) through which the user can provide input to the electronic device. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, voice, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.
A computing system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of physical hosts and VPS services in the related art.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the result expected from the technical solution of the present application can be achieved; no limitation is imposed herein.

Claims (10)

  1. A file snapshot method, comprising:
    acquiring a data description file, wherein the data description file comprises metadata and an original data file;
    backing up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a file tree structure; and
    taking a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved into a snapshot file.
  2. The method according to claim 1, wherein the metadata comprises:
    index metadata, used to describe index information of the original data file;
    data reference count metadata, used to describe the number of references to the original data in the original data file; and
    file description metadata, used to describe file storage information of the original data file.
  3. The method according to claim 2, wherein the index metadata comprises:
    primary index metadata, used to describe secondary index metadata; and
    secondary index metadata, used to describe the index of the original data in the original data file.
  4. The method according to claim 2, wherein the file storage information comprises: a storage path of the original file data, write-once-read-many (WORM) protection information, and a reference count of the original data file.
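For illustration only, the metadata described in claims 2 to 4 could be modeled roughly as follows. This is a minimal sketch, not the applicant's data format: every class, field, and function name is hypothetical, the WORM protection information is reduced to a single boolean, and the secondary index is assumed to map a block identifier to a byte offset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SecondaryIndex:
    """Secondary index: locates original data inside an original data file."""
    entries: Dict[str, int] = field(default_factory=dict)  # block id -> byte offset (assumed)


@dataclass
class PrimaryIndex:
    """Primary index: describes the secondary indexes (here, simply by name)."""
    secondary_names: List[str] = field(default_factory=list)


@dataclass
class FileDescription:
    """File storage information listed in claim 4."""
    storage_path: str
    worm_protected: bool        # assumed simplification of the WORM protection information
    reference_count: int = 0    # references to the original data file


@dataclass
class Metadata:
    primary_index: PrimaryIndex
    secondary_indexes: Dict[str, SecondaryIndex]
    data_ref_counts: Dict[str, int]   # block id -> number of references
    file_description: FileDescription


def lookup(meta: Metadata, block_id: str) -> Optional[int]:
    """Resolve a block id through the primary index, then the secondary indexes."""
    for name in meta.primary_index.secondary_names:
        offset = meta.secondary_indexes[name].entries.get(block_id)
        if offset is not None:
            return offset
    return None
```

The two-level lookup mirrors the claim wording: the primary index only tells the reader which secondary indexes exist, and the secondary index holds the per-block location.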
  5. The method according to claim 1, wherein backing up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a directory tree structure, comprises:
    creating a backup file subdirectory;
    creating a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory;
    backing up the metadata in the data description file to the backup metadata subdirectory; and
    according to the metadata in the data description file, classifying the original data contained in the original data file by quantity and backing it up to the backup file data subdirectory.
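The backup steps of claim 5 can be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed implementation: the directory names `metadata` and `data`, the function name `backup`, and the parameter `files_per_bucket` are all hypothetical, and "classified by quantity" is read here as placing a fixed number of files in each numbered subdirectory, which the source does not spell out.

```python
import shutil
from pathlib import Path


def backup(metadata_files, data_files, backup_root, files_per_bucket=2):
    """Copy metadata and original data files into a backup directory tree.

    Resulting layout (names are illustrative, not taken from the source):
        backup_root/metadata/<metadata files>
        backup_root/data/<bucket number>/<original data files>
    """
    backup_root = Path(backup_root)
    meta_dir = backup_root / "metadata"
    data_dir = backup_root / "data"
    meta_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Back up the metadata into the backup metadata subdirectory.
    for src in metadata_files:
        shutil.copy2(src, meta_dir / Path(src).name)

    # Assumed reading of "classified by quantity": fixed-size numbered buckets.
    for i, src in enumerate(data_files):
        bucket = data_dir / str(i // files_per_bucket)
        bucket.mkdir(exist_ok=True)
        shutil.copy2(src, bucket / Path(src).name)

    return meta_dir, data_dir
```

Bucketing by count keeps any single directory from growing without bound, which is one plausible motivation for grouping by quantity; the source does not state the reason.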
  6. The method according to claim 5, wherein taking a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved into a snapshot file, comprises:
    creating a snapshot file subdirectory when taking the snapshot of the backup file, and generating a globally unique identifier;
    creating, under the snapshot file subdirectory, a snapshot metadata subdirectory named with the globally unique identifier;
    creating a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file;
    moving the index metadata and the file description metadata of the backup file to the snapshot metadata subdirectory; and
    according to the metadata backed up in the backup file, moving the backed-up original data file to the snapshot file data subdirectory.
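The snapshot steps of claim 6 might look like the following sketch. It assumes a backup tree with `metadata` and `data` subdirectories (hypothetical names), uses a random UUID as the globally unique identifier, and introduces a hypothetical `keep_in_backup` parameter for metadata files that should stay behind, since the claim moves only the index metadata and file description metadata. The function name `snapshot` is likewise illustrative.

```python
import shutil
import uuid
from pathlib import Path


def snapshot(backup_root, snapshot_root, keep_in_backup=()):
    """Move a backup tree into a snapshot tree and return the snapshot id.

    Resulting layout (names are illustrative, not taken from the source):
        snapshot_root/<snapshot id>/<moved metadata files>
        snapshot_root/data/<same subdirectories as the backup data directory>
    """
    backup_root, snapshot_root = Path(backup_root), Path(snapshot_root)
    snapshot_id = str(uuid.uuid4())          # globally unique identifier

    snap_meta = snapshot_root / snapshot_id  # metadata subdirectory named by the id
    snap_data = snapshot_root / "data"
    snap_meta.mkdir(parents=True)
    snap_data.mkdir(exist_ok=True)

    # Move metadata, skipping any file names the caller wants left in the backup.
    for src in sorted((backup_root / "metadata").iterdir()):
        if src.is_file() and src.name not in keep_in_backup:
            shutil.move(str(src), str(snap_meta / src.name))

    # Mirror the backup data subdirectories and move the data files across.
    backup_data = backup_root / "data"
    for src in sorted(p for p in backup_data.rglob("*") if p.is_file()):
        dest = snap_data / src.relative_to(backup_data)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))

    return snapshot_id
```

Because the files are moved rather than copied, a snapshot on the same file system is typically a rename and does not duplicate the data blocks; that cost argument is an inference from the claim's use of "move", not a statement from the source.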
  7. The method according to claim 1, wherein the original data file comprises: a header description area, a check area, an offset area, and a data area;
    the header description area is used to describe the file system version, the number of data blocks, and check length information;
    the check area is used to record check values determined according to the check length information and the number of data blocks;
    the offset area is used to record offsets for locating data block identifiers; and
    the data area is used to store the original data in units of data blocks.
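One possible byte layout for the four areas in claim 7 is sketched below. Everything concrete here is an assumption for illustration: the field widths, the little-endian encoding, the use of CRC32 as the check value (one 4-byte value per block), the extra end-of-data sentinel in the offset area, and the function names `pack` and `read_block`. The source specifies only which areas exist and what each records.

```python
import struct
import zlib

# Header description area: version, block count, check length (assumed widths).
HEADER = struct.Struct("<HII")
CHECK_LEN = 4  # bytes per check value; CRC32 is an assumption, not from the source


def pack(blocks, version=1):
    """Serialize blocks as header | check area | offset area | data area."""
    header = HEADER.pack(version, len(blocks), CHECK_LEN)
    checks = b"".join(struct.pack("<I", zlib.crc32(b)) for b in blocks)
    offsets, pos = [], 0
    for b in blocks:
        offsets.append(pos)
        pos += len(b)
    offsets.append(pos)  # end sentinel so each block's length can be derived
    offset_area = struct.pack(f"<{len(offsets)}Q", *offsets)
    return header + checks + offset_area + b"".join(blocks)


def read_block(buf, index):
    """Locate one block through the offset area and verify its check value."""
    _version, count, check_len = HEADER.unpack_from(buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    check_start = HEADER.size
    offset_start = check_start + count * check_len
    data_start = offset_start + (count + 1) * 8
    start, end = struct.unpack_from("<2Q", buf, offset_start + index * 8)
    block = buf[data_start + start:data_start + end]
    (expected,) = struct.unpack_from("<I", buf, check_start + index * check_len)
    if zlib.crc32(block) != expected:
        raise ValueError("check value mismatch")
    return block
```

In this layout the size of the check area is the check length multiplied by the number of blocks, which is one way to read "determined according to the check length information and the number of data blocks".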
  8. A file snapshot system, comprising:
    a file acquisition module, configured to acquire a data description file, wherein the data description file comprises metadata and an original data file;
    a file backup module, configured to back up the data description file, so that the data in the data description file is classified and backed up into a backup file based on a file tree structure; and
    a file snapshot module, configured to take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved into a snapshot file.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the file snapshot method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing computer instructions which, when executed by a processor, cause the processor to implement the file snapshot method according to any one of claims 1 to 7.
PCT/CN2023/080695 2022-10-18 2023-03-10 File snapshot method and system, electronic device, and storage medium WO2024082525A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211272039.X 2022-10-18
CN202211272039.XA CN115543918A (en) 2022-10-18 2022-10-18 File snapshot method, system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2024082525A1 true WO2024082525A1 (en) 2024-04-25

Family

ID=84734798

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/080695 WO2024082525A1 (en) 2022-10-18 2023-03-10 File snapshot method and system, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115543918A (en)
WO (1) WO2024082525A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115543918A (en) * 2022-10-18 2022-12-30 上海爱数信息技术股份有限公司 File snapshot method, system, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365744A (en) * 2012-04-04 2013-10-23 国际商业机器公司 System and method using metadata image backup and traditional backup
CN103593262A (en) * 2013-11-15 2014-02-19 上海爱数软件有限公司 Virtual machine backup method based on classification
US9183208B1 (en) * 2010-12-24 2015-11-10 Netapp, Inc. Fileshot management
CN107526840A (en) * 2017-09-14 2017-12-29 郑州云海信息技术有限公司 File system snapshot querying method, device and computer-readable recording medium
CN112685223A (en) * 2019-10-17 2021-04-20 伊姆西Ip控股有限责任公司 File type based file backup
CN115543918A (en) * 2022-10-18 2022-12-30 上海爱数信息技术股份有限公司 File snapshot method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115543918A (en) 2022-12-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23878528

Country of ref document: EP

Kind code of ref document: A1