WO2024082525A1

WO2024082525A1 - File snapshot method and system, electronic device, and storage medium

Info

Publication number: WO2024082525A1
Application number: PCT/CN2023/080695
Authority: WO
Inventors: 陈勇; 王瀚; 鲍苏宁
Original assignee: 上海爱数信息技术股份有限公司
Priority date: 2022-10-18
Filing date: 2023-03-10
Publication date: 2024-04-25
Also published as: CN115543918A

Abstract

Disclosed in the present application are a file snapshot method and system, an electronic device, and a storage medium. The method comprises: acquiring a data description file, the data description file comprising metadata and a raw data file; backing up the data description file, so that data in the data description file is classified and backed up into a backup file on the basis of a file tree structure; and taking a snapshot of the backup file according to the directory of the backup file, so as to move data in the backup file into a snapshot file.

Description

File snapshot method, system, electronic device and storage medium

This application claims priority to Chinese Patent Application No. 202211272039.X, filed with the China Patent Office on October 18, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of computer technology, and in particular to a file snapshot method, system, electronic device, and storage medium.

Background

File system technology is widespread and mature, so using the characteristics of a file system to implement snapshot storage is highly feasible. Snapshot storage can then be built on either a local file system or a network file system, which gives more choice of storage target, and it can make efficient use of inexpensive network file systems.

However, the related art cannot implement classified storage and management of snapshot files.

Summary of the invention

The present application provides a file snapshot method and system, an electronic device, and a storage medium, which address the inability of the related art to store and manage snapshots by classification: file snapshots are performed on the basis of a directory tree structure, so that snapshot files can be stored and managed by classification.

In a first aspect, an embodiment of the present disclosure provides a file snapshot method, including:

Obtaining a data description file, the data description file including metadata and an original data file;

Backing up the data description file, so that the data in the data description file is classified and backed up into a backup file on the basis of a file tree structure; and

Taking a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved into a snapshot file.

In a second aspect, an embodiment of the present disclosure provides a file snapshot system, including:

A file acquisition module, configured to acquire a data description file, the data description file including metadata and an original data file;

A file backup module, configured to back up the data description file, so that the data in the data description file is classified and backed up into a backup file on the basis of a file tree structure; and

A file snapshot module, configured to take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved into a snapshot file.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the file snapshot method provided in the embodiment of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the file snapshot method provided in the embodiment of the first aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of a file snapshot method provided by an embodiment of the present application;

FIG. 2 is a flowchart of a file snapshot method provided by another embodiment of the present application;

FIG. 3 is a schematic diagram of a file tree structure of a file snapshot method provided by an embodiment of the present application;

FIG. 4 is a flowchart of file backup in a file snapshot method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a data block structure involved in a file snapshot method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of the structure of a file snapshot system provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present application.

Detailed Description

It should be noted that the terms "first", "second", "target", and the like in the specification, claims, and drawings of the present application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such processes, methods, products, or devices.

FIG. 1 is a flowchart of a file snapshot method provided by an embodiment of the present application. This embodiment is applicable to cases where file snapshot processing is performed according to a file tree structure. The method can be executed by a file snapshot system, which can be implemented in hardware and/or software.

As shown in FIG. 1, the method includes:

S101: Obtain a data description file.

In this embodiment, to perform snapshot processing on a file, it is first necessary to obtain a data description file, which includes metadata and original data files.

Metadata is data that describes the attributes of other data. It is descriptive information about data and information resources, or structured data that provides information about a resource. It supports functions such as indicating storage locations, recording historical data, resource search, and file recording, enabling effective discovery, search, and integrated organization of information resources and effective management of the resources used.

The original data file may be a business data file that needs to be snapshotted, for example, a data file generated by a business system or a data file stored in a database that is awaiting a snapshot.

S102: Back up the data description file so that the data in the data description file is classified and backed up into the backup file based on the file tree structure.

In this embodiment, a backup file of the data description file is obtained. The backup file may be the data description file stored according to a file tree structure, or a copy of the data description file; in either case, its content is the same as that of the data description file.

The file tree structure can be a tree composed of multiple files or multiple data items, or a directory tree in the form of directories. The backup file can include the metadata and the original data files.

For example, when performing backup processing on the file tree structure, a folder named "backup file" can be created, and the metadata and original data files can be grouped into this folder to form the basic backup. In an actual backup workload, there may be a very large number of original data files.

S103: Take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file.

In this embodiment, a snapshot can be a fully usable copy of the backup file that contains an image of the corresponding data at a certain point in time (the time at which copying starts). A snapshot can be a copy of the data it represents, or a replica of that data. For a backup file, a file snapshot is an instant copy of the backup file that contains all the information of the backup file at the moment the snapshot is generated, and it is likewise a fully usable copy.

The backup file is snapshotted according to its directory, so that the data in the backup file is moved into the snapshot file. The snapshot is taken on the basis of the backup data: the backup file is snapshotted and stored according to the file tree structure of the backup data. When the backup file is snapshotted, its data is moved into the snapshot file and the backup data in the backup file is cleared. After file snapshot processing of the data description file is complete, only the original data and the snapshot data are retained; the backup data no longer exists. The backup data can therefore be regarded as transitional data in the file snapshot process. It should be noted that the backup data can be tampered with or modified, whereas the snapshot data cannot be modified.

Exemplarily, when executing the snapshot processing service, a folder named "snapshot file" may be created. When the snapshot is taken, the metadata and data files of the basic backup are moved into this folder.

In this embodiment, a data description file is obtained and backed up, so that the data in it is classified and backed up into the backup file on the basis of the file tree structure; a snapshot of the backup file is then taken according to the directory of the backup file, so that the data in the backup file is moved into the snapshot file. In this technical solution, the file tree structure is used to classify and back up files, which enables classified management of data and effectively improves indexing efficiency.
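As a rough illustration of S101 to S103, the following Python sketch copies the metadata and original data files into a "backup file" folder and then moves the folder contents into a "snapshot file" folder, so that the backup ends up empty while the originals are untouched. The function names `back_up` and `snapshot` and the flat folder layout are assumptions made for illustration; this is not presented as the applicant's implementation.

```python
import shutil
from pathlib import Path
from typing import List


def back_up(sources: List[Path], root: Path) -> Path:
    """S102 (simplified, hypothetical): copy each source file into a "backup file" folder."""
    backup = root / "backup file"
    backup.mkdir(parents=True, exist_ok=True)
    for src in sources:
        shutil.copy2(src, backup / src.name)  # the originals stay in place
    return backup


def snapshot(root: Path) -> Path:
    """S103 (simplified, hypothetical): move everything from the backup folder into a snapshot folder."""
    backup = root / "backup file"
    snap = root / "snapshot file"
    snap.mkdir(parents=True, exist_ok=True)
    for item in list(backup.iterdir()):  # list first, since we empty the folder while looping
        # Moving (not copying) clears the transitional backup data.
        shutil.move(str(item), str(snap / item.name))
    return snap
```

After `snapshot` runs, only the original files and the snapshot copies remain, which matches the description of the backup data as transitional.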

On the basis of the above embodiment, this exemplary embodiment explains the metadata in the data description file, which includes:

1) Index metadata, which is used to describe the index information of the original data file.

In this embodiment, the index metadata may be description information that indicates which data the snapshot storage holds. In the database, target data is retrieved according to the index values in the index metadata, and the index metadata may be used to store those index values.

For example, faster snapshot storage requires smaller index data, so the index metadata is designed with two levels: primary index metadata and secondary index metadata.

The primary index metadata can be used to describe the secondary index metadata.

In this embodiment, the primary index metadata may use only data bits to mark which secondary index metadata exist. For example, the primary index metadata may store the secondary index metadata.

Exemplarily, the primary index metadata may include entries 0 and 1 (but is not limited to them), and each entry may cover a fixed amount of secondary index metadata, such as 100. Primary index entry 0 may then cover secondary index metadata 0 to 99, and primary index entry 1 may cover secondary index metadata 100 to 199. This embodiment does not limit the specific forms of the primary index metadata and the secondary index metadata.

Secondary index metadata can be used to describe the index of the original data in the original data file.

In this embodiment, the secondary index metadata points directly into the data file to identify the specific data stored there, and the storage location of the data can be determined through the secondary index. During data backup and snapshot, the data to be backed up or snapshotted can be determined directly from the secondary index metadata.
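The two-level index can be sketched as follows, assuming the primary level is a bitmap with one bit per group of 100 secondary entries (following the 0 to 99 / 100 to 199 example above) and each secondary group maps a data identifier to its storage location. The class name `TwoLevelIndex`, the `GROUP_SIZE` constant, and the path strings are hypothetical.

```python
from typing import Dict, Optional

GROUP_SIZE = 100  # secondary entries per primary entry, as in the example above


class TwoLevelIndex:
    """Hypothetical in-memory model of primary and secondary index metadata."""

    def __init__(self) -> None:
        self.primary_bits = 0  # bit k set => secondary group k exists
        self.secondary: Dict[int, Dict[int, str]] = {}  # group -> {data id: location}

    def add(self, data_id: int, location: str) -> None:
        group = data_id // GROUP_SIZE
        self.primary_bits |= 1 << group  # mark the group in the primary level
        self.secondary.setdefault(group, {})[data_id] = location

    def lookup(self, data_id: int) -> Optional[str]:
        group = data_id // GROUP_SIZE
        if not (self.primary_bits >> group) & 1:  # cheap check on the small primary level
            return None
        return self.secondary[group].get(data_id)
```

The primary level stays small because it holds only bits, so a miss on an absent group is answered without touching any secondary data.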

2) Data reference count metadata, which is used to describe the number of references to the original data in the original data file.

In this embodiment, the data reference count metadata records the number of times the original data in the original data file is referenced; whether the original data is in use is determined from this count. When the recorded reference count is 0, the corresponding content is no longer used and can be cleaned up.

Exemplarily, each time data stored in the database is used, for example by a backup or snapshot, its reference count is incremented by 1. When the reference count of the data is 0, the data has not been used, which indicates that it is not needed; in this case, the data may optionally be skipped during file backup or snapshot. Similarly, data is stored in the form of files, each of which may contain multiple pieces of data, and a file may itself be backed up or snapshotted. Each time the file is copied or backed up, its reference count is incremented by 1. If the file's reference count is 0, the file has not been used, and it may optionally be skipped during backup or snapshot processing.

For example, differential backup can be performed during file backup or snapshot: instead of backing up all original data files and data, only part of them is backed up, namely the data whose reference count in the data reference count metadata is non-zero. Data with a zero count does not need to be backed up.
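A minimal sketch of reference counting and the zero-count filter used for differential backup, assuming the counts are kept in memory keyed by a data or file name. `RefCounter` and its methods are hypothetical names, not part of the described system.

```python
from collections import Counter
from typing import Iterable, List


class RefCounter:
    """Hypothetical in-memory stand-in for data reference count metadata."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()  # missing keys read as 0

    def reference(self, key: str) -> None:
        self.counts[key] += 1  # e.g. used by a backup, snapshot, or copy

    def release(self, key: str) -> None:
        if self.counts[key] > 0:
            self.counts[key] -= 1

    def select_for_backup(self, keys: Iterable[str]) -> List[str]:
        # Differential backup as described: skip entries whose count is zero.
        return [k for k in keys if self.counts[k] > 0]
```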

3) File description metadata, which is used to describe the file storage information of the original data file.

In this embodiment, the file storage information in the file description metadata includes the storage path of the original file data, write-once-read-many (WORM) protection information, and the number of references to the original data file.

The storage path of the original file data may be storage directory information of the data file, such as /backup file/original data file/0/100 and the like.

WORM protection information relates to WORM technology. WORM (write once, read many), also called immutable storage, ensures that data, once written, remains read-only. Authorized users can read data stored under WORM but cannot modify, delete, or overwrite it, which meets the requirements of data preservation and security. Users can create a shared folder on a WORM volume and store all core confidential data in this folder for centralized protection.

Exemplarily, the WORM protection information may include a flag indicating whether WORM protection is adopted and a protection mechanism identification number. A flag value of 0 can indicate that WORM protection is not adopted, and a value of 1 can indicate that a WORM protection mechanism is adopted. The protection mechanism identification number identifies the protection mechanism in use, which can be looked up by that number. It should be noted that the WORM protection information resides in the file description metadata of the original data file. A folder is created for the original data files, and the data files are stored in its subfolders; every data file in this directory carries WORM protection information, and a WORM protection mechanism applies to it. During file backup or file snapshot, the WORM protection information moves together with the original file.

For example, when performing file backup or snapshot, a shared folder can be created in a WORM volume according to the file tree structure, and the files and data in the folder can be protected by WORM. If a folder has WORM protection information, its subordinate folders (the subdirectory folders) also have WORM protection information. Because the WORM protection mechanism is tamper-proof, it can effectively prevent data destruction by viruses, data loss caused by misoperation or system crashes, and metadata loss caused by hacker attacks, providing immutable data protection for snapshot storage and greatly improving the security of file data.

The number of references to an original data file can be understood as the number of times the file has been used, copied, backed up, or snapshotted. A count of 0 means the original data file has never been referenced, indicating that it is not needed and may optionally be skipped during backup or snapshot.
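The file description metadata and the folder-level inheritance of WORM protection could be modelled as below. This is a sketch under stated assumptions: the `FileDescription` record, its field names, and the rule "a file is protected if it is flagged or any ancestor folder is protected" are illustrative readings of the text, and real WORM enforcement would come from the storage volume, not from this check.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional, Set


@dataclass
class FileDescription:
    """Hypothetical record for file description metadata."""
    storage_path: str                    # e.g. "/backup file/original data file/0/100"
    worm_flag: int = 0                   # 0 = WORM not adopted, 1 = WORM adopted
    protection_id: Optional[int] = None  # identifies the WORM mechanism in use
    ref_count: int = 0                   # references to the original data file


def worm_protected(desc: FileDescription, worm_dirs: Set[str]) -> bool:
    """True if the file itself is flagged, or any ancestor folder is WORM-protected."""
    if desc.worm_flag == 1:
        return True
    path = PurePosixPath(desc.storage_path)
    return any(str(parent) in worm_dirs for parent in path.parents)
```

Checking ancestors reflects the statement that subdirectory folders inherit the WORM protection information of their parent folder.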

FIG. 2 is a flowchart of a file snapshot method provided by another embodiment of the present application. FIG. 3 is a schematic diagram of a file tree structure of a file snapshot method provided by an embodiment of the present application. This embodiment is applicable to cases where file snapshot processing is performed according to the file tree structure. The method can be executed by a file snapshot system, which can be implemented in hardware and/or software.

This embodiment further describes how the data description file is backed up in a file tree structure and how a snapshot of the backup file is taken according to the directory of the backup file.

As shown in FIG. 2, the method includes:

S201: Obtain a data description file.

In this embodiment, the data description file includes metadata and the original data file. The metadata includes index metadata, data reference count metadata, and file description metadata. The index metadata includes primary index metadata and secondary index metadata. The file description metadata is used to describe the file storage information including the storage path of the original file data, write-once-read-many WORM protection information, and the number of references to the original data file.

S202: Create a backup file subdirectory.

In this embodiment, a backup file subdirectory is created according to the file tree structure. The backup file subdirectory can be named as required; in this embodiment, it is named "backup file". The "backup file" subdirectory can contain all backed-up content of the data description file.

S203: Create a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory.

In this embodiment, based on the file tree structure, a backup metadata subdirectory and a backup file data subdirectory are created under the backup file subdirectory. The subdirectory naming rule can be determined according to the requirements, and this embodiment does not limit this. For example, the backup metadata subdirectory for storing metadata under the backup file subdirectory can be named "backup metadata", and the backup file data subdirectory for storing backup files can be named "backup file data".

S204: Back up the metadata in the data description file to the backup metadata subdirectory.

In this embodiment, the backup metadata subdirectory may contain the index metadata, data reference count metadata, and file description metadata from the data description file. The index metadata may include primary index metadata and secondary index metadata. The data reference count metadata may record the number of references to a data file, for example 0, 1, 2, or n. The more references a piece of data has, the greater the demand for it.

Exemplarily, the backup path of metadata A may be presented as: /backup file/backup metadata/data reference count metadata/A.

S205: According to the metadata in the data description file, classify the original data contained in the original data file by quantity and back it up into the backup file data subdirectory.

In this embodiment, classifying the original data by quantity according to the metadata and backing it up into the backup file data subdirectory may include backing up, by quantity, the original data in the original data file whose reference count is non-zero. Original data with a non-zero reference count can be regarded as data that is required. Because there may be a very large number of original data files, and too many files in one location can lead to unpredictable results, this solution groups the original data files into batches of a fixed size and stores each batch under a numbered data file subdirectory in the backup file data subdirectory, thereby achieving classified backup by quantity.

Exemplarily, the backup file data subdirectory may include at most M data file subdirectories, each containing at most N original data files; in this embodiment, M = N = 1024. For example, if there are 2000 original data files to back up, the backup file data subdirectory includes two data file subdirectories: data file subdirectory "0" stores original data files 0 to 1023, and data file subdirectory "1" stores original data files 1024 to 1999. The backup file can therefore hold at most M × N original data files, that is, 1024 × 1024 in this embodiment.

Exemplarily, the backup path of the original data file may be presented as: /backup file/backup file data/0/*.
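The classification by quantity amounts to placing the i-th original data file in subdirectory i // N. A minimal sketch follows, assuming files are numbered in backup order and that the path prefix matches the example above; `bucket_paths` is a hypothetical helper.

```python
from typing import List

M = 1024  # max data file subdirectories (this embodiment)
N = 1024  # max original data files per subdirectory (this embodiment)
BASE = "/backup file/backup file data"


def bucket_paths(names: List[str]) -> List[str]:
    """Hypothetical helper: assign the i-th original data file to subdirectory i // N."""
    if len(names) > M * N:
        raise ValueError(f"at most {M * N} files fit in one backup")
    return [f"{BASE}/{i // N}/{name}" for i, name in enumerate(names)]
```

With 2000 files, subdirectory "0" receives files 0 to 1023 and subdirectory "1" receives files 1024 to 1999, as in the example.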

FIG. 4 is a flowchart of file backup in a file snapshot method provided by an embodiment of the present application.

As shown in FIG. 4, when a data file is backed up, first check whether the data file exists. If it does not exist, create a data block file, which can be understood as creating a backup file subdirectory to hold the backup file data; if it exists, modify the data block file, which can be understood as adding the backup data file information to the backup file subdirectory.

Next, check whether a secondary index file (which stores the secondary index metadata) exists. If it does not exist, create it; if it exists, modify it, that is, add secondary index metadata to the secondary index file. The secondary index metadata points into the data file to identify the specific data stored there, and the storage location of the data can be determined through the secondary index.

Check whether a primary index file (which stores the primary index metadata) exists. If it does not exist, create a primary index file; if it exists, modify it, that is, add primary index metadata to the primary index file. The primary index metadata may use only data bits to mark which secondary index metadata exist; for example, the primary index metadata may store the secondary index metadata.

Check whether a data block reference file (which stores the data reference count metadata) exists. If it does not exist, create it; if it exists, modify it, that is, add data reference count metadata to the data block reference file. The data reference count metadata records whether the data in the data file is referenced and how many times.

Check whether a file description (which stores the file description metadata) exists. If it does not exist, create it; if it exists, modify it, that is, add file description metadata to the file description. The file description metadata describes the file storage information of the original data file (the storage path of the original file data, the WORM protection information, and the number of references to the original data file).
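The create-or-modify sequence of FIG. 4 can be sketched in Python as below. The five artifact names are hypothetical on-disk names chosen to mirror the text, and "modify" is modelled simply as appending a record line; the actual file formats are not specified in the description.

```python
from pathlib import Path
from typing import List

# Hypothetical names for the five artifacts checked in FIG. 4, in order.
FIG4_ARTIFACTS = [
    "data block",
    "secondary index",
    "primary index",
    "data block reference",
    "file description",
]


def create_or_modify(path: Path, record: str) -> str:
    """Create the file if it is missing, otherwise append to (modify) it."""
    action = "modify" if path.exists() else "create"
    with path.open("a", encoding="utf-8") as f:  # mode "a" creates the file when absent
        f.write(record + "\n")
    return action


def back_up_data_file(backup_dir: Path, record: str) -> List[str]:
    """Walk the FIG. 4 checks in order and report what was done for each artifact."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    return [create_or_modify(backup_dir / name, record) for name in FIG4_ARTIFACTS]
```

The first backup creates all five artifacts; later backups only modify them.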

S206: When taking a snapshot of the backup file, a snapshot file subdirectory is created, and a globally unique identifier is generated.

The snapshot file subdirectory may be considered as a subdirectory for storing snapshot files corresponding to the original data files.

In this embodiment, a snapshot file subdirectory is created when a snapshot of the backup file is taken. Its naming rule can be determined as required, and this embodiment does not limit it; for example, the subdirectory can be named "snapshot file". The "snapshot file" subdirectory can hold all the contents of the backup file. The globally unique identifier, such as a universally unique identifier (UUID), gives each folder identification information that can be regarded as unique across space and time, so name collisions need not be considered when folders are created.

S207: Create a snapshot metadata subdirectory named with a globally unique identifier under the snapshot file subdirectory.

The snapshot metadata subdirectory may be considered as a subdirectory for storing the snapshot metadata corresponding to the backup metadata under the backup metadata subdirectory. The snapshot metadata refers to the data obtained by taking a snapshot of the backup metadata.

In this embodiment, when a snapshot of the backup file is taken, a snapshot metadata subdirectory for storing the snapshot metadata is created under the snapshot file subdirectory and named with the generated globally unique identifier. It should be noted that the backup data file may be snapshotted multiple times, producing multiple sets of snapshot data; each set is therefore moved to a different snapshot subdirectory named with its own globally unique identifier.

S208: Create a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file.

The snapshot file data subdirectory may be considered as a subdirectory for storing snapshot file data corresponding to the backup file data under the backup file data subdirectory, and the snapshot file data refers to file data obtained by taking a snapshot of the backup file data.

In this embodiment, when a snapshot is taken of a backup file, a snapshot file data subdirectory is created under the snapshot file subdirectory for storing snapshot file data. The snapshot file data subdirectory may be named the same as the backup file data subdirectory.

S209: Move the index metadata and file description metadata of the backup file to the snapshot metadata subdirectory.

In this embodiment, the index metadata may include primary index metadata and secondary index metadata, that is, the snapshot metadata subdirectory may include primary index metadata, secondary index metadata and file description metadata. The index metadata and file description metadata of the backup file are moved to the snapshot metadata subdirectory, and the index metadata and file description metadata in the backup file disappear.

Exemplarily, the snapshot path of a metadata item may be presented as: /snapshot file/snapshot metadata/secondary index metadata.

S210: According to the metadata backed up in the backup file, move the backed up original data file to the snapshot file data subdirectory.

In this embodiment, the backed-up original data files are moved to the snapshot file data subdirectory according to the metadata backed up in the backup file; specifically, the original data files whose reference counts are non-zero are moved to the snapshot file data subdirectory. At this point, the backed-up original data files are no longer present in the backup file.

It should be noted that because the backup file data in the backup file data subdirectory is backed up by quantity, it remains classified by quantity when stored in the snapshot file data subdirectory after the snapshot. If WORM protection is set for a file in the metadata of the data description file, the original data files in the snapshot file data subdirectory remain WORM-protected after the snapshot, achieving both classified management of data and effective immutable protection of snapshot storage.

Exemplarily, the snapshot file data subdirectory may include at most M data file subdirectories, each containing at most N original data files; in this embodiment, M = N = 1024. For example, if 2000 original data files are snapshotted into the snapshot file data subdirectory, it includes two data file subdirectories: "0" stores original data files 0 to 1023, and "1" stores original data files 1024 to 1999. The snapshot file can therefore hold at most M × N original data files, that is, 1024 × 1024 in this embodiment.

Exemplarily, the snapshot path of the backed-up original data file may be presented as: /snapshot file/snapshot file data/globally unique identifier/0/*.
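Steps S206 to S210 can be sketched as a single function. The layout is an assumption: the metadata folder is named directly with the UUID under "snapshot file" (following S207), and data files go under "snapshot file/snapshot file data/<UUID>/" (following the example path above). `take_snapshot` and the `ref_counts` mapping are hypothetical; reference count metadata is left in the backup because S209 moves only the index and file description metadata.

```python
import shutil
import uuid
from pathlib import Path
from typing import Dict

# S209 moves only these metadata kinds; reference counts stay with the backup.
MOVED_METADATA = (
    "primary index metadata",
    "secondary index metadata",
    "file description metadata",
)


def take_snapshot(root: Path, ref_counts: Dict[str, int]) -> str:
    """Move backed-up metadata and referenced data into a UUID-named snapshot.

    ref_counts maps a data file path relative to "backup file data"
    (e.g. "0/a") to its reference count.
    """
    backup = root / "backup file"
    snap_id = str(uuid.uuid4())                                          # S206
    snap_meta = root / "snapshot file" / snap_id                         # S207
    snap_data = root / "snapshot file" / "snapshot file data" / snap_id  # S208
    snap_meta.mkdir(parents=True)
    snap_data.mkdir(parents=True)

    for name in MOVED_METADATA:                                          # S209
        src = backup / "backup metadata" / name
        if src.exists():
            shutil.move(str(src), str(snap_meta / name))

    for rel, count in ref_counts.items():                                # S210
        if count > 0:                   # zero-count data is left behind
            dst = snap_data / rel       # keeps the numbered subdirectory
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(backup / "backup file data" / rel), str(dst))
    return snap_id
```

Because every call generates a fresh UUID, repeated snapshots of the same backup land in separate folders without name collisions.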

In this embodiment, a data description file is obtained; a backup file subdirectory is created, with a backup metadata subdirectory and a backup file data subdirectory under it; the metadata in the data description file is backed up to the backup metadata subdirectory; and, according to that metadata, the original data in the original data file is backed up by quantity to the backup file data subdirectory. When a snapshot of the backup file is taken, a snapshot file subdirectory is created and a globally unique identifier is generated; a snapshot metadata subdirectory named with that identifier is created under the snapshot file subdirectory; a snapshot file data subdirectory is created according to the backup file data subdirectory; the index metadata and file description metadata of the backup file are moved to the snapshot metadata subdirectory; and the backed-up original data files are moved to the snapshot file data subdirectory according to the backed-up metadata. This technical solution adopts a dedicated directory tree structure design that both enables classified management of data and provides effective immutable protection for snapshot storage. Because original data is stored by quantity during backup and snapshot, the amount of data in any single folder is reduced, which effectively avoids storage errors during file backup that would force the process to start over, and meets the data storage requirements of data backup and disaster recovery. Classified management of data also effectively improves the security of data storage.

Building on the above embodiments, this embodiment further divides the original data file into regions. FIG. 5 is a schematic diagram of a data block structure involved in a file snapshot method provided in an embodiment of the present application. The original data file stores the specific business data. To allow this data to be accessed and verified quickly, the original data file has a dedicated internal structure: it is logically divided into regions. The original data file may include:

1) Header description area, used to describe the file system version, number of data blocks and checksum length information.

In this embodiment, the file system version is recorded in every file, including all of the metadata files and data files mentioned above. Recording the version allows data of different versions to be processed differently. The version is stored as a character mark in the first 32 bytes of each file header.
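A minimal sketch of writing and reading the version mark, under stated assumptions: the mark is ASCII text padded with NUL bytes to 32 bytes, and the number of data blocks and the check length follow it as little-endian 32-bit integers. Only the 32-byte version field comes from the text; the rest of the header layout and the helper names are hypothetical.

```python
import struct

VERSION_FIELD_LEN = 32  # version mark occupies the first 32 bytes of the header
# Assumed layout: 32-byte version mark, block count (uint32), check length (uint32).
HEADER_FORMAT = "<32sII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_header(version: str, block_count: int, check_length: int) -> bytes:
    raw = version.encode("ascii")
    if len(raw) > VERSION_FIELD_LEN:
        raise ValueError("version mark longer than 32 bytes")
    # The "32s" format pads shorter byte strings with NUL bytes.
    return struct.pack(HEADER_FORMAT, raw, block_count, check_length)


def unpack_header(data: bytes) -> tuple[str, int, int]:
    raw, block_count, check_length = struct.unpack_from(HEADER_FORMAT, data)
    return raw.rstrip(b"\x00").decode("ascii"), block_count, check_length


header = pack_header("v1.0", 1024, 64 * 1024)
print(len(header), unpack_header(header))
```

Because the version always sits at byte 0, a reader can inspect it before deciding how to parse the remaining fields.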

2) A check area, used to record a check value determined based on the check length information and the number of data blocks.

In this embodiment, the length of the check area equals the unit length of one check value (for example, 4 bytes) multiplied by the number of check values. The number of check values is the number of data blocks (for example, 1024) multiplied by the size of each data block (for example, 1 MB) and divided by the check length (for example, 64 KB). With these example values, the check area length is 4 × 1024 × (1 × 1024 × 1024) / (64 × 1024) = 65,536 bytes, that is, 64 KB. The check area records the check value of each segment of data to be checked, according to the check length and the number of data blocks given in the header description area. The check value may be obtained by a hash algorithm or determined by other methods. With this design, data can be verified according to its importance, ensuring the correctness of the data.
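The arithmetic above can be checked with a short helper. The defaults are the example values of this embodiment (4-byte check values, 1024 blocks of 1 MB, 64 KB check length); the function name and the rounding-up of a trailing partial segment are assumptions for illustration.

```python
def check_area_length(unit_len: int = 4,
                      block_count: int = 1024,
                      block_size: int = 1024 * 1024,
                      check_len: int = 64 * 1024) -> int:
    """Bytes needed to store one check value per check_len bytes of data."""
    data_bytes = block_count * block_size
    # Ceiling division: a trailing partial segment still gets a check value (assumption).
    segments = -(-data_bytes // check_len)
    return unit_len * segments


print(check_area_length())  # 65536 bytes (64 KB)
```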

Exemplarily, the data to be stored is hashed in segments of a fixed length to obtain check values, which are stored in the check area. The hash algorithm may be, for example, MurmurHash2; the embodiments of the present application do not limit this.

Depending on the importance of the data, the same algorithm can, when necessary, be applied to the data area of the data file to recompute the check value, which is then compared with the value stored in the check area to ensure the validity of the data.
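The per-segment check and later verification can be sketched as follows. The sketch uses a pure-Python port of the 32-bit MurmurHash2 reference algorithm (reading 4-byte words little-endian), a seed of 0, a 64 KB check length, and check values stored as little-endian 32-bit integers; the seed, the storage byte order, and the helper names are assumptions.

```python
import struct


def murmurhash2(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash2, following the reference algorithm."""
    m, r, mask = 0x5BD1E995, 24, 0xFFFFFFFF
    h = (seed ^ len(data)) & mask
    nwords = len(data) // 4
    for i in range(nwords):
        (k,) = struct.unpack_from("<I", data, i * 4)
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = ((h * m) & mask) ^ k
    tail = data[nwords * 4:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


CHECK_LEN = 64 * 1024  # check length from the header description area


def build_check_area(data: bytes, check_len: int = CHECK_LEN) -> bytes:
    """One 4-byte check value per check_len segment of the data area."""
    values = [murmurhash2(data[i:i + check_len]) for i in range(0, len(data), check_len)]
    return struct.pack(f"<{len(values)}I", *values)


def verify(data: bytes, check_area: bytes, check_len: int = CHECK_LEN) -> bool:
    """Recompute the check values and compare them with the stored check area."""
    return build_check_area(data, check_len) == check_area


block = bytes(range(256)) * 1024  # 256 KB of sample data, i.e. four 64 KB segments
area = build_check_area(block)
print(len(area), verify(block, area))
```

Recomputing only the segments that matter lets verification effort follow the importance of the data, as described above.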

3) Offset area, used to record the offsets used to locate data blocks by their identifiers.

In this embodiment, the offset area records, for each data block identifier, the offset at which that data block is located, so that a specific data block can be found through its identifier's offset.
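A minimal sketch of block lookup through the offset area, assuming the offset area is an array of little-endian 64-bit byte offsets indexed by data block identifier. The application does not specify the entry width or encoding; `read_block` and the in-memory layout are hypothetical.

```python
import io
import struct
from typing import BinaryIO

# Assumed encoding: one little-endian uint64 byte offset per block identifier.
OFFSET_ENTRY = struct.Struct("<Q")


def read_block(f: BinaryIO, offset_area_pos: int, block_id: int, block_size: int) -> bytes:
    """Find a block's offset in the offset area, then read the block itself."""
    f.seek(offset_area_pos + block_id * OFFSET_ENTRY.size)
    (block_offset,) = OFFSET_ENTRY.unpack(f.read(OFFSET_ENTRY.size))
    f.seek(block_offset)
    return f.read(block_size)


# Tiny in-memory file: an offset area for three blocks, then 4-byte blocks.
BLOCK_SIZE = 4
blocks = [b"AAAA", b"BBBB", b"CCCC"]
data_start = len(blocks) * OFFSET_ENTRY.size  # data area begins after the offset area
offset_area = b"".join(OFFSET_ENTRY.pack(data_start + i * BLOCK_SIZE) for i in range(len(blocks)))
buf = io.BytesIO(offset_area + b"".join(blocks))
print(read_block(buf, 0, 1, BLOCK_SIZE))  # b'BBBB'
```

Two seeks per lookup keep access time independent of how many blocks precede the target.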

4) Data area, used to store original data based on data blocks.

In this embodiment, the data area stores data by appending, with each data block having a fixed size (1 MB). The number of data blocks is the number given in the header description area, for example 1024 blocks by default.

In this embodiment, limiting the number of data blocks keeps the size of each data file under control, so that the best balance of space and efficiency can be obtained for a specific snapshot service.
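To illustrate how the block limit bounds the size of a data file, the following rough calculation adds up the four regions. Only the 1 MB block size, the 1024-block default, the 4-byte check value, the 64 KB check length, and the 32-byte version field come from the text; the two extra 4-byte header fields and the 8-byte offset entry per block are assumptions, and the function name is hypothetical.

```python
KiB, MiB = 1024, 1024 * 1024


def max_data_file_size(block_count: int = 1024,
                       block_size: int = 1 * MiB,
                       check_len: int = 64 * KiB,
                       check_unit: int = 4,
                       header_size: int = 32 + 4 + 4,  # assumed header layout
                       offset_entry: int = 8) -> int:  # assumed offset entry width
    """Upper bound on a data file's size when its data area is full."""
    data_area = block_count * block_size
    check_area = check_unit * (data_area // check_len)
    offset_area = offset_entry * block_count
    return header_size + check_area + offset_area + data_area


size = max_data_file_size()
print(size, f"{size / MiB:.3f} MiB")
```

Under these assumptions the metadata regions add well under 0.1% to the 1 GiB data area, so the block count is the dominant size control.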

FIG. 6 is a schematic diagram of the structure of a file snapshot system provided in an embodiment of the present application. As shown in FIG. 6, the system includes:

A file acquisition module 61 is configured to acquire a data description file, wherein the data description file includes metadata and an original data file;

A file backup module 62, configured to back up the data description file so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;

The file snapshot module 63 is configured to take a snapshot of the backup file according to the directory of the backup file, so as to move the data in the backup file to the snapshot file.

The file snapshot system adopted in this technical solution uses a file tree structure to classify and back up files, which can realize classified management of data and effectively improve indexing efficiency.

For example, the metadata includes:

Index metadata, configured to describe the index information of the original data file;

Data reference count metadata, configured to describe the number of references to the original data in the original data file;

File description metadata, configured to describe the file storage information of the original data file.

For example, the index metadata includes:

Primary index metadata, configured to describe the secondary index metadata;

Secondary index metadata, configured to describe the index of the original data in the original data file.
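A minimal in-memory sketch of a two-level lookup, assuming that each primary index entry maps a range of data IDs (for example, the IDs held by one data file) to a secondary index, and that each secondary index entry maps a data ID to its data file and byte offset. The application does not define the index entry formats; all class and field names here are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, int]  # assumed: (data file name, byte offset within it)


@dataclass
class SecondaryIndex:
    """Secondary index metadata: locates individual original data items."""
    entries: Dict[int, Location] = field(default_factory=dict)


@dataclass
class PrimaryIndex:
    """Primary index metadata: records which secondary index covers which data IDs."""
    starts: List[int] = field(default_factory=list)  # first data ID of each range, sorted
    secondaries: List[SecondaryIndex] = field(default_factory=list)

    def add(self, first_id: int, secondary: SecondaryIndex) -> None:
        pos = bisect_right(self.starts, first_id)
        self.starts.insert(pos, first_id)
        self.secondaries.insert(pos, secondary)

    def lookup(self, data_id: int) -> Optional[Location]:
        # Primary step: pick the range whose first ID is <= data_id.
        pos = bisect_right(self.starts, data_id) - 1
        if pos < 0:
            return None
        # Secondary step: exact lookup inside that range.
        return self.secondaries[pos].entries.get(data_id)


idx = PrimaryIndex()
idx.add(0, SecondaryIndex({0: ("0", 0), 1: ("0", 1024 * 1024)}))
idx.add(1024, SecondaryIndex({1024: ("1", 0)}))
print(idx.lookup(1), idx.lookup(1024), idx.lookup(5))
```

Keeping the primary level small means only one secondary index needs to be loaded per lookup.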

For example, the file storage information includes: the storage path of the original file data, write-once-read-many (WORM) protection information, and the reference count of the original data file.
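The reference count metadata and the file description metadata can be modelled as plain records, as in the sketch below; index metadata is left out for brevity. Field names, types, the example path, and the JSON serialization are illustrative assumptions, not formats defined by the application.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class WormProtection:
    """Write-once-read-many protection information (fields are assumptions)."""
    enabled: bool = False
    retain_until: str = ""  # e.g. an ISO-8601 timestamp


@dataclass
class FileDescription:
    """File description metadata: storage information of an original data file."""
    storage_path: str
    worm: WormProtection = field(default_factory=WormProtection)
    reference_count: int = 0


@dataclass
class ReferenceCounts:
    """Data reference count metadata: number of references per original data item."""
    counts: Dict[str, int] = field(default_factory=dict)

    def add_ref(self, data_id: str) -> int:
        self.counts[data_id] = self.counts.get(data_id, 0) + 1
        return self.counts[data_id]


# Hypothetical example values.
desc = FileDescription("backup/data/0", WormProtection(True, "2030-01-01T00:00:00Z"), 2)
print(json.dumps(asdict(desc)))
```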

For example, the file backup module 62 may be configured as follows:

Create a backup file subdirectory;

Create a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory;

Back up the metadata in the data description file to the backup metadata subdirectory;

According to the metadata in the data description file, classify the original data contained in the original data file by quantity and back it up to the backup file data subdirectory.
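The backup steps can be sketched with the standard library. The sketch assumes each original data item is a separate file on disk, names the subdirectories `metadata` and `data`, and groups items into numbered subdirectories of at most M items; these names and the `back_up` function are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

M = 1024  # maximum original data items per data subdirectory (embodiment value)


def back_up(metadata_files: List[Path], data_items: List[Path],
            backup_root: Path, m: int = M) -> Path:
    """Create the backup subdirectories and copy metadata and data into them."""
    meta_dir = backup_root / "metadata"  # backup metadata subdirectory (hypothetical name)
    data_dir = backup_root / "data"      # backup file data subdirectory (hypothetical name)
    meta_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)
    for meta in metadata_files:
        shutil.copy2(meta, meta_dir / meta.name)
    # Classify by quantity: item i goes into subdirectory str(i // m).
    for i, item in enumerate(data_items):
        group = data_dir / str(i // m)
        group.mkdir(exist_ok=True)
        shutil.copy2(item, group / item.name)
    return backup_root


# Demonstration with five items and a group size of two.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "source")
    src.mkdir()
    meta = src / "index.meta"
    meta.write_text("index")
    items = []
    for i in range(5):
        item = src / f"item{i}"
        item.write_bytes(bytes([i]))
        items.append(item)
    root = back_up([meta], items, Path(tmp, "backup file"), m=2)
    groups = sorted(p.name for p in (root / "data").iterdir())
    last_group = sorted(p.name for p in (root / "data" / "2").iterdir())
    meta_copied = (root / "metadata" / "index.meta").exists()
print(groups, last_group, meta_copied)
```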

For example, the file snapshot module 63 may be configured as follows:

When taking a snapshot of the backup file, create a snapshot file subdirectory and generate a globally unique identifier;

Create a snapshot metadata subdirectory named with the globally unique identifier under the snapshot file subdirectory;

Create a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file;

Move the index metadata and file description metadata of the backup file to the snapshot metadata subdirectory;

According to the metadata backed up in the backup file, move the backed-up original data file to the snapshot file data subdirectory.
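The snapshot steps can be sketched as moves rather than copies. The sketch assumes a backup laid out as `metadata/` and `data/` subdirectories under a backup root, that index and file description metadata are files whose stems are `index` and `description` (reference count metadata stays behind), and that the globally unique identifier is a random UUID. `take_snapshot` and these names are hypothetical.

```python
import shutil
import tempfile
import uuid
from pathlib import Path

MOVED_METADATA = {"index", "description"}  # assumed file stems of the moved metadata kinds


def take_snapshot(backup_root: Path, snapshot_root: Path) -> str:
    """Move a backup's metadata and data into a new snapshot; return its GUID."""
    guid = str(uuid.uuid4())  # globally unique identifier (assumed: random UUID)
    snap_meta = snapshot_root / guid                         # metadata subdirectory named by GUID
    snap_data = snapshot_root / "snapshot file data" / guid  # snapshot file data subdirectory
    snap_meta.mkdir(parents=True)
    snap_data.mkdir(parents=True)
    # Move only index metadata and file description metadata.
    for meta in list((backup_root / "metadata").iterdir()):
        if meta.stem in MOVED_METADATA:
            shutil.move(str(meta), str(snap_meta / meta.name))
    # Move the data groups, keeping the backup's data subdirectory layout.
    for group in list((backup_root / "data").iterdir()):
        shutil.move(str(group), str(snap_data / group.name))
    return guid


with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp, "backup file")
    (backup / "metadata").mkdir(parents=True)
    (backup / "data" / "0").mkdir(parents=True)
    for name in ("index.meta", "description.meta", "refcount.meta"):
        (backup / "metadata" / name).write_text(name)
    (backup / "data" / "0" / "item0").write_bytes(b"x")
    snap_root = Path(tmp, "snapshot file")
    guid = take_snapshot(backup, snap_root)
    moved_meta = sorted(p.name for p in (snap_root / guid).iterdir())
    moved_data = (snap_root / "snapshot file data" / guid / "0" / "item0").exists()
    left_behind = sorted(p.name for p in (backup / "metadata").iterdir())
print(moved_meta, moved_data, left_behind)
```

Moving instead of copying keeps the snapshot cheap on the same file system, since a rename does not rewrite the data.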

For example, the original data file includes: a header description area, a check area, an offset area and a data area;

The header description area is used to describe the version of the file system, the number of data blocks and the check length information;

The check area is used to record a check value determined according to the check length information and the number of data blocks;

The offset area is used to record the offset of the positioning data block identifier;

The data area is used to store original data based on data blocks.

The file snapshot system provided in the embodiments of the present application can execute the file snapshot method provided in any embodiment of the present application, and has the corresponding functional modules and beneficial effects of the execution method.

Fig. 7 shows a block diagram of an electronic device 70 that can be used to implement an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices (such as helmets, glasses, and watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present application described and/or claimed herein.

As shown in FIG. 7, the electronic device 70 includes at least one processor 71 and a memory communicatively connected to the at least one processor 71, such as a read-only memory (ROM) 72 and a random access memory (RAM) 73. The memory stores a computer program executable by the at least one processor, and the processor 71 can perform various appropriate actions and processes according to the computer program stored in the ROM 72 or loaded from the storage unit 78 into the RAM 73. The RAM 73 can also store various programs and data required for the operation of the electronic device 70. The processor 71, the ROM 72, and the RAM 73 are connected to each other via a bus 74, to which an input/output (I/O) interface 75 is also connected.

A number of components in the electronic device 70 are connected to the I/O interface 75, including: an input unit 76, such as a keyboard or a mouse; an output unit 77, such as various types of displays and speakers; a storage unit 78, such as a magnetic disk or an optical disk; and a communication unit 79, such as a network card, a modem, or a wireless communication transceiver. The communication unit 79 allows the electronic device 70 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The processor 71 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the processor 71 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The processor 71 performs the various methods and processes described above, such as the file snapshot method.

In some embodiments, the file snapshot method may be implemented as a computer program, which is tangibly contained in a computer-readable storage medium, such as a storage unit 78. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 70 via the ROM 72 and/or the communication unit 79. When the computer program is loaded into the RAM 73 and executed by the processor 71, one or more steps of the file snapshot method described above may be performed. Alternatively, in other embodiments, the processor 71 may be configured to perform the file snapshot method in any other suitable manner (e.g., by means of firmware).

Various embodiments of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

The computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that when executed by the processor, they implement the functions/operations specified in the flowcharts and/or block diagrams. The computer programs may be executed entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer-readable storage medium may be a non-transitory computer-readable storage medium.

To provide interaction with a user, the systems and techniques described herein may be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the electronic device. Other types of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes backend components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes frontend components (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend components, middleware components, or frontend components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.

A computing system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the management difficulty and weak business scalability of traditional physical hosts and virtual private server (VPS) services.

It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the expected results of the technical solution of this application can be achieved; no limitation is imposed herein.

Claims

1. A file snapshot method, comprising:

Acquiring a data description file, wherein the data description file includes: metadata and an original data file;

Backing up the data description file so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;

Taking a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to a snapshot file.

2. The method according to claim 1, wherein the metadata comprises:

Index metadata, used to describe index information of the original data file;

Data reference count metadata, used to describe the number of references to the original data in the original data file;

The file description metadata is used to describe the file storage information of the original data file.

3. The method according to claim 2, wherein the index metadata comprises:

Primary index metadata, used to describe secondary index metadata;

The secondary index metadata is used to describe the index of the original data in the original data file.

4. The method according to claim 2, wherein the file storage information includes: a storage path of the original file data, write-once-read-many (WORM) protection information, and a reference count of the original data file.

5. The method according to claim 1, wherein the step of backing up the data description file so that the data in the data description file is classified and backed up into a backup file based on a file tree structure comprises:

Creating a backup file subdirectory;

Creating a backup metadata subdirectory and a backup file data subdirectory under the backup file subdirectory;

Backing up the metadata in the data description file to the backup metadata subdirectory;

Classifying, according to the metadata in the data description file, the original data contained in the original data file by quantity, and backing it up to the backup file data subdirectory.

6. The method according to claim 5, wherein taking a snapshot of the backup file according to the directory of the backup file so as to move the data in the backup file to the snapshot file comprises:

When taking a snapshot of the backup file, creating a snapshot file subdirectory and generating a globally unique identifier;

Creating a snapshot metadata subdirectory named with the globally unique identifier under the snapshot file subdirectory;

Creating a snapshot file data subdirectory in the snapshot file according to the backup file data subdirectory of the backup file;

Moving the index metadata and file description metadata of the backup file to the snapshot metadata subdirectory;

Moving, according to the metadata backed up in the backup file, the backed-up original data file to the snapshot file data subdirectory.

7. The method according to claim 1, wherein the original data file comprises: a header description area, a check area, an offset area and a data area;

The header description area is used to describe the version of the file system, the number of data blocks and the check length information;

The check area is used to record a check value determined according to the check length information and the number of data blocks;

The offset area is used to record the offset of the positioning data block identifier;

The data area is used to store original data based on data blocks.

8. A file snapshot system, comprising:

A file acquisition module, configured to acquire a data description file, wherein the data description file includes: metadata and an original data file;

A file backup module, configured to back up the data description file so that the data in the data description file is classified and backed up into a backup file based on a file tree structure;

The file snapshot module is configured to take a snapshot of the backup file according to the directory of the backup file, so that the data in the backup file is moved to the snapshot file.

9. An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein,

The memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can execute the file snapshot method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed, cause a processor to implement the file snapshot method according to any one of claims 1 to 7.