CN114064361A - Data writing method executed in backup related operation and backup gateway system - Google Patents


Publication number
CN114064361A
CN114064361A (application CN202111355822.8A)
Authority
CN
China
Prior art keywords
data
backed
backup
check value
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111355822.8A
Other languages
Chinese (zh)
Inventor
孟令斌
贾志威
程实
薛志辉
傅翠云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority claimed from application CN202111355822.8A
Publication of CN114064361A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G06F11/1453 Management of the data involved in backup or backup restore using de-duplication of the data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture


Abstract

A data writing method performed in a backup-related operation and a backup gateway system are disclosed. The system comprises: a protocol interface layer for providing each application system with a unified interface corresponding to backup-related operations; an input/output layer for receiving a target storage type and transmitting it to a storage driver layer; and the storage driver layer for determining, according to the target storage type, a target driver matching that type and completing the backup-related operation. When performing a backup operation, the input/output layer executes the following: segmenting the current version of the data to be backed up into a plurality of slices; calculating a check value for each slice; providing the slices to the storage driver layer; and organizing and storing a metadata object that includes the slicing information and the check value of each slice. The backup gateway system sits between the application systems and target storage of different types; facing the application systems, it shields the implementation details of the different storage types, so that target storage of various types can be conveniently integrated.

Description

Data writing method executed in backup related operation and backup gateway system
Technical Field
The present disclosure relates to the field of computers, and in particular, to a data writing method performed in backup-related operations and a backup gateway system.
Background
In the computer field, owing to information-security-level protection requirements and customers' backup storage requirements, the target storage of a backup operation may be of many different storage types, such as object storage (OSS), distributed file storage, AWS S3, an on-premises IDC NAS (Network-Attached Storage), a SAN (Storage Area Network), a tape library, an optical disk tower, and so on. To back up to these target storages, application systems have to implement multiple backup interfaces, and as the number of application systems grows, maintaining these many backup interfaces puts great pressure on operation and maintenance personnel.
Disclosure of Invention
In view of the above, an object of the present disclosure is to provide a data writing method, a data reading method and a backup gateway system performed in a backup related operation, so as to solve the above problems.
In a first aspect, an embodiment of the present disclosure provides a method for writing data executed in a backup related operation, including:
segmenting the current version of the data to be backed up into a plurality of slices;
calculating a check value for each slice;
writing the plurality of slices to a target storage; and
organizing and storing a metadata object, the metadata object including the slicing information and the check value of each slice.
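The four steps of this first aspect can be sketched in Python as follows. This is a minimal illustration only: the helper names, the 4 MiB slice size, SHA-256 as the check-value algorithm, and the `target` object's methods are all assumptions, not details fixed by the disclosure.

```python
import hashlib
import json

CHUNK_SIZE = 4 * 1024 * 1024  # assumed slice (chunk) size

def slice_data(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Segment the data to be backed up into slices."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def backup_write(data: bytes, target) -> dict:
    """Slice, compute per-slice check values, write the slices, then
    organize and store the metadata object."""
    metadata = {"algorithm": "sha256", "slices": []}
    offset = 0
    for s in slice_data(data):
        check = hashlib.sha256(s).hexdigest()      # check value of this slice
        target.write_slice(offset, s)              # write slice to target storage
        metadata["slices"].append({"offset": offset, "size": len(s),
                                   "check": check})
        offset += len(s)
    target.write_metadata(json.dumps(metadata))    # store the metadata object
    return metadata
```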
In some embodiments, the data writing method further comprises:
determining whether a prior version of the data to be backed up exists and, if so, obtaining the metadata object of the prior version and obtaining from it the slicing information and the check value of each slice;
the segmenting of the current version of the data to be backed up into a plurality of slices is then:
segmenting the current version of the data to be backed up into a plurality of slices according to the slicing information;
the data writing method further comprises:
comparing the check value of each slice of the current version of the data to be backed up with the check value of the corresponding slice of the prior version;
and the writing of the plurality of slices to the target storage is then:
writing only those slices of the current version whose check values are inconsistent, each to the position of the corresponding slice of the prior version in the target storage.
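This incremental variant can be sketched as below, assuming the hypothetical metadata layout of the earlier write-path sketch (per-slice `offset`, `size`, `check` fields) and SHA-256 as the check-value algorithm; none of these specifics are mandated by the disclosure.

```python
import hashlib

def incremental_write(data: bytes, prior_metadata: dict, target) -> list:
    """Re-slice the current version according to the prior version's slicing
    information and write only the slices whose check values changed."""
    written = []
    for info in prior_metadata["slices"]:
        s = data[info["offset"]:info["offset"] + info["size"]]
        if hashlib.sha256(s).hexdigest() != info["check"]:
            # Check values differ: overwrite the corresponding slice's
            # position in the target storage.
            target.write_slice(info["offset"], s)
            written.append(info["offset"])
    return written
```

Unchanged slices are skipped entirely, which is what makes the periodic backup incremental.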
In some embodiments, the data writing method further comprises:
calculating an overall check value of the current version of the data to be backed up; and
storing the overall check value of the current version of the data to be backed up into the metadata object.
In some embodiments, the data writing method further comprises: writing each slice together with its check value into the same storage unit of the target storage.
In some embodiments, the plurality of slices are written to the target storage by parallel writes.
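Parallel writing of the slices can be sketched with a thread pool, purely as one possible concurrency mechanism (the disclosure does not prescribe one, and the `target.write_slice` method is an assumption carried over from the earlier sketches):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_write(slices, target, max_workers: int = 8):
    """Write all slices to the target storage concurrently."""
    offsets, off = [], 0
    for s in slices:
        offsets.append(off)
        off += len(s)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(target.write_slice, o, s)
                   for o, s in zip(offsets, slices)]
        for f in futures:
            f.result()  # re-raise any write error from a worker thread
```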
In some embodiments, further comprising:
retrieving a configuration file based on a target storage type to obtain a storage driver adapted to the target storage type, wherein the writing of the plurality of slices to the target storage comprises:
writing the plurality of slices to the target storage through the storage driver adapted to the target storage type.
In some embodiments, the backup-related operations comprise: a backup operation, a dump operation, and a sandbox operation, and the target storage type is one of the following: local storage, local cloud storage, off-site cloud storage, and other cloud storage.
In a second aspect, an embodiment of the present disclosure provides a method for reading data performed in a backup related operation, including:
acquiring the slicing information of the backed-up data and the check value of each slice from the metadata object of the backed-up data;
reading each slice of the backed-up data from a target storage according to the slicing information of the backed-up data;
calculating a check value for each slice of the backed-up data as read;
comparing the calculated check value of each slice with the check value of that slice acquired from the metadata object of the backed-up data, and marking any slice whose check values are inconsistent as a problem slice;
acquiring an overall check value of the backed-up data from the metadata object of the backed-up data; and
correcting the problem slice according to the overall check value and the check values of the normal slices read other than the problem slice.
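The disclosure does not specify how the correction in the last step works. One illustrative possibility, shown below, is to make the overall check value an XOR parity over slices padded to equal length, so that a single problem slice can be rebuilt from the parity and the remaining normal slices; this choice of algorithm is entirely an assumption for the sake of the sketch.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def correct_problem_slice(slices, problem_index: int,
                          overall_parity: bytes) -> bytes:
    """Rebuild the problem slice from the overall XOR parity and the
    normal slices (all slices assumed padded to the parity's length)."""
    acc = overall_parity
    for i, s in enumerate(slices):
        if i != problem_index:
            acc = xor_bytes(acc, s)  # cancel each normal slice out of the parity
    return acc
```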
In some embodiments, parallel reads are employed to read the respective slices of the backed-up data from the target storage.
In a third aspect, an embodiment of the present disclosure provides a backup gateway system, including:
a protocol interface layer for providing each application system with a unified interface corresponding to backup-related operations, wherein a target storage type is required to be provided when the unified interface is called;
an input/output layer for receiving the target storage type and transmitting it to a storage driver layer; and
the storage driver layer for determining, according to the target storage type, a target driver matching the target storage type and completing the backup-related operation through the target driver, wherein a configuration file comprises correspondence data between storage types and storage drivers,
wherein the backup-related operations comprise a backup operation, and when performing the backup operation the backup gateway system also transmits the current version of the data to be backed up to the input/output layer through the unified interface,
and when performing the backup operation the input/output layer executes the following: segmenting the current version of the data to be backed up into a plurality of slices; calculating a check value for each slice; providing the plurality of slices to the storage driver layer; and organizing and storing a metadata object, the metadata object including the slicing information and the check value of each slice.
In some embodiments, the backup-related operations include a restore operation and a dump operation, and when performing a restore operation or a dump operation the backup gateway system further performs the following operations:
acquiring the slicing information of the backed-up data and the check value of each slice from the metadata object of the backed-up data;
reading each slice of the backed-up data from a target storage according to the slicing information of the backed-up data;
calculating a check value for each slice of the backed-up data as read;
comparing the calculated check value of each slice with the check value of that slice acquired from the metadata object of the backed-up data, and marking any slice whose check values are inconsistent as a problem slice;
acquiring an overall check value of the backed-up data from the metadata object of the backed-up data; and
correcting the problem slice according to the overall check value and the check values of the normal slices read other than the problem slice.
In a fourth aspect, an embodiment of the present disclosure provides a backup gateway device, including:
a protocol interface unit for providing each application system with a unified interface corresponding to backup-related operations, wherein a target storage type is required to be provided when the unified interface is called;
an input/output unit for receiving the target storage type and transmitting it to a storage driving unit; and
the storage driving unit, configured to determine, according to the target storage type, a target driver matching the target storage type and to complete backup-related operations through the target driver, wherein a configuration file includes correspondence data between storage types and storage drivers, the backup-related operations include a backup operation, and when performing the backup operation the backup gateway device also transmits the current version of the data to be backed up to the input/output unit through the unified interface,
and when performing the backup operation the input/output unit executes the following: segmenting the current version of the data to be backed up into a plurality of slices; calculating a check value for each slice; providing the plurality of slices to the storage driving unit; and organizing and storing a metadata object, the metadata object including the slicing information and the check value of each slice.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the memory further stores computer instructions executable by the processor, and the computer instructions, when executed, implement the data writing method according to the first aspect or the data reading method according to the second aspect.
In a sixth aspect, the present disclosure provides a computer-readable medium storing computer instructions executable by an electronic device, where the computer instructions, when executed, implement the data writing method of the first aspect or the data reading method of the second aspect.
The backup gateway system provided by the embodiments of the present disclosure is arranged between a plurality of application systems and a plurality of target storages of different storage types. Facing the various application systems, it shields the implementation details of the different storage types and provides the application systems with a unified backup interface, thereby facilitating integration with target storage of different storage types; at the same time, the backup gateway system supports operations such as deduplication and slicing of the written data according to customer requirements.
Drawings
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which refers to the accompanying drawings in which:
FIG. 1 is a hardware deployment diagram of a cloud service;
FIG. 2 is a schematic diagram of an exemplary application system, backup gateway system, and target storage of multiple different storage types;
fig. 3 is a schematic structural diagram of a backup gateway system provided in an embodiment of the present disclosure;
fig. 4a is a schematic diagram illustrating a relationship between processes of a backup gateway system according to an embodiment of the present disclosure;
FIG. 4b is an exemplary diagram of one piece of data to be backed up stored at location L2 in FIG. 4a;
FIG. 5 is a flowchart of a method for implementing backup related operations provided by an embodiment of the present disclosure;
FIG. 6a is a flowchart of a method for writing data implemented in a backup-related operation according to an embodiment of the present disclosure;
FIG. 6b is a flowchart of a data writing method implemented in a backup-related operation according to another embodiment of the disclosure;
FIG. 7 is a flowchart of a method for reading data of a backup-related operation according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a backup gateway device according to an embodiment of the present disclosure;
FIG. 9 shows a block diagram of an electronic device for implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described below based on examples, but it is not limited to these examples. In the following detailed description, some specific details are set forth; it will be apparent to those skilled in the art that the present disclosure may be practiced without them. Well-known methods, processes, and procedures have not been described in detail so as not to obscure the present disclosure. The figures are not necessarily drawn to scale.
FIG. 1 is a hardware deployment diagram of a cloud service. As shown, the deployment diagram 100 includes a terminal 103 and a server cluster 102 that communicate via a network 101.
The network 101 is a combination of one or more communication technologies based on signal exchange, including but not limited to wired technologies employing electrically and/or optically conductive cables and wireless technologies employing infrared, radio frequency, and/or other forms. In different application scenarios, the network 101 may be the Internet, a wide area network, or a local area network, and may be wired or wireless; for example, the network 101 may be a local area network within a company.
The server cluster 102 is made up of a plurality of physical servers. The terminal 103 may be an electronic device such as a smartphone, tablet computer, notebook computer, or desktop computer. Various application systems are deployed on the server cluster 102, and the terminal 103 can obtain the services provided by these application systems via the network 101.
FIG. 2 is a schematic diagram of an exemplary scenario with an application system, a backup gateway system, and target storage of a plurality of different storage types. As shown in the figure, the backup plug-in 301 and the dump plug-in 302 in the database backup system 300 each call a unified interface in the backup gateway system 500, either to back up the data to be backed up to the local storage 201 or the local cloud storage 202, or to read data from the local storage 201 or the local cloud storage 202 and store it elsewhere. The local storage 201 supports three forms: NAS 2011 (Network-Attached Storage), FTP storage 2012, and tape library 2013. The local cloud storage 202 includes cloud object storage 2014 and database live backup 2015. The enterprise-level hybrid cloud backup platform 400 provides backup services 401, restore services 402, dump services 403, and sandbox services 404, which likewise invoke the unified interfaces in the backup gateway system 500 to complete backup, restore, dump, and sandbox operations.
As shown in connection with FIG. 1, the database backup system 300 is deployed on one or more physical servers in the server cluster 102, and the hybrid cloud backup platform 400 is deployed on cloud servers. A cloud server is a virtual computer system formed by integrating and redistributing, using cloud software, the software and hardware resources of one or more physical servers. The various services referred to in FIG. 2 cover several situations. For example, a backup operation may back up data from a cloud to local storage, from the system's cloud to a local storage cloud, from one storage cloud to another storage cloud off-site, or from the system's cloud to another vendor's cloud. Here, the other cloud storage 204 includes clouds 2041 to 2043, which refer to clouds provided by other companies. The cross-region cloud storage 203 is a storage cloud located in region B.
It should be noted that the database backup system 300 and the enterprise-level hybrid cloud backup platform 400 need to compile the code of the backup gateway system 500 into their executable programs at the compilation stage in order to successfully call each interface in the backup gateway system 500. The backup gateway system 500 is presented in different file formats depending on the programming language. For example, with JAVA, the backup gateway system 500 is a JAR package consisting of a number of class files; the database backup system 300 or the enterprise-level hybrid cloud backup platform 400 loads the JAR into its project, integrates the JAR package into its executable file at compile time, and calls the various interfaces in the JAR when the executable file runs.
Fig. 3 is a schematic structural diagram of a backup gateway system 500 according to an embodiment of the present disclosure. The overall architecture of the backup gateway system 500 is divided into 3 layers: a protocol interface layer 501, a data input output layer 502, and a storage driver layer 503.
The protocol interface layer 501 is used to provide a unified interface to the outside. Each interface is offered in multiple interface types so that users can select according to their needs; for example, five interface types are shown: SDK 5011, API 5012, NFS/SMB (remote mount) 5013, FUSE (local mount) 5014, and file interface 5015.
Facing the upper layer, the data input/output layer 502 receives the parameter values passed in through the unified interface of the protocol interface layer 501, performs the corresponding operations, and, facing the lower layer, passes them to the corresponding interface of the storage driver layer 503. The operations performed by the data input/output layer 502 include, but are not limited to: metadata organization and storage 5021, deduplication 5022, garbage collection 5023, slicing 5024, and copy 5025. Metadata organization and storage 5021 constructs metadata objects for the data to be backed up and maintains them; the format of the metadata objects is described by example below. Deduplication 5022 refers to deleting duplicate data from the storage pool. Slicing 5024 refers to segmenting the data to be backed up into multiple slices (chunks). Copy 5025 refers to copying one or several of the slices to a specified location. Garbage collection 5023 deletes garbage data generated during processing, for example temporary data and metadata created for the data to be backed up.
The storage driver layer 503 provides a unified interface to the data input/output layer 502 above it, interfaces with the storage pool 200 below it, and performs the corresponding operations through the corresponding storage drivers.
In one embodiment, the storage driver layer 503 includes a configuration file storing correspondence data between storage types and storage drivers. The unified interface provided by the storage driver layer 503 receives the target storage type from the unified interface of the protocol interface layer 501, retrieves the correspondence data in the configuration file based on the target storage type to obtain the storage driver adapted to that type, and uses this storage driver to carry out the backup-related operations, which are mainly the backup, restore, dump, and sandbox operations corresponding to the four services above. These four operations are in fact composed of a read-data operation and a write-data operation: the read-data operation reads data from the target storage of the specified storage type through the corresponding storage driver, and the write-data operation writes data to it likewise.
In one embodiment, the storage driver layer 503 includes the following components: cache 5031, object storage 5032, object storage driver 5033, NAS driver 5034, S3 driver 5035, and HCFS driver 5036. A storage driver is an interface program that connects the software with a target storage type in the storage pool. The NAS driver 5034, S3 driver 5035, and HCFS driver 5036 correspond to the interfaces of the NAS 2011, the cross-region cloud storage 203, and the tape library 2013 in FIG. 2, respectively. Of course, some storage drivers are not shown.
The write data operation and the read data operation performed in the backup related operation will be described in detail below based on fig. 3.
In one embodiment, the write-data operation receives the data to be backed up through the corresponding interface of the protocol interface layer and then performs the following steps: first, slicing 5024 is used to segment the data, each segmented data unit being called a slice (chunk); then, metadata organization and storage 5021 is used to construct a metadata object for the data to be backed up; finally, the plurality of slices and the metadata object are written to the storage pool 200 by calling the unified write interface provided by the storage driver layer 503.
In some embodiments, the check value of each slice and the overall check value of the data to be backed up are calculated and organized into the metadata object of the data to be backed up. As an example, a metadata object of the data to be backed up may include the following three parts: header information (Header), segment information (Segment), and trailer information (Trailer). The header information records the time at which writing of the file started, the algorithm used to generate the check values, the file version number, and so on. The segment information records the check value and slicing information of each slice; the slicing information includes, for example, how many slices the data to be backed up is segmented into, the offset of each slice, the size of each slice, and so on. The trailer information records the overall check value of the data to be backed up, the data size, the number of slices, and so on.
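The three-part metadata object described above might be laid out as in the following sketch. The field names, the dictionary representation, and SHA-256 as the check-value algorithm are all assumptions for illustration; the disclosure fixes only the Header/Segment/Trailer division and what each part records.

```python
import hashlib
import time

def build_metadata_object(slices, algorithm: str = "sha256",
                          version: int = 1) -> dict:
    """Organize a Header / Segment / Trailer metadata object for the slices."""
    segs, offset = [], 0
    for s in slices:
        segs.append({"offset": offset, "size": len(s),
                     "check": hashlib.new(algorithm, s).hexdigest()})
        offset += len(s)
    whole = hashlib.new(algorithm, b"".join(slices)).hexdigest()
    return {
        "header": {"start_time": time.time(),   # time writing started
                   "algorithm": algorithm,      # check-value algorithm
                   "version": version},         # file version number
        "segment": segs,                        # per-slice check value + slicing info
        "trailer": {"check": whole,             # overall check value
                    "size": offset,             # total data size
                    "count": len(segs)},        # number of slices
    }
```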
FIG. 4a illustrates an example of the relationship between the processes of a backup gateway system according to an embodiment of the present disclosure. In this example, the system daemon Server receives the data to be backed up and passes it to the slicing process chunkservice for slicing; chunkservice sends the resulting slices to the metadata maintenance process MetaService, which appends information about the slices to the existing metadata object; MetaService then uploads the slices concurrently to the storage engine StorageEngine, and StorageEngine calls the storage driver ossserver to upload the metadata object and the slices to the target storage in the storage pool 200. As shown in the figure, the metadata object and the slices of the data to be backed up are stored at locations L1 and L2, respectively.
FIG. 4b is an exemplary diagram of one piece of data to be backed up stored at location L2 in FIG. 4a. Referring to the figure, n storage units (e.g., storage blocks or storage pages) are used to store the n slices of the data to be backed up together with their check values; each storage unit 401 stores one slice and that slice's check value. The overall check value FV of the data to be backed up is stored in another storage unit. Thus, when performing a read-data operation, the metadata object of the data to be backed up may be read first and the layout information of the entire data obtained from it; then, based on the offset of each slice within the data, the check value of each slice is read from the storage pool 200 and compared with the corresponding check value taken from the metadata object. If they are consistent, the slice has not been tampered with and no data has been lost; if they are inconsistent, the slice has been tampered with or data has been lost. These operations are repeated until all slices of the data to be backed up have been obtained. Alternatively, the read-data operation may be aborted as soon as the check value of a slice read from the storage pool 200 does not match that slice's check value in the metadata object. In addition, the overall check value FV of the data to be backed up may be taken from the storage pool 200, and based on the correspondence between FV and the check values of all slices, it may be determined whether they satisfy a predetermined relationship, so as to confirm that the data has been neither tampered with nor lost.
FIG. 4b illustrates an exemplary, non-limiting storage manner. In fact, if the check values of the slices and the overall check value of the data to be backed up are already stored in the metadata object, it may be unnecessary to store them again in the storage pool 200. In that case, when performing a read-data operation, the metadata object of the data to be backed up may be read first and the layout information of the entire data obtained from it; then each slice is read from the storage pool 200 one by one, its check value computed according to the check-value algorithm recorded in the metadata object and compared with the check value of that slice read directly from the metadata object. If the two are inconsistent, the slice is marked as a problem slice, and the next slice is read, until all slices of the data to be backed up have been read.
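The one-by-one read-and-verify loop just described can be sketched as follows, using the hypothetical metadata layout from the earlier sketches (`algorithm`, per-slice `offset`/`size`/`check`) and an assumed `pool.read` method:

```python
import hashlib

def read_and_verify(metadata: dict, pool):
    """Read each slice per the layout in the metadata object, recompute its
    check value, and mark inconsistent slices as problem slices."""
    algo = metadata["algorithm"]
    slices, problems = [], []
    for i, info in enumerate(metadata["slices"]):
        s = pool.read(info["offset"], info["size"])
        slices.append(s)
        if hashlib.new(algo, s).hexdigest() != info["check"]:
            problems.append(i)  # check values inconsistent: problem slice
    return slices, problems
```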
In some embodiments, copy 5025 and deduplication 5022 are performed using the check values in the metadata object. Specifically, since backup operations are usually performed periodically, when the current version of the data to be backed up is copied to the storage pool 200, it is first checked whether a metadata object of a prior version of the data exists. If so, the current version is segmented into a plurality of slices according to the slicing information in the metadata object of the prior version, and the check value of each slice is calculated one by one and compared with the check value of the corresponding slice of the prior version taken from that metadata object. If they are consistent, the slices of the two versions are the same and there is no need to copy the slice to the storage pool 200; if they are inconsistent, the corresponding slices of the two versions differ, and the slice is copied to the position of the corresponding slice in the storage pool 200. For deduplication 5022, the check values of the slices in the metadata objects of two pieces of data to be backed up are compared; if the check values are consistent, the corresponding two slices are the same and only one is kept. For example, if the metadata objects A and B of the two pieces of data respectively contain the offsets and check values of identical slices chunk1 and chunk2, then chunk1 is deleted and the information about chunk1 in metadata object A is modified according to the information about chunk2 in B.
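The deduplication step — keeping one slice when the check values in two metadata objects match, then repointing the duplicate's entry — can be sketched as below. The `key` field (a storage location for each slice) and the dictionary-as-storage-pool are illustrative assumptions; the disclosure only states that the duplicate is deleted and the metadata entry rewritten.

```python
def deduplicate(meta_a: dict, meta_b: dict, storage: dict):
    """For each slice in A whose check value matches a slice in B, delete
    A's copy from storage and repoint A's entry at B's slice."""
    by_check = {e["check"]: e for e in meta_b["slices"]}
    for entry in meta_a["slices"]:
        twin = by_check.get(entry["check"])
        if twin is not None and twin["key"] != entry["key"]:
            storage.pop(entry["key"], None)  # delete the duplicate (chunk1)
            entry["key"] = twin["key"]       # rewrite A's entry per B (chunk2)
```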
Fig. 5 is a flowchart of a method for implementing backup related operations according to an embodiment of the present disclosure. The method specifically comprises the following steps.
In step S501, a backup related request is received via a unified interface, the backup related request comprising a target storage type.
In step S502, the configuration file is retrieved based on the target storage type to obtain a storage driver adapted to the target storage type.
In step S503, a backup-related operation is implemented based on the storage driver adapted to the target storage type.
Referring to FIG. 2, the backup-related operations include a backup operation, a restore operation, a dump operation, and a sandbox operation, and the target storage type is one of the following storages: local storage, local cloud storage, offsite cloud storage, and other cloud storage. The method proposed in this embodiment may be implemented in the backup gateway system shown in fig. 2. The method receives a backup-related request from a specific application through a unified interface and reads the target storage type from the request; it then looks up the correspondence data between storage types and storage drivers in the pre-configured configuration file according to the target storage type to obtain the storage driver adapted to the target storage type, and implements the backup-related operation based on that storage driver.
According to this embodiment, a unified interface for backup-related operations can be provided to each application system, and the backup-related operations can be completed by a matching driver, which is determined from the plurality of storage drivers according to the target storage type carried by the unified interface and the correspondence data between storage types and storage drivers.
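The lookup of steps S501 to S503 can be illustrated with a minimal sketch. The driver classes, the configuration dict, and the function name below are hypothetical stand-ins for the pre-configured configuration file and the concrete storage drivers:

```python
# Hypothetical driver classes; a real system would provide concrete
# implementations for local, local-cloud, offsite-cloud storage, etc.
class LocalDriver:
    name = "local"

class LocalCloudDriver:
    name = "local_cloud"

# Correspondence data between storage type and storage driver, as would
# be held in the pre-configured configuration file.
DRIVER_CONFIG = {
    "local": LocalDriver,
    "local_cloud": LocalCloudDriver,
}

def handle_backup_request(request: dict):
    """Unified-interface entry point: pick the driver matching the
    target storage type carried in the request (steps S501-S503)."""
    target_type = request["target_storage_type"]  # S501: read from request
    driver_cls = DRIVER_CONFIG.get(target_type)   # S502: retrieve config
    if driver_cls is None:
        raise ValueError(f"no storage driver for type {target_type!r}")
    return driver_cls()  # S503: backup-related operation uses this driver
```

The returned driver instance would then perform the actual read/write against the target storage.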
Fig. 6a is a flowchart of a data writing method implemented in a backup related operation according to an embodiment of the present disclosure. The method specifically comprises the following steps.
In step S601, the data to be backed up of the current version is sliced into a plurality of slices.
In step S602, a check value for each slice is calculated.
In step S603, the plurality of slices are written to the target storage.
In step S604, a metadata object including fragmentation information and a check value for each fragment is organized and stored.
According to this embodiment, the data to be backed up is divided into a plurality of slices, the check value of each slice is calculated, and the slices are respectively stored in the target storage, wherein the target storage is one of the following types: local storage, local cloud storage, offsite cloud storage, and other cloud storage. The slice information of each slice and the check value of each slice are then organized into a metadata object, and the metadata object is stored. With reference to fig. 5, the step of storing the plurality of slices in the target storage is completed by a storage driver adapted to the target storage type.
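Steps S601 to S604 can be sketched as follows. For illustration only, a fixed slice size and SHA-256 as the check-value algorithm are assumed, and a plain dict stands in for the target storage; all names are hypothetical:

```python
import hashlib
import json

CHUNK_SIZE = 4  # bytes per slice; real systems would use far larger slices

def backup_write(data: bytes, storage: dict) -> dict:
    """Steps S601-S604: slice, compute check values, write slices,
    then organize and store the metadata object."""
    meta = {"algorithm": "sha256", "chunks": []}
    for i in range(0, len(data), CHUNK_SIZE):      # S601: slice
        piece = data[i:i + CHUNK_SIZE]
        check = hashlib.sha256(piece).hexdigest()  # S602: check value
        key = f"chunk-{i // CHUNK_SIZE}"
        storage[key] = piece                       # S603: write slice
        meta["chunks"].append({"key": key, "offset": i,
                               "length": len(piece), "check_value": check})
    storage["metadata"] = json.dumps(meta)         # S604: store metadata
    return meta
```

Recording the check-value algorithm inside the metadata object lets a later read operation recompute and compare check values without out-of-band knowledge.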
Fig. 6b is a flowchart of a data writing method implemented in a backup-related operation according to another embodiment of the present disclosure. The method specifically comprises the following steps.
In step S651, it is determined whether there is a previous version of data to be backed up, and if so, step S652 is executed.
Since the backup operation is usually performed periodically, in this embodiment, it is first checked whether there is a metadata object of the data to be backed up of the previous version, and if so, it is determined that the data to be backed up of the previous version is stored.
In step S652, a metadata object of the data to be backed up of the previous version is obtained, and the slice information and the check value of each slice are obtained from the metadata object of the data to be backed up of the previous version. If the metadata object contains the storage time of the previous version of the data to be backed up, the time may also be retrieved.
In step S653, the current version of the data to be backed up is sliced into a plurality of slices according to the slicing information.
In this step, the data to be backed up of the current version is segmented according to the segmentation mode of the previous version, so as to obtain a plurality of slices. For example, if the previous version was divided into one slice per 100 rows, the current version is likewise divided into one slice per 100 rows. As another example, if the developer recognizes that part of the data to be backed up is frequently modified while the rest is rarely modified, a slicing policy may be specified to ensure that frequently modified data and rarely modified data do not appear in the same slice; the resulting slices are then generally of non-fixed size, and the current version of the data to be backed up is likewise sliced into a plurality of slices based on that slicing policy.
In step S654, the check value of each slice of the current version of data to be backed up is compared with the check value of the corresponding slice of the previous version of data to be backed up.
In step S655, it is determined whether the check value of each piece of the current version of data to be backed up is consistent with the check value of the corresponding piece of the previous version of data to be backed up. If not, step S656 is executed.
In step S656, the corresponding piece of the current version of the data to be backed up whose check values are inconsistent is copied to the location of the corresponding piece of the previous version of the data to be backed up in the target storage.
In step S657, it is determined whether all slices have been checked. If not, the process returns to step S654.
Steps S654 to S657 compare the check value of each piece of the current version with the check value of the corresponding piece of the previous version to determine whether they are consistent. If they are consistent, the corresponding piece has not been modified between the two versions and need not be written to the corresponding location of the target storage; if they are inconsistent, the corresponding piece has been modified, and the piece of the current version must overwrite the corresponding piece of the previous version.
In this embodiment, in the data writing operation of the backup-related operation, it is determined whether to overwrite a corresponding piece of the previous version of the data to be backed up, which has been stored in the target storage, with each piece of the current version of the data to be backed up, by comparing the check value of each piece of the current version of the data to be backed up with the check value of each piece of the previous version of the data to be backed up. By the method, some pieces of the data to be backed up of the current version do not need to be copied to the target storage, so that the network bandwidth is saved, and the backup efficiency is improved.
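The incremental write of steps S651 to S657 can be sketched as follows. For illustration, the previous version's metadata is modeled as a list of slice records (key, offset, length, check value), SHA-256 is assumed as the check-value algorithm, and a dict stands in for the target storage; the names are hypothetical:

```python
import hashlib

def incremental_write(current: bytes, prev_meta: list, storage: dict) -> int:
    """Re-slice the current version along the previous version's slice
    boundaries (S652/S653) and copy only the slices whose check values
    changed (S654-S657). Returns the number of slices actually written."""
    written = 0
    for rec in prev_meta:  # slice info of the previous version
        piece = current[rec["offset"]:rec["offset"] + rec["length"]]
        check = hashlib.sha256(piece).hexdigest()  # S654: compute and compare
        if check != rec["check_value"]:            # S655: inconsistent?
            storage[rec["key"]] = piece            # S656: overwrite in place
            rec["check_value"] = check             # metadata tracks new version
            written += 1
    return written
```

Unchanged slices never cross the network, which is the bandwidth saving described above.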
In some embodiments, an overall check value of the current version of the data to be backed up is calculated, and a check value of each piece of the current version of the data to be backed up and an overall check value of the current version of the data to be backed up are written to the target storage along with each piece. Further, each slice and the check value of the slice are written into the same storage unit of the target storage.
In some embodiments, multiple slices and/or multiple check values corresponding to multiple slices are written in parallel to the target storage. For example, multiple write interfaces of the storage driver may be called in parallel to write multiple slices and/or multiple check values corresponding to multiple slices to the target storage. By concurrently writing to the target storage, backup performance can be improved.
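One way to realize the parallel write described above is a thread pool that invokes a per-slice write interface concurrently. The `write_one` callback below stands in for the storage driver's write interface, which is an assumption of this sketch:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def write_slices_parallel(slices, write_one, max_workers=4):
    """Write all slices and their check values concurrently by calling
    the (hypothetical) per-slice write interface once per slice."""
    def task(item):
        key, piece = item
        check = hashlib.sha256(piece).hexdigest()
        write_one(key, piece, check)  # slice and its check value together
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(task, slices))
```

Concurrency helps most when the target storage is remote (cloud) and each write is latency-bound rather than bandwidth-bound.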
In some embodiments, some slices of the data to be backed up on the target storage may be modified. For example, the application system issues a modification request; the backup gateway system obtains the modification position information and the replacement content of the data to be backed up from the request, determines the slice in which the modification is located according to the modification position information, and overwrites the corresponding slice with the replacement content.
Fig. 7 is a flowchart of a data reading method of a backup related operation according to an embodiment of the present disclosure. The method specifically comprises the following steps.
In step S701, the fragment information of the data to be backed up and the check value of each fragment are acquired from the metadata object of the data to be backed up.
In step S702, each piece of data to be backed up is read from the target storage according to the piece information of the data to be backed up.
In step S703, a check value of each piece of data to be backed up is calculated.
In step S704, the calculated check value of each piece is compared with the check value of each piece acquired from the metadata object of the data to be backed up, and if the check values are not consistent, step S705 is executed.
In step S705, the piece is marked as a problem piece.
In step S706, the overall check value of the backed-up data is acquired from the metadata object of the backed-up data.
In step S707, the problem piece is corrected based on the overall check value and the normal pieces, other than the problem piece, that have been read out.
This embodiment provides a method for reading backed-up data. Specifically, the metadata object of the data to be backed up is found, and the fragmentation information of the data and the check value of each fragment are read from it; each fragment is then read from the target storage according to the fragmentation information, its check value is calculated, and the calculated check value is compared with the check value obtained from the metadata object. If the two are inconsistent, the fragment is marked as a problem piece. Further, the overall check value of the data to be backed up is obtained from the metadata object, and error correction is performed on the problem pieces according to the overall check value and the read-out normal pieces other than the problem pieces.
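Steps S701 to S705 can be sketched in Python as follows. The metadata layout, SHA-256 check values, and the dict standing in for the target storage are illustrative assumptions; the error correction of problem pieces (steps S706 and S707) depends on the chosen overall-check-value scheme and is omitted here:

```python
import hashlib

def read_and_verify(meta: list, storage: dict):
    """Read each slice, recompute its check value, and mark mismatching
    slices as problem pieces instead of aborting the whole read."""
    pieces, problems = [], []
    for rec in meta:
        piece = storage.get(rec["key"], b"")       # S702: read slice
        check = hashlib.sha256(piece).hexdigest()  # S703: recompute
        if check != rec["check_value"]:            # S704: compare
            problems.append(rec["key"])            # S705: problem piece
            pieces.append(None)                    # placeholder for correction
        else:
            pieces.append(piece)
    return pieces, problems
```

Collecting problem pieces rather than failing fast is what allows the later correction step to work from the surviving normal pieces.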
In some embodiments, step S702 is performed by parallel reading, i.e., reading multiple slices of data to be backed up from the target storage by parallel reading.
In some embodiments, pre-reading may also be provided, i.e., a number of slices after the current slice are read in advance. For example, when the application system sends a request to the backup gateway system to read some slices of the data to be backed up, the backup gateway system may, after completing the read operation for those slices, continue to read a number of slices following them and cache them; when the application system subsequently requests to continue reading, the slices are served from the cache. Pre-reading can improve read performance. It may be used, for example, in the read operations of the recovery service 402, the dump service 403, and the sandbox service 404 in FIG. 2.
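The pre-read behavior can be sketched as a small cache in front of the driver's read interface. The class, the `fetch` callback, and the fixed look-ahead depth are assumptions of this illustration:

```python
class PrereadCache:
    """After serving slice i, speculatively fetch the next `depth`
    slices so a follow-up sequential read hits the cache."""

    def __init__(self, fetch, depth=2):
        self.fetch = fetch   # hypothetical driver read interface
        self.depth = depth
        self.cache = {}      # slice index -> slice bytes

    def read(self, index):
        piece = self.cache.pop(index, None)
        if piece is None:
            piece = self.fetch(index)  # cache miss: read from storage
        for ahead in range(index + 1, index + 1 + self.depth):
            if ahead not in self.cache:
                self.cache[ahead] = self.fetch(ahead)  # pre-read
        return piece
```

A production cache would bound its size and handle end-of-data; the sketch only shows the look-ahead principle.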
It should be understood that, in the above embodiments, the data to be backed up is usually presented in the form of a file, but the format of the file is not limited.
Fig. 8 is a schematic structural diagram of a backup gateway apparatus 800 according to an embodiment of the present disclosure. The protocol interface unit 801, the input/output unit 802, and the storage driver unit 803 included in the backup gateway apparatus 800 correspond to the interface protocol layer 501, the data input/output layer 502, and the storage drive layer 503 in fig. 3, respectively.
Briefly, the protocol interface unit 801 is configured to provide the respective application systems with a plurality of unified interfaces corresponding to a plurality of backup-related operations, each unified interface offering a plurality of interface types for the user to select as needed. For example, fig. 3 illustrates five interface types: SDK 5011, API 5012, NFS/SMB (remote mount) 5013, FUSE (local mount) 5014, and file interface 5015; the plurality of backup-related operations include a backup service 401, a restore service 402, a dump service 403, and a sandbox service 404. That is, in the example of fig. 3, each backup-related operation provides five interface types for the application system to choose from. The target storage type may be specified by a parameter of each unified interface.
The input/output unit 802 receives the parameter value of the storage type transmitted from the protocol interface unit 801, performs the corresponding operation, and transmits the parameter value to the corresponding interface of the storage driver unit 803.
The storage driver unit 803 is configured to provide a unified interface for use by the input/output unit 802 and to interface with the various storage types in the underlying storage pool through a configuration file. When the storage driver unit 803 receives the target storage type transmitted from the protocol interface unit 801, it determines the matching storage driver according to the target storage type and the correspondence data between storage types and storage drivers in the configuration file, so as to complete the corresponding backup-related operation based on the matching storage driver.
Further, when the storage-related operation is a backup, dump, or sandbox operation, the write data operation performed by the input/output unit 802 includes the following operations:
segmenting the data to be backed up of the current version into a plurality of pieces; calculating a check value of each slice; writing a plurality of slices to a target storage; and organizing and storing a metadata object, the metadata object including fragmentation information and a check value for each fragment.
Further, when the storage-related operation is a backup, dump, or sandbox operation, the write data operation performed by the input/output unit 802 includes the following operations:
judging whether data to be backed up of a previous version exists, and if so, obtaining the metadata object of the data to be backed up of the previous version and obtaining the corresponding fragment information and the check value of each fragment from it; segmenting the data to be backed up of the current version into a plurality of pieces according to the corresponding fragment information; calculating a check value of each piece of the current version of the data to be backed up; comparing the check value of each piece of the current version with the check value of the corresponding piece of the previous version; and copying only those pieces of the current version whose check values are inconsistent to the positions of the corresponding pieces of the previous version in the target storage.
Further, when the storage-related operation is a restore, dump, or sandbox operation, the read data operation performed by the input/output unit 802 includes the following steps:
acquiring fragment information of data to be backed up and a check value of each fragment from a metadata object of the data to be backed up; reading each piece of the data to be backed up from the target storage according to the piece information of the data to be backed up; calculating a check value of each piece of data to be backed up; and comparing the calculated check value of each piece with the check value of each piece acquired from the metadata object of the data to be backed up, and if the check values are not consistent, marking the piece as a problem piece.
The disclosed embodiments also provide an electronic device 900, as shown in fig. 9, which at the hardware level includes a memory 902 and a processor 901, and in some cases also an input/output device 903 and other hardware 904. The memory 902 is, for example, a random-access memory (RAM), and may also be a non-volatile memory, such as at least one disk memory. The input/output device 903 is, for example, a display, a keyboard, a mouse, a network controller, or the like. The processor 901 may be built from any of various processor models currently on the market. The processor 901, the memory 902, the input/output device 903, and the other hardware 904 are connected to each other via a bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one line is shown in FIG. 9, but this does not mean there is only one bus or one type of bus.
The memory 902 is used for storing a program. In particular, the program may comprise program code comprising computer instructions. The memory may include both volatile memory and non-volatile storage, and provides computer instructions and data to the processor 901. The processor 901 reads the corresponding computer program from the memory 902 and runs it, implementing the method of the above embodiments at the logic level.
As will be appreciated by one skilled in the art, the present disclosure may be embodied as systems, methods and computer program products. Accordingly, the present disclosure may be embodied in the form of entirely hardware, entirely software (including firmware, resident software, micro-code), or in the form of a combination of software and hardware. Furthermore, in some embodiments, the present disclosure may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium is, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the foregoing. In this context, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium, other than a computer-readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., and any suitable combination of the foregoing.
Computer program code for carrying out embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java and C++, and may also include conventional procedural programming languages such as C. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (13)

1. A method for writing data executed in a backup-related operation includes:
segmenting the data to be backed up of the current version into a plurality of pieces;
calculating a check value of each slice;
writing the plurality of slices to a target storage; and
metadata objects are organized and stored, the metadata objects including shard information and a check value for each shard.
2. The data writing method of claim 1, wherein the data writing method further comprises:
judging whether the data to be backed up of the prior version exists or not, if so, obtaining a metadata object of the data to be backed up of the prior version, and obtaining fragment information and a check value of each fragment from the metadata object of the data to be backed up of the prior version;
the splitting the data to be backed up of the current version into a plurality of pieces is as follows:
segmenting the data to be backed up of the current version into a plurality of pieces according to the fragmentation information;
the data writing method further comprises the following steps:
comparing the check value of each piece of the data to be backed up of the current version with the check value of the corresponding piece of the data to be backed up of the prior version;
then said writing said plurality of slices to the target storage is:
and only writing the corresponding piece of the data to be backed up of the current version with inconsistent check value into the position of the corresponding piece of the data to be backed up of the previous version in the target storage.
3. The data writing method of claim 2, further comprising:
calculating the integral check value of the data to be backed up of the current version; and
and storing the integral check value of the current version of the data to be backed up into the metadata object.
4. The data writing method of claim 2, wherein the data writing method further comprises: and writing each piece and the check value of the piece into the same storage unit of the target storage.
5. The data writing method of claim 2, wherein the plurality of tiles are written to the target storage by parallel writing.
6. The data writing method of claim 1, further comprising:
retrieving a configuration file based on the target storage type to obtain a storage driver adapted to the target storage type, the writing the plurality of slices to the target storage comprises:
writing the plurality of slices to a target storage through a storage driver adapted to the target storage type.
7. The data writing method of any one of claims 1 to 6, wherein the backup-related operation comprises: backup operation, dump operation and sandbox operation, wherein the target storage type is one of the following storages: local storage, local cloud storage, offsite cloud storage, and other cloud storage.
8. A method of reading data performed in a backup-related operation, comprising:
acquiring fragment information of the backed-up data and a check value of each fragment from a metadata object of the backed-up data;
reading each piece of the backed-up data from a target storage according to the fragment information of the backed-up data;
calculating a check value of each piece of the backed-up data;
comparing the calculated check value of each piece with the check value of each piece acquired from the metadata object of the backed-up data, and if the two are not consistent, marking the piece as a problem piece;
acquiring a global check value of the backed-up data from a metadata object of the backed-up data; and
and correcting the problem piece according to the whole check value and the check value of the read normal piece except the problem piece.
9. The data reading method of claim 8, wherein reading the respective slices of backed up data from the target storage is performed using parallel reads.
10. A backup gateway system, comprising:
the protocol interface layer is used for providing a uniform interface corresponding to backup related operation for each application system, and a target storage type is required to be provided when the uniform interface is called;
the input/output layer is used for receiving the target storage type and transmitting the target storage type to the storage driver layer; and
the storage driver layer is configured to determine a target driver matched with the target storage type according to the target storage type and correspondence data between storage types and storage drivers contained in a configuration file, and to complete backup related operations through the target driver, wherein the backup related operations include a backup operation, and the backup gateway system further transmits data to be backed up of a current version to the input/output layer through the uniform interface when performing the backup operation,
wherein, when performing the backup operation, the input/output layer executes the following operations: segmenting the data to be backed up of the current version into a plurality of pieces; calculating a check value of each piece; providing the plurality of pieces to the storage driver layer respectively; and organizing and storing a metadata object, the metadata object including fragmentation information and a check value of each piece.
11. A backup gateway apparatus, comprising:
the protocol interface unit is used for providing a uniform interface corresponding to backup related operation for each application system, and a target storage type is required to be provided when the uniform interface is called;
the input/output unit is used for receiving the target storage type and transmitting the target storage type to the storage driver unit; and
the storage driver unit is configured to determine a target driver matched with the target storage type according to the target storage type and correspondence data between storage types and storage drivers contained in a configuration file, and to complete backup related operations through the target driver, wherein the backup related operations include a backup operation, and the backup gateway apparatus further transmits data to be backed up of a current version to the input/output unit through the uniform interface when performing the backup operation,
wherein, when performing the backup operation, the input/output unit executes the following operations: segmenting the data to be backed up of the current version into a plurality of pieces; calculating a check value of each piece; providing the plurality of pieces to the storage driver unit respectively; and organizing and storing a metadata object, the metadata object including fragmentation information and a check value of each piece.
12. An electronic device comprising a memory and a processor, the memory further storing computer instructions executable by the processor, the computer instructions, when executed, implementing the method of writing data according to any of claims 1 to 7.
13. A computer readable medium storing computer instructions executable by an electronic device, the computer instructions, when executed, implementing the method of writing data according to any one of claims 1 to 7.
CN202111355822.8A 2021-11-16 2021-11-16 Data writing method executed in backup related operation and backup gateway system Pending CN114064361A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111355822.8A CN114064361A (en) 2021-11-16 2021-11-16 Data writing method executed in backup related operation and backup gateway system


Publications (1)

Publication Number Publication Date
CN114064361A true CN114064361A (en) 2022-02-18

Family

ID=80272866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111355822.8A Pending CN114064361A (en) 2021-11-16 2021-11-16 Data writing method executed in backup related operation and backup gateway system

Country Status (1)

Country Link
CN (1) CN114064361A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915555A (en) * 2022-04-27 2022-08-16 广州河东科技有限公司 Gateway driving communication method, device, equipment and storage medium
CN114915555B (en) * 2022-04-27 2024-03-12 广州河东科技有限公司 Gateway drive communication method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40067027

Country of ref document: HK