CN114077574A

CN114077574A - Data processing method and device and electronic equipment

Info

Publication number: CN114077574A
Application number: CN202010818303.XA
Authority: CN
Inventors: 梁建群
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2022-02-22

Abstract

The embodiments of the disclosure disclose a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium. The method comprises: cloning snapshot data of a first cloud disk, wherein the snapshot data comprise source data recorded by a snapshot and metadata of the source data, and the metadata of the source data comprise a snapshot version number corresponding to the source data, a data identifier of the source data, and storage parameters of the source data. Cloning the snapshot data of the first cloud disk includes copying the metadata in the snapshot data into a second cloud disk and creating, in the second cloud disk, hard link files associated with the source data in the snapshot data.

Description

Data processing method and device and electronic equipment

Technical Field

The disclosed embodiments relate to the field of data processing technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium.

Background

Cloud disk cloning refers to cloning a snapshot of a parent cloud disk to obtain a copy identical to the snapshot, which serves as a child cloud disk. A cloud disk snapshot records a static image of the cloud disk at the moment the snapshot is taken. After cloning, modifications to the child cloud disk do not affect the parent cloud disk.

In an existing cloud disk cloning method, a piece of metadata must be added to the parent cloud disk to record each child cloud disk cloned from it. After cloning over multiple generations, for example where the Nth-generation child cloud disk of a 1st-generation cloud disk is cloned to obtain the (N+1)th-generation child cloud disk (the Nth-generation child cloud disk being the parent of the (N+1)th-generation child cloud disk), the metadata in each parent cloud disk becomes large and bloated, and management and storage become increasingly complex. In addition, reading data from a cloud disk may require a generation-by-generation backtracking search along the lineage, which increases read time.

Therefore, there is a need to provide a new method for cloud disk cloning.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium, so as to implement a new cloud disk cloning scheme.

According to a first aspect of the present disclosure, there is provided a data processing method, including:

cloning snapshot data of a first cloud disk, wherein the snapshot data comprise source data recorded by a snapshot and metadata of the source data, and the metadata of the source data comprise a snapshot version number corresponding to the source data, a data identifier of the source data, and storage parameters of the source data;

wherein cloning the snapshot data of the first cloud disk includes: copying the metadata in the snapshot data into a second cloud disk, and creating, in the second cloud disk, hard link files associated with the source data in the snapshot data.

Optionally, the first cloud disk includes one or more data sets, where source data and metadata corresponding to the same data identifier belong to the same data set;

cloning the snapshot data of the first cloud disk includes: cloning the snapshot data of each data set of the first cloud disk.

Optionally, the metadata of the source data further includes a data sequence number of the source data;

Upon receiving a write command for the second cloud disk, the method further comprises:

acquiring target source data and a data identifier of the target source data from the write command;

writing the target source data into free storage space of the second cloud disk;

incrementing the snapshot version number of the most recently executed snapshot, i.e. the latest snapshot version number of the second cloud disk, by one unit to obtain the snapshot version number corresponding to the target source data;

determining a data sequence number of the target source data in a sequentially increasing manner;

generating metadata of the target source data and writing the metadata into free storage space of the second cloud disk, wherein the metadata of the target source data comprise the snapshot version number corresponding to the target source data, the data identifier of the target source data, the data sequence number of the target source data, and the storage parameters of the target source data.

Candidate metadata are found among the metadata of the second cloud disk according to the data identifier of the target source data; if there is one candidate, it is taken as the target metadata; if there are several, the candidate with the largest data sequence number is taken as the target metadata;

if a hard link file corresponding to the target metadata exists in the second cloud disk, the target source data is written into free storage space of the second cloud disk.

Optionally, when a write command for the second cloud disk is received, the method further includes:

if no hard link file corresponding to the target metadata exists in the second cloud disk, performing a delete operation according to the storage parameters in the target metadata, and writing the target source data according to the storage parameters in the target metadata.
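The write-path decision above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Metadata` fields mirror those listed in the text, while identifying a hard link file by the data file name recorded in the metadata, the `hard_linked_files` set, and the labels `"free_space"` and `"in_place"` are hypothetical. The text does not cover the case where no candidate metadata exists; the sketch assumes such data is written to free space.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    data_id: int
    snapshot_version: int
    seq: int          # data sequence number
    file_name: str    # storage parameters: file name, offset, size
    offset: int
    size: int


def find_target_metadata(metadata: list[Metadata], data_id: int) -> Optional[Metadata]:
    """Return the candidate with the largest data sequence number, if any."""
    candidates = [m for m in metadata if m.data_id == data_id]
    if not candidates:
        return None
    # With a single candidate, max() simply returns it.
    return max(candidates, key=lambda m: m.seq)


def choose_write_path(metadata: list[Metadata], data_id: int,
                      hard_linked_files: set[str]) -> str:
    """Decide where new data for `data_id` goes (hypothetical helper)."""
    target = find_target_metadata(metadata, data_id)
    if target is None:
        return "free_space"  # assumption: new identifiers are appended
    if target.file_name in hard_linked_files:
        # The bytes are shared with the parent snapshot through a hard link,
        # so they must not be overwritten: write to free space instead.
        return "free_space"
    # Data owned only by this disk: delete and rewrite at the same location.
    return "in_place"
```

The point of the hard link check is that overwriting a shared data file in place would also change what the first cloud disk's snapshot sees.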

When a read command for the second cloud disk is received, the method further comprises:

acquiring a target data identifier and a target snapshot version number from the read command;

when it is determined that, among all metadata of the second cloud disk, there is metadata matching both the target data identifier and the target snapshot version number, determining that metadata as first target metadata;

determining the metadata with the largest data sequence number among the first target metadata as second target metadata;

and performing a read operation according to the storage parameters in the second target metadata.

Optionally, when a read command for the second cloud disk is received, the method further includes:

when it is determined that, among all metadata of the second cloud disk, there is metadata matching the target data identifier but no metadata matching both the target data identifier and the target snapshot version number, decrementing the target snapshot version number step by step until third target metadata is found among all metadata of the second cloud disk, the third target metadata containing the target data identifier and the decremented target snapshot version number;

determining the metadata with the largest data sequence number among the third target metadata as fourth target metadata;

and performing a read operation according to the storage parameters in the fourth target metadata.
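Both read branches above can be sketched as a single lookup loop. This is a hedged Python illustration: the `Metadata` fields follow the text, the function name `resolve_read` is hypothetical, and returning `None` when nothing matches is an assumption the text does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    data_id: int
    snapshot_version: int
    seq: int
    file_name: str
    offset: int
    size: int


def resolve_read(metadata: list[Metadata], data_id: int,
                 version: int) -> Optional[Metadata]:
    """Pick the metadata whose storage parameters a read should use."""
    same_id = [m for m in metadata if m.data_id == data_id]
    if not same_id:
        return None  # assumption: no data with this identifier
    lowest = min(m.snapshot_version for m in same_id)
    v = version
    while v >= lowest:
        # First/third target metadata: identifier and version both match.
        matches = [m for m in same_id if m.snapshot_version == v]
        if matches:
            # Second/fourth target metadata: largest data sequence number.
            return max(matches, key=lambda m: m.seq)
        v -= 1  # step the target snapshot version number back by one
    return None  # requested version predates every write of this identifier
```

Because every write records the pre-allocated version number, stepping the version back finds the newest data visible at the requested snapshot without consulting any other cloud disk.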

According to a second aspect of the present disclosure, there is provided a data processing apparatus comprising:

a processing module, configured to clone snapshot data of a first cloud disk, wherein the snapshot data comprise source data recorded by a snapshot and metadata of the source data, and the metadata of the source data comprise a snapshot version number corresponding to the source data, a data identifier of the source data, and storage parameters of the source data;

the processing module comprises a copying submodule and a creating submodule;

the copying submodule is configured to copy the metadata in the snapshot data into a second cloud disk;

the creating submodule is configured to create, in the second cloud disk, hard link files associated with the source data in the snapshot data.

According to a third aspect of the present disclosure, there is provided an electronic device, comprising:

the data processing apparatus provided in the second aspect of the present disclosure; or

a processor and a memory, the memory storing computer instructions which, when executed by the processor, implement the data processing method provided in the first aspect of the present disclosure.

According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the data processing method provided in the first aspect of the present disclosure.

In the embodiments of the disclosure, cloud disk cloning is performed by copying the metadata in the snapshot data of the first cloud disk and creating hard link files associated with the source data in the snapshot data. As a result, no metadata needs to be added to the first cloud disk, and the second cloud disk can conveniently query the cloned data.

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a block diagram of a hardware configuration of an electronic device that can be used to implement an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of a data processing method according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram of metadata in a data processing method according to an embodiment of the present disclosure;

Fig. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

Fig. 1 is a block diagram showing a hardware configuration of an electronic apparatus 1000 that can implement an embodiment of the present disclosure.

The electronic device 1000 may be a laptop, desktop, tablet, server, workstation, etc.

The server may be a single server or a distributed server spanning multiple computers or computer data centers. The server may be of various types, such as, but not limited to, a node device of a content distribution network, a storage server of a distributed storage system, a cloud database server, a cloud computing server, a cloud management server, a web server, a news server, a mail server, a message server, an advertisement server, a file server, an application server, an interaction server, a database server, a proxy server, or the like. In some embodiments, each server may include hardware, software, embedded logic components, or a combination of two or more such components for performing the appropriate functions supported or implemented by the server. For example, the server may be a blade server, a cloud server, or the like, or a server group consisting of a plurality of servers, which may include one or more of the above types of servers.

As shown in fig. 1, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 may be a central processing unit CPU, a microprocessor MCU, or the like. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 is capable of wired or wireless communication, for example, and may specifically include WiFi communication, bluetooth communication, 2G/3G/4G/5G communication, and the like. The display device 1500 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 1600 may include, for example, a touch screen, a keyboard, a somatosensory input, and the like. A user can input/output voice information through the speaker 1700 and the microphone 1800.

The electronic device shown in Fig. 1 is merely illustrative and is in no way intended to limit the embodiments of the disclosure, their application, or uses. In an embodiment of the present disclosure, the memory 1200 of the electronic device 1000 stores instructions that control the processor 1100 to execute any data processing method provided by the embodiments of the present disclosure. Those skilled in the art will understand that although multiple devices of the electronic device 1000 are shown in Fig. 1, embodiments of the present disclosure may involve only some of them; for example, the electronic device 1000 may involve only the processor 1100 and the memory 1200. Those skilled in the art can design the instructions according to the solutions disclosed herein. How instructions control the operation of a processor is well known in the art and is not described in detail here.

< data processing method embodiment >

The data processing method provided by the embodiment of the disclosure includes step S102.

S102, cloning snapshot data of the first cloud disk. Cloning the snapshot data of the first cloud disk comprises copying the metadata in the snapshot data into the second cloud disk and creating hard link files associated with the source data in the snapshot data.

A snapshot is a technique for recording data changes: it records a static image of the data at the moment the snapshot is taken. Snapshots make it possible to trace data modifications back, repair damaged data, and reduce data loss, and data reads can also rely on snapshots. In the embodiments of the present disclosure, when a snapshot of the first cloud disk is taken, the source data and the metadata corresponding to the source data may be included in the snapshot together.

Metadata (data about data) is data that describes data, and the described data is referred to as "source data". The metadata can be used to describe the attributes and parameters of the source data, and the source data can be managed through the metadata. In embodiments of the present disclosure, the source data may be the smallest unit of data for storing, modifying, reading, and deleting.

The first cloud disk may be partitioned in advance into one or more data sets; that is, the first cloud disk may itself be a single data set, or it may be divided into multiple data sets. Cloning the snapshot data of the first cloud disk in step S102 may therefore mean cloning the snapshot data of each data set of the first cloud disk.

When the first cloud disk is divided into multiple data sets, it may be logically partitioned according to a mapping rule. In a first example, the partitioning is based solely on data identifiers: the source data and metadata corresponding to each data identifier form one data set, so the source data and metadata of one data identifier belong to exactly one data set, and each data set contains the source data and metadata of exactly one data identifier. In a second example, the first cloud disk is divided into multiple data sets by a hash mapping of the data identifiers; the source data and metadata of one data identifier still belong to exactly one data set, but one data set may contain the source data and metadata of one or more data identifiers, and this mapping tends to make the data sets similar in size.
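The two mapping rules can be sketched as functions from a data identifier to a data set index. This is an illustrative Python sketch only: the text does not name a hash function, so CRC32 from the standard `zlib` module is an assumption, as are the function names.

```python
import zlib


def data_set_per_identifier(data_id: int) -> int:
    # First example: each data identifier forms its own data set.
    return data_id


def data_set_by_hash(data_id: int, num_sets: int) -> int:
    # Second example (assumed hash: CRC32): many identifiers may share a
    # data set, but a given identifier always maps to the same one.
    # CRC32 is stable across processes, unlike Python's hash() of a str.
    return zlib.crc32(str(data_id).encode("utf-8")) % num_sets
```

Applying the same function with the same `num_sets` to the second cloud disk reproduces the first cloud disk's partitioning, which is what allows each data set to be cloned independently.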

Likewise, the second cloud disk may also be divided into one or more data sets. The mapping rule and mapping algorithm used to divide the second cloud disk into data sets may be the same as those of the first cloud disk.

A "hard link" is a data sharing technique. When a hard link is created for an existing file A (the "original file") to obtain a file B (the "link file"), the original file and the link file share the same data content. Deleting only the link file, or only the original file, does not affect the data content: deleting the link file has no effect on the original file, and deleting the original file has no effect on the link file. The hard link mechanism can therefore be used to protect files from accidental deletion.

Referring to Fig. 2, after a snapshot is taken of a data set of the first cloud disk, the snapshot data include source data A1, A2 and A3 and the metadata of source data A1, A2 and A3. After this snapshot data is cloned, the second cloud disk contains the metadata of source data A1, A2 and A3, as well as hard link files B1, B2 and B3 of source data A1, A2 and A3, respectively. Through hard link files B1, B2 and B3, the second cloud disk shares the data content of source data A1, A2 and A3 recorded by the snapshot data, and source data A1, A2 and A3 are treated as existing source data of the second cloud disk. In the example shown in Fig. 2, the metadata in the snapshot data occupy about 100 KB, and copying them is very fast, possibly taking only several tens of microseconds.
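The clone step can be sketched with ordinary files, using the standard `os.link` call to create hard links. The on-disk layout is entirely hypothetical: each cloud disk (or data set) is modeled as a directory, the metadata as a `metadata.json` list of entries, and each entry's `file_name` as a data file in that directory. Hard links require both directories to be on the same file system.

```python
import json
import os


def clone_snapshot(snapshot_dir: str, clone_dir: str) -> list:
    """Clone a snapshot by copying metadata and hard-linking data files."""
    os.makedirs(clone_dir, exist_ok=True)
    # 1. Copy the (small) metadata by value into the second cloud disk.
    with open(os.path.join(snapshot_dir, "metadata.json")) as f:
        metadata = json.load(f)
    with open(os.path.join(clone_dir, "metadata.json"), "w") as f:
        json.dump(metadata, f)
    # 2. Hard-link every data file referenced by the metadata instead of
    #    copying its bytes; several entries may share one data file.
    for entry in metadata:
        dst = os.path.join(clone_dir, entry["file_name"])
        if not os.path.exists(dst):
            os.link(os.path.join(snapshot_dir, entry["file_name"]), dst)
    return metadata
```

Only the metadata file is rewritten, so the cost of cloning grows with the number of metadata entries rather than with the volume of source data. Each linked file in the clone shares its inode with the snapshot's file, and deleting either name leaves the content readable through the other.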

In the embodiment of the disclosure, by creating a hard link file associated with source data in snapshot data of a first cloud disk in a second cloud disk, the first cloud disk and the second cloud disk can share data content of the source data recorded by the snapshot data, and the data content of the source data recorded by the snapshot data is protected.

In the embodiment of the disclosure, cloud disk cloning is performed by copying metadata in snapshot data of a first cloud disk and creating a hard link file associated with source data in the snapshot data, without copying the source data of the first cloud disk to a second cloud disk, so that cloning speed is increased, and occupation of a storage space of the second cloud disk is reduced.

In this embodiment of the disclosure, for a first cloud disk, metadata of source data of the first cloud disk includes a snapshot version number corresponding to the source data, a data identifier of the source data, and a storage parameter of the source data. In this embodiment of the disclosure, the metadata of the source data of the first cloud disk may further include a data sequence number of the source data.

In an embodiment of the present disclosure, for the second cloud disk, the metadata of the source data includes a snapshot version number corresponding to the source data, a data identifier of the source data, a data sequence number of the source data, and a storage parameter of the source data.

In the embodiments of the disclosure, the metadata serve as an index for data retrieval, so that source data can be located and read quickly and accurately in the storage space, which facilitates reading, modifying, and deleting source data.

The snapshot version number corresponding to the source data is explained below.

In the embodiments of the disclosure, after a snapshot of a cloud disk is taken, a snapshot version number is pre-allocated, in a sequentially increasing manner, to the next snapshot (which has not yet been taken): the snapshot version number of the next snapshot is the snapshot version number of the previous snapshot incremented by one unit. Each time a snapshot of the cloud disk is taken, the pre-allocated snapshot version number is incremented by one unit.

Taking the first cloud disk as an example: initially, the pre-allocated snapshot version number of the first cloud disk is 1, and no snapshot has yet been taken. Each time a snapshot of the first cloud disk is taken, the pre-allocated snapshot version number is incremented by one unit. Specifically, the first snapshot has snapshot version number 1, after which the pre-allocated snapshot version number becomes 2; the second snapshot has snapshot version number 2, after which the pre-allocated number becomes 3; the third snapshot has snapshot version number 3, after which the pre-allocated number becomes 4. By analogy, after the Nth snapshot (snapshot version number N) is completed, snapshot version number N+1 is pre-allocated to the (N+1)th snapshot, where N is an integer greater than or equal to 1.

Taking the second cloud disk as an example: since it is cloned from snapshot data of the first cloud disk, its initial snapshot version number is the snapshot version number of the cloned snapshot data, and the cloned snapshot data serves as its initial snapshot. Each time a snapshot of the second cloud disk is taken, the pre-allocated snapshot version number is incremented by one unit. Specifically, if the snapshot data cloned from the first cloud disk has snapshot version number X, the pre-allocated snapshot version number becomes X+1 after cloning. The first snapshot after cloning has snapshot version number X+1, after which the pre-allocated number becomes X+2; the second snapshot after cloning has snapshot version number X+2, after which it becomes X+3; the third snapshot after cloning has snapshot version number X+3, after which it becomes X+4. By analogy, after the Nth snapshot after cloning (snapshot version number X+N) is completed, snapshot version number X+N+1 is pre-allocated to the (N+1)th snapshot after cloning, where X and N are integers greater than or equal to 1.

In the embodiments of the present disclosure, the snapshot version number corresponding to a piece of source data is the pre-allocated snapshot version number at the time the source data is written. For example, after the Nth snapshot (snapshot version number Y) is taken, the pre-allocated snapshot version number becomes Y+1; source data written to the cloud disk between the completion of the Nth snapshot and the completion of the (N+1)th snapshot therefore corresponds to snapshot version number Y+1, where Y and N are integers greater than or equal to 1.
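The version-number rules for both cloud disks can be condensed into a small counter. This Python sketch is an illustration under assumptions: the class and method names are hypothetical, and only the pre-allocated number is tracked.

```python
from __future__ import annotations

from typing import Optional


class SnapshotCounter:
    """Tracks the pre-allocated snapshot version number of one cloud disk."""

    def __init__(self, cloned_from_version: Optional[int] = None) -> None:
        if cloned_from_version is None:
            # A new disk: version 1 is pre-allocated, no snapshot taken yet.
            self.next_version = 1
        else:
            # A clone: the cloned snapshot (version X) counts as the disk's
            # initial snapshot, so X + 1 is pre-allocated.
            self.next_version = cloned_from_version + 1

    def version_for_write(self) -> int:
        # Source data written now carries the pre-allocated version number.
        return self.next_version

    def take_snapshot(self) -> int:
        version = self.next_version
        self.next_version += 1  # pre-allocate the next snapshot's version
        return version
```

For example, `SnapshotCounter(cloned_from_version=5)` reports 6 for writes made right after cloning, and its first snapshot gets version 6, matching the X+1 rule.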

In the embodiments of the present disclosure, each modification of a piece of data content produces a new source data item with the same data identifier, so multiple modifications produce multiple source data items with that identifier. To record these modifications and distinguish the source data items that share a data identifier, the metadata of the source data may further include a data sequence number.

The following regards data to be written as target source data, and describes how to determine a data sequence number of the target source data based on a sequential increasing manner when writing the target source data into the cloud disk.

In the first mode, the data sequence number of the source data is incremented by taking the cloud disk as a range. That is, when writing a target source data into the cloud disk, the data sequence number of the target source data is incremented by one unit based on the existing maximum data sequence number of the cloud disk. For example, if the maximum data sequence number existing in the cloud disk is 100, when the target source data is written into the cloud disk, the data sequence number of the target source data is 101.

In the second mode, the data sequence number of the source data is incremented by the data set. That is, for a data set, each time a target source data is written into the data set, the data sequence number of the target source data is incremented by one unit based on the existing maximum data sequence number of the data set. For example, if the maximum data sequence number existing in the data set is 100, when the target source data is written into the data set, the data sequence number of the target source data is 101.

In the third mode, the data sequence number of the source data is incremented by taking the data identifier as a range. For a data identifier, when a target source data with the data identifier is written into the cloud disk, the data sequence number of the target source data is incremented by one unit on the basis of the largest data sequence number in all data sequence numbers corresponding to the data identifier. For example, for the data identifier "1", the largest one of all data sequence numbers corresponding to the data identifier "1" in the cloud disk is 100, and when the target source data with the data identifier "1" is written into the cloud disk, the data sequence number of the target source data is 101.
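The three numbering modes differ only in the scope over which the maximum sequence number is tracked. The Python sketch below captures this, assuming a hypothetical `SequenceAllocator` that keeps one running maximum per scope key in memory.

```python
from collections import defaultdict


class SequenceAllocator:
    """Allocates data sequence numbers under one of the three scopes."""

    SCOPES = ("disk", "data_set", "data_id")  # first, second, third mode

    def __init__(self, scope: str) -> None:
        if scope not in self.SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        self.scope = scope
        # Largest data sequence number seen so far, per scope key.
        self.max_seq = defaultdict(int)

    def next_seq(self, data_set: int, data_id: int) -> int:
        key = {"disk": None, "data_set": data_set, "data_id": data_id}[self.scope]
        self.max_seq[key] += 1  # existing maximum plus one unit
        return self.max_seq[key]
```

To reproduce the "100 becomes 101" examples, the running maximum for the relevant key would be seeded from the existing metadata before allocating, e.g. `alloc.max_seq[key] = 100`.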

It can be seen that one source data can be uniquely determined according to the data identifier of the source data and the data sequence number of the source data. From the data sequence number, it can be determined which source data is the latest source data among the plurality of source data having the same data identification.

In an embodiment, if the metadata copied from the first cloud disk by the second cloud disk does not contain a data sequence number, the second cloud disk may assign a data sequence number to the source data corresponding to each copied metadata based on a sequential increment principle after the copying, and add the assigned data sequence number to the corresponding metadata.

In one embodiment of the present disclosure, the storage parameters of the source data may include the file name of the data file storing the source data, the offset of the source data within the data file, and the size of the source data. The file name is the name of the data file storing the source data and identifies that data file. The offset is the position of the source data relative to a preset reference position in the data file and indicates where the source data starts. The size is the amount of storage space occupied by the source data; combined with the start position, it locates the end of the source data in the data file. With storage parameters structured this way, source data can be located, and therefore retrieved, quickly and accurately.
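A hedged Python sketch of how the three storage parameters locate a byte range follows. It assumes the offset is measured from the start of the file (the text only says "a preset position") and that data files live in a single directory; `StorageParams` and `read_source_data` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageParams:
    file_name: str  # data file holding the source data
    offset: int     # start position, assumed relative to the file start
    size: int       # bytes occupied by the source data

    @property
    def end(self) -> int:
        # Start position plus size gives the end position.
        return self.offset + self.size


def read_source_data(data_dir: str, p: StorageParams) -> bytes:
    with open(os.path.join(data_dir, p.file_name), "rb") as f:
        f.seek(p.offset)
        return f.read(p.size)
```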

In a specific example, as shown in Fig. 3, given the metadata <data identifier: 3; snapshot version number: 2; data sequence number: 2; file name: 10; offset: 300; data size: 8> and the metadata <data identifier: 3; snapshot version number: 2; data sequence number: 3; file name: 10; offset: 200; data size: 8>, it can be seen that the most recently written source data with data identifier "3" has data sequence number 3 and is stored at offset 200 in the data file named 10.

In a specific example, a snapshot of the cloud disk may be taken when a snapshot instruction is received. In another specific example, snapshots of the cloud disk may be taken periodically according to a preset snapshot cycle. In the embodiments of the present disclosure, a snapshot of the cloud disk may cover all data sets of the cloud disk or only some of them. The method of the embodiments of the present disclosure does not restrict the triggering mechanism of the snapshot.

In a specific example, when a snapshot of a data set is taken, the first source data to be included in the snapshot are determined first: for each data identifier in the data set, the source data with the largest data sequence number among all source data having that data identifier is selected as first source data. A snapshot is then taken of all first source data of the data set and the metadata of all first source data.
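Selecting the first source data amounts to a "latest entry per data identifier" reduction over the metadata. The Python sketch below is illustrative only; the function name and the dictionary keys are assumptions, and the test data reuses the two Fig. 3 entries plus one hypothetical entry for identifier 5.

```python
def select_snapshot_entries(metadata: list) -> list:
    """Keep, per data identifier, the entry with the largest sequence number."""
    latest = {}
    for m in metadata:
        current = latest.get(m["data_id"])
        if current is None or m["seq"] > current["seq"]:
            latest[m["data_id"]] = m
    return sorted(latest.values(), key=lambda m: m["data_id"])
```

Applied to the two Fig. 3 entries, it keeps only the one with data sequence number 3, stored at offset 200, which agrees with the reading of Fig. 3 given earlier.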

The following specific examples describe how the data of the second cloud disk is managed.

< example 1 >

Taking the data to be written as the target source data, when a write command for the second cloud disk is received, the method further includes steps S202-S208.

S202, acquiring the target source data and the data identifier of the target source data from the write command.

S204, writing the target source data into free storage space of the second cloud disk.

S206, incrementing the snapshot version number of the most recently executed snapshot of the second cloud disk to obtain the snapshot version number corresponding to the target source data, and determining the data sequence number of the target source data by sequential incrementing.

S208, generating metadata of the target source data and writing the metadata into free storage space of the second cloud disk, wherein the metadata of the target source data includes the snapshot version number corresponding to the target source data, the data identifier of the target source data, the data sequence number of the target source data, and the storage parameters of the target source data.
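Steps S202-S208 can be sketched as an append-only write path. The sketch rests on assumptions that the disclosure does not state: a `bytearray` stands in for the free storage space, the data sequence number is a single counter shared by all data identifiers, and `CloudDisk`, `write`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    data_id: int           # data identifier
    snapshot_version: int  # pre-assigned snapshot version number
    seq: int               # data sequence number
    offset: int            # storage parameter: start of the data in the append area
    size: int              # storage parameter: length of the data


@dataclass
class CloudDisk:
    """Hypothetical model of the second cloud disk with incremental writes."""
    latest_snapshot_version: int = 0  # version of the most recently executed snapshot
    next_seq: int = 1                 # next data sequence number to hand out
    storage: bytearray = field(default_factory=bytearray)  # stands in for free space
    metadata: list = field(default_factory=list)

    def write(self, data_id: int, payload: bytes) -> Metadata:
        # S202: data_id and payload come from the write command.
        # S204: append the payload to free space; existing data is never overwritten.
        offset = len(self.storage)
        self.storage += payload
        # S206: pre-assigned version = latest executed snapshot + 1; seq increments.
        version = self.latest_snapshot_version + 1
        seq = self.next_seq
        self.next_seq += 1
        # S208: generate the metadata and store it alongside the data.
        meta = Metadata(data_id, version, seq, offset, len(payload))
        self.metadata.append(meta)
        return meta


disk = CloudDisk(latest_snapshot_version=2)
first = disk.write(7, b"abc")
second = disk.write(7, b"xyz")  # "modifying" data 7 is just another append
print(second.snapshot_version, second.seq, second.offset)  # 3 2 3
```

Both writes receive the pre-assigned version 3 because snapshot 2 is the latest one executed; the second write does not touch the first write's bytes, which is the property that makes in-place modification unnecessary in this example.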

In this embodiment, incremental writing is used regardless of whether the write command modifies existing source data or writes source data with a brand-new data identifier.

According to one embodiment of the present disclosure, each time source data is written into the second cloud disk, the source data is stored in free storage space and corresponding metadata is generated for it, where the metadata includes the snapshot version number corresponding to the source data, the data identifier of the source data, the data sequence number of the source data, and the storage parameters of the source data. In other words, every write is stored incrementally together with its own metadata. As a result, when existing data of the second cloud disk is modified, the modified source data can be written directly into free space of the cloud disk instead of being modified in place, which saves data processing time and improves data processing efficiency.

In one embodiment of the present disclosure, each piece of metadata of the first cloud disk and of the second cloud disk may include the snapshot version number corresponding to the source data, the data identifier of the source data, the storage parameters of the source data, and the data sequence number. In an embodiment of the present disclosure, both the first cloud disk and the second cloud disk may use incremental writing when receiving a write command.

< example 2>

Taking the data to be written as the target source data, when a write command for the second cloud disk is received, the method further includes the following.

First, the target source data and the data identifier of the target source data are acquired from the write command.

Then, metadata matching the data identifier of the target source data is searched for among all metadata of the second cloud disk.

If the second cloud disk has no metadata matching the data identifier of the target source data, the target source data is source data with a brand-new data identifier. In this case, steps S302-S306 are performed.

S302, writing the target source data into a free storage space of the second cloud disk.

S304, incrementing the snapshot version number of the most recently executed snapshot of the second cloud disk to obtain the snapshot version number corresponding to the target source data, and determining the data sequence number of the target source data by sequential incrementing.

S306, generating metadata of the target source data and writing the metadata into a free storage space of the second cloud disk, wherein the metadata of the target source data comprises a snapshot version number corresponding to the target source data, a data identifier of the target source data, a data sequence number of the target source data and a storage parameter of the target source data.

If metadata matching the data identifier of the target source data exists in the second cloud disk, the write command modifies existing source data in the second cloud disk. In this case, the metadata matching the data identifier of the target source data is taken as candidate metadata. If there is only one piece of candidate metadata, it is taken as the target metadata. If there are several, the candidate metadata with the largest data sequence number is selected as the target metadata. This ensures that the target metadata is the latest of all metadata corresponding to the data identifier of the target source data.

It is then determined whether a hard-link file corresponding to the target metadata exists in the second cloud disk.

If a hard-link file corresponding to the target metadata exists in the second cloud disk, the target metadata was copied from the snapshot data of the first cloud disk and points to the corresponding source data in the first cloud disk. In this case, steps S402-S406 are performed in an incremental modification manner.

S402, writing the target source data into a free storage space of the second cloud disk.

S404, incrementing the snapshot version number of the most recently executed snapshot of the second cloud disk to obtain the snapshot version number corresponding to the target source data, and determining the data sequence number of the target source data by sequential incrementing.

S406, generating metadata of the target source data and writing the metadata into a free storage space of the second cloud disk, wherein the metadata of the target source data comprises a snapshot version number corresponding to the target source data, a data identifier of the target source data, a data sequence number of the target source data and a storage parameter of the target source data.

Referring to FIG. 2, assume that, according to the write command, the target metadata determined from the second cloud disk is the metadata of source data A2, meaning that the write command is intended to modify source data A2. The second cloud disk contains a hard-link file B2 corresponding to the metadata of source data A2, so steps S402-S406 are performed. Afterwards, source data B4 and its metadata have been added to the second cloud disk: the data identifier of source data B4 is the same as that of source data A2, the data sequence number of source data B4 is greater than that of source data A2, and the snapshot version number corresponding to source data B4 is the currently pre-assigned snapshot version number.

If no hard-link file corresponding to the target metadata exists in the second cloud disk, the target metadata was not copied from the snapshot data of the first cloud disk, and the source data it points to is owned exclusively by the second cloud disk.

In this case, step S502 may be performed in an in-place modification manner.

S502, performing a delete operation according to the storage parameters in the target metadata, and writing the target source data according to the storage parameters in the target metadata. That is, the existing source data to be modified is deleted first, and the modified source data is then written to the same storage location.

In this case, no new metadata needs to be generated. Instead, the snapshot version number and the data sequence number of the target metadata may be updated so that the target metadata reflects the current state: the snapshot version number of the snapshot most recently executed on the second cloud disk is incremented to obtain the updated snapshot version number of the target metadata, and the data sequence number of the target metadata is updated by sequential incrementing.

Referring to FIG. 2, assume that, according to the write command, the target metadata determined from the second cloud disk is the metadata of source data B4, meaning that the write command is intended to modify source data B4. The second cloud disk has no hard-link file corresponding to the metadata of source data B4, so source data B4 is owned exclusively by the second cloud disk, and step S502 is performed to modify source data B4 into the target source data in place.
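The choice between incremental and in-place modification in example 2 can be sketched as below. Assumptions, all hypothetical: a data file hard-linked from the first cloud disk is recognized by its name appearing in a `hard_links` set, writes landing on this disk go to a local file named `"local"`, and in-place modification assumes fixed-size blocks so the new data fits exactly at the old storage parameters. Class, field, and method names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    data_id: int
    snapshot_version: int
    seq: int
    file_name: str  # storage parameter: data file holding the source data
    offset: int     # storage parameter: offset inside the data file
    size: int       # storage parameter: length of the source data


@dataclass
class CloudDisk:
    """Hypothetical second cloud disk; files named in hard_links are shared with the first disk."""
    latest_snapshot_version: int
    next_seq: int
    files: dict = field(default_factory=dict)     # file name -> bytearray contents
    hard_links: set = field(default_factory=set)  # names of hard-linked data files
    metadata: list = field(default_factory=list)

    def _append(self, data_id: int, payload: bytes) -> Metadata:
        # S302-S306 / S402-S406: write to free space and create new metadata.
        buf = self.files.setdefault("local", bytearray())
        meta = Metadata(data_id, self.latest_snapshot_version + 1, self.next_seq,
                        "local", len(buf), len(payload))
        buf += payload
        self.next_seq += 1
        self.metadata.append(meta)
        return meta

    def write(self, data_id: int, payload: bytes) -> Metadata:
        candidates = [m for m in self.metadata if m.data_id == data_id]
        if not candidates:  # brand-new data identifier
            return self._append(data_id, payload)
        target = max(candidates, key=lambda m: m.seq)  # latest metadata for this identifier
        if target.file_name in self.hard_links:  # data still shared with the first disk
            return self._append(data_id, payload)
        # S502: data owned by this disk only; overwrite at the same storage parameters.
        if len(payload) != target.size:
            raise ValueError("sketch assumes fixed-size blocks")
        buf = self.files[target.file_name]
        buf[target.offset:target.offset + target.size] = payload  # delete old bytes, write new
        # Refresh the existing metadata instead of creating a new record.
        target.snapshot_version = self.latest_snapshot_version + 1
        target.seq = self.next_seq
        self.next_seq += 1
        return target


m0 = Metadata(1, 2, 5, "10", 0, 4)  # metadata copied from the first cloud disk
disk = CloudDisk(latest_snapshot_version=2, next_seq=6,
                 files={"10": bytearray(b"AAAA")}, hard_links={"10"}, metadata=[m0])
a = disk.write(1, b"BBBB")  # target m0 is hard-linked: incremental write (S402-S406)
b = disk.write(1, b"CCCC")  # target is now `a`, owned by this disk: in place (S502)
print(a is b, bytes(disk.files["local"]), bytes(disk.files["10"]))
# True b'CCCC' b'AAAA'
```

The first write leaves the shared file untouched because the target metadata still points to a hard-linked file; the second write targets metadata owned by this disk, so it overwrites the bytes and refreshes the existing record rather than adding a new one.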

< example 3>

First, when a read command for the second cloud disk is received, a target data identifier and a target snapshot version number are obtained from the read command.

Then, metadata matching both the target data identifier and the target snapshot version number is searched for among all metadata of the second cloud disk.

When metadata matching both the target data identifier and the target snapshot version number exists among all metadata of the second cloud disk, steps S602-S606 are performed.

S602, determining the metadata matching both the target data identifier and the target snapshot version number as first target metadata.

S604, determining the metadata with the largest data sequence number among the first target metadata as second target metadata.

S606, performing a read operation according to the storage parameters in the second target metadata.

In a specific example, referring to FIG. 3, the target data identifier in the read command is 1 and the target snapshot version number is 2; that is, the client wants to read, from the second cloud disk, the source data with data identifier 1 and snapshot version number 2. Based on target data identifier 1 and target snapshot version number 2, the server finds two pieces of first target metadata, determines the one with data sequence number 5 as the second target metadata, reads the source data according to the storage parameters in the second target metadata, and returns it to the client.

When metadata matching the target data identifier exists among all metadata of the second cloud disk but no metadata matches both the target data identifier and the target snapshot version number, steps S702-S706 are performed.

S702, decrementing the target snapshot version number step by step until third target metadata is determined from all metadata of the second cloud disk, wherein the third target metadata contains the target data identifier and the decremented target snapshot version number.

Specifically, when metadata matching the target data identifier exists among all metadata of the second cloud disk but none matches both the target data identifier and the target snapshot version number, the target snapshot version number is decremented one step at a time and the lookup is retried with each smaller version number, so that no data is missed during the read.

For example, the target data identifier in the read command is 2 and the target snapshot version number is 6. All metadata in the second cloud disk with data identifier 2 is taken as candidate metadata, forming a candidate metadata set. If the set contains no metadata with snapshot version number 6, metadata with snapshot version number 5 is sought in the set as third target metadata. If the set contains no metadata with snapshot version number 5 either, the target snapshot version number is decremented again and metadata with snapshot version number 4 is sought as third target metadata, and so on until third target metadata is found in the candidate metadata set.

S704, determining the metadata with the largest data sequence number in the third target metadata as fourth target metadata.

S706, performing a read operation according to the storage parameters in the fourth target metadata.
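Steps S602-S606 and S702-S706 amount to a lookup that steps back one snapshot version at a time. In the sketch below, `find_read_target` and the `Metadata` fields are hypothetical names, and returning `None` when no version at or below the requested one holds the identifier is an assumed behavior that the disclosure does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    data_id: int
    snapshot_version: int
    seq: int
    offset: int  # storage parameter used for the actual read
    size: int


def find_read_target(metadata, data_id: int, snapshot_version: int) -> Optional[Metadata]:
    """Pick the metadata to read for (data_id, snapshot_version)."""
    candidates = [m for m in metadata if m.data_id == data_id]
    if not candidates:
        return None  # no source data with this identifier on the disk
    lowest = min(m.snapshot_version for m in candidates)
    version = snapshot_version
    while version >= lowest:
        # S602 / S702: metadata matching the identifier and the current version.
        matches = [m for m in candidates if m.snapshot_version == version]
        if matches:
            # S604 / S704: the largest data sequence number is the latest write.
            return max(matches, key=lambda m: m.seq)
        version -= 1  # S702: fall back to the previous snapshot version
    return None  # assumed: every write of this identifier is newer than requested


# Hypothetical metadata: identifier 2 was last written under version 4.
table = [
    Metadata(1, 2, 4, 0, 8), Metadata(1, 2, 5, 8, 8),
    Metadata(2, 3, 7, 16, 8), Metadata(2, 4, 9, 24, 8),
]
print(find_read_target(table, 1, 2).seq)               # 5
print(find_read_target(table, 2, 6).snapshot_version)  # 4 (6 and 5 are absent)
```

The second call follows the example in the text: versions 6 and 5 hold no metadata for identifier 2, so the lookup settles on version 4 and reads with that record's storage parameters.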

< example 4>

When a data cleaning instruction is received, the method further includes:

S802, reading the target data identifier from the data cleaning instruction.

S804, finding, among all metadata of the second cloud disk, the metadata matching the target data identifier as candidate metadata.

S806, determining the source data to be cleaned according to the candidate metadata.

S808, performing a data cleaning operation on the source data to be cleaned.

In a specific example, performing the data cleaning operation on the source data to be cleaned includes: determining the storage parameters of the source data to be cleaned according to its metadata, and deleting the source data to be cleaned according to those storage parameters.

In another specific example, performing the data cleaning operation may further include cleaning the metadata of the source data to be cleaned. In this example, executing the data cleaning instruction removes both the source data to be cleaned and its metadata, so that the storage space occupied by the metadata is released along with that of the source data and the storage space is fully utilized.

The following explains how step S806 determines the source data to be cleaned from the candidate metadata.

Assume that the target data identifier in the data cleaning instruction is "1". The metadata in the second cloud disk matching this identifier is the candidate metadata, which includes:

< data identifier: 1, snapshot version number: 1, data sequence number: 5, … >

< data identifier: 1, snapshot version number: 2, data sequence number: 20, … >

< data identifier: 1, snapshot version number: 2, data sequence number: 22, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 23, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 24, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 25, … >

< data identifier: 1, snapshot version number: 5, data sequence number: 30, … >

< data identifier: 1, snapshot version number: 5, data sequence number: 31, … >

< data identifier: 1, snapshot version number: 6, data sequence number: 33, … >

< data identifier: 1, snapshot version number: 6, data sequence number: 38, … >

< data identifier: 1, snapshot version number: 7, data sequence number: 40, … >

< data identifier: 1, snapshot version number: 7, data sequence number: 45, … >

Because the snapshot version number in the metadata is pre-assigned, the snapshot with snapshot version number 7 may not have been taken yet. If it has already been taken, no source data with the target data identifier has been written since that snapshot.

The candidate metadata also shows that no source data with the target data identifier was written between the snapshot with snapshot version number 3 and the snapshot with snapshot version number 4, so the second cloud disk contains no candidate metadata with snapshot version number 4.

When none of the snapshots corresponding to the candidate snapshot version numbers (the snapshot version numbers appearing in the candidate metadata) is invalid, determining the source data to be cleaned according to the candidate metadata includes steps S902-S904.

S902, for each candidate snapshot version number, determining the target metadata to be cleaned corresponding to that candidate snapshot version number.

Determining the target metadata to be cleaned corresponding to a candidate snapshot version number includes:

among all candidate metadata corresponding to the candidate snapshot version number, determining the largest data sequence number as the effective data sequence number, and selecting, from those candidate metadata, the metadata whose data sequence number is smaller than the effective data sequence number as the target metadata to be cleaned corresponding to that candidate snapshot version number.

S904, determining the source data corresponding to all the target metadata to be cleaned as the source data to be cleaned.

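In this example, for the candidate metadata with data identifier 1, the target metadata to be cleaned includes:

< data identifier: 1, snapshot version number: 2, data sequence number: 20, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 23, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 24, … >

< data identifier: 1, snapshot version number: 5, data sequence number: 30, … >

< data identifier: 1, snapshot version number: 6, data sequence number: 33, … >

< data identifier: 1, snapshot version number: 7, data sequence number: 40, … >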
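Step S902 can be checked against the twelve candidate metadata listed for data identifier 1 with a short sketch. Only the snapshot version number and the data sequence number matter for this rule, so each candidate is reduced to a `(snapshot_version, seq)` pair; `metadata_to_clean` is a hypothetical name.

```python
from collections import defaultdict


def metadata_to_clean(candidates):
    """S902-S904 with no invalid snapshots; candidates share one data identifier."""
    by_version = defaultdict(list)
    for version, seq in candidates:
        by_version[version].append(seq)
    to_clean = []
    for version, seqs in sorted(by_version.items()):
        effective = max(seqs)  # effective data sequence number of this version
        to_clean += [(version, s) for s in sorted(seqs) if s < effective]
    return to_clean


# The candidates with data identifier 1, as (snapshot version number, data sequence number).
CANDIDATES = [(1, 5), (2, 20), (2, 22), (3, 23), (3, 24), (3, 25),
              (5, 30), (5, 31), (6, 33), (6, 38), (7, 40), (7, 45)]
print(metadata_to_clean(CANDIDATES))
# [(2, 20), (3, 23), (3, 24), (5, 30), (6, 33), (7, 40)]
```

A version with a single candidate, such as version 1, contributes nothing, because its only entry is also its effective data sequence number.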

When one of the snapshots corresponding to the candidate snapshot version numbers is invalid, determining the source data to be cleaned according to the candidate metadata includes steps S1002-S1008.

S1002, taking the snapshot version number of the invalid snapshot as a first target snapshot version number, and taking the snapshot version number of the snapshot following the invalid snapshot as a second target snapshot version number.

S1004, determining whether the second target snapshot version number is among the candidate snapshot version numbers.

S1006, when the second target snapshot version number is among the candidate snapshot version numbers, determining all candidate metadata with the first target snapshot version number as target metadata to be cleaned.

When the second target snapshot version number is not among the candidate snapshot version numbers, the snapshot corresponding to the second target snapshot version number contains no source data with the target data identifier, and that snapshot may need to reference the source data that has the target data identifier and the first target snapshot version number. For example, during execution of the read instruction described above, step S702 falls back to the source data of the previous snapshot version number. In this case, the candidate metadata with the largest data sequence number among those with the first target snapshot version number must be retained.

S1008, determining the source data corresponding to all the target metadata to be cleaned as the source data to be cleaned.
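Steps S1002-S1008 extend the per-version rule of S902. In the sketch below, the snapshot after an invalid snapshot is assumed to carry the invalid version number plus one (version numbers increase sequentially), the set of invalid version numbers is passed in explicitly, and the function name is hypothetical. If the next version appears among the candidates, every candidate of the invalid version is cleaned; otherwise the latest write of the invalid version is kept, because a read that falls back as in S702 may still need it.

```python
from collections import defaultdict


def metadata_to_clean(candidates, invalid_versions=frozenset()):
    """Return (snapshot_version, seq) pairs to clean for one data identifier."""
    by_version = defaultdict(list)
    for version, seq in candidates:
        by_version[version].append(seq)
    to_clean = []
    for version, seqs in sorted(by_version.items()):
        seqs = sorted(seqs)
        # S1002-S1006: first target = invalid version, second target = version + 1.
        if version in invalid_versions and version + 1 in by_version:
            to_clean += [(version, s) for s in seqs]  # later reads stop at version + 1
        else:
            # Keep the largest data sequence number; it may still be referenced.
            to_clean += [(version, s) for s in seqs[:-1]]
    return to_clean


CANDIDATES = [(1, 5), (2, 20), (2, 22), (3, 23), (3, 24), (3, 25),
              (5, 30), (5, 31), (6, 33), (6, 38), (7, 40), (7, 45)]
print(metadata_to_clean(CANDIDATES, invalid_versions={5}))
# [(2, 20), (3, 23), (3, 24), (5, 30), (5, 31), (6, 33), (7, 40)]
```

With version 3 marked invalid instead, version 4 is absent from the candidates, so the sketch keeps the entry with data sequence number 25 and cleans only the older writes of version 3.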

Two specific examples are given below:

(1) Assume that the snapshot with snapshot version number 5 is invalid. The remaining valid snapshots among the snapshots corresponding to the candidate snapshot version numbers are those with snapshot version numbers 1, 2, 3, and 6. If the snapshot with snapshot version number 7 has already been taken, it is also a remaining valid snapshot.

In this case, the metadata corresponding to the valid source data with data identifier "1" includes:

< data identifier: 1, snapshot version number: 1, data sequence number: 5, … >

< data identifier: 1, snapshot version number: 2, data sequence number: 22, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 25, … >

< data identifier: 1, snapshot version number: 6, data sequence number: 38, … >

< data identifier: 1, snapshot version number: 7, data sequence number: 45, … >

All other source data with data identifier "1" is source data to be cleaned.

In this case, the metadata of the source data to be cleaned includes:

< data identifier: 1, snapshot version number: 2, data sequence number: 20, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 23, … >

< data identifier: 1, snapshot version number: 3, data sequence number: 24, … >

< data identifier: 1, snapshot version number: 5, data sequence number: 30, … >

< data identifier: 1, snapshot version number: 5, data sequence number: 31, … >

< data identifier: 1, snapshot version number: 6, data sequence number: 33, … >

< data identifier: 1, snapshot version number: 7, data sequence number: 40, … >

(2) Assume that the snapshot with snapshot version number 3 is invalid. The remaining valid snapshots among the snapshots corresponding to the candidate snapshot version numbers are those with snapshot version numbers 1, 2, 5, and 6. If the snapshot with snapshot version number 7 has already been taken, it is also a remaining valid snapshot.

< data identifier: 1, snapshot version number: 1, data sequence number: 5, … >

< data identifier: 1, snapshot version number: 5, data sequence number: 31, … >

< data identifier: 1, snapshot version number: 7, data sequence number: 45, … >

In this case, the metadata of the source data to be cleaned includes:

< data processing apparatus embodiment >

In one embodiment of the present disclosure, a data processing apparatus is provided. The data processing apparatus includes a processing module.

The processing module is used for cloning snapshot data of the first cloud disk. The snapshot data includes source data recorded by the snapshot and metadata of the source data, and the metadata of the source data includes a snapshot version number corresponding to the source data, a data identifier of the source data, and a storage parameter of the source data.

Specifically, the processing module may include a replication sub-module and a creation sub-module.

The replication sub-module is configured to copy the metadata in the snapshot data into the second cloud disk.

The creation sub-module is configured to create, in the second cloud disk, a hard-link file associated with the source data in the snapshot data.
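Cloning through the two sub-modules can be sketched with ordinary file-system hard links (`os.link`). The directory layout, the `metadata.json` file, and the `clone_snapshot` name are assumptions made for illustration; the disclosure does not fix how metadata or data files are laid out on disk. The point of the sketch is that only the metadata is copied, while data files are shared through hard links instead of being duplicated.

```python
import json
import os
from pathlib import Path


def clone_snapshot(snapshot_metadata, parent_dir, child_dir):
    """Clone one snapshot from parent_dir into child_dir (hypothetical on-disk layout).

    snapshot_metadata: list of dicts, each naming the data file of its source data
    under the key "file_name".
    Returns the set of data file names that were hard-linked.
    """
    parent, child = Path(parent_dir), Path(child_dir)
    child.mkdir(parents=True, exist_ok=True)
    # Replication sub-module: the second cloud disk gets its own copy of the metadata.
    (child / "metadata.json").write_text(json.dumps(snapshot_metadata))
    # Creation sub-module: one hard link per referenced data file; no data is copied.
    linked = set()
    for meta in snapshot_metadata:
        name = str(meta["file_name"])
        if name not in linked:
            os.link(parent / name, child / name)
            linked.add(name)
    return linked
```

Because a hard link shares the underlying file with the first cloud disk, the second cloud disk must not modify a hard-linked file in place; this matches the write handling in example 2, which switches to incremental writing whenever a hard-link file corresponds to the target metadata.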

In a specific example, the first cloud disk includes one or more data sets, and source data and metadata with the same data identifier belong to the same data set. The processing module is specifically configured to clone the snapshot data of each data set of the first cloud disk.

In a specific example, the metadata of the source data further includes a data sequence number of the source data.

In a specific example, the data processing apparatus further includes a first write command execution module.

The first write command execution module is configured to: when a write command for the second cloud disk is received, acquire the target source data and the data identifier of the target source data from the write command; write the target source data into free storage space of the second cloud disk; increment the snapshot version number of the most recently executed snapshot of the second cloud disk to obtain the snapshot version number corresponding to the target source data; determine the data sequence number of the target source data by sequential incrementing; and generate metadata of the target source data and write it into free storage space of the second cloud disk, wherein the metadata of the target source data includes the snapshot version number corresponding to the target source data, the data identifier of the target source data, the data sequence number of the target source data, and the storage parameters of the target source data.

In a specific example, the data processing apparatus further includes a second write command execution module.

The second write command execution module is configured to: when a write command for the second cloud disk is received, acquire the target source data and the data identifier of the target source data from the write command; find candidate metadata among the metadata of the second cloud disk according to the data identifier of the target source data; take the candidate metadata as the target metadata if there is only one, or select the candidate metadata with the largest data sequence number as the target metadata if there are several; and, when a hard-link file corresponding to the target metadata exists in the second cloud disk, write the target source data into free storage space of the second cloud disk, increment the snapshot version number of the most recently executed snapshot of the second cloud disk to obtain the snapshot version number corresponding to the target source data, determine the data sequence number of the target source data by sequential incrementing, and generate metadata of the target source data and write it into free storage space of the second cloud disk, wherein the metadata of the target source data includes the snapshot version number corresponding to the target source data, the data identifier of the target source data, the data sequence number of the target source data, and the storage parameters of the target source data.

The second write command execution module is further configured to: when no hard-link file corresponding to the target metadata exists in the second cloud disk, perform a delete operation according to the storage parameters in the target metadata and write the target source data according to the storage parameters in the target metadata.

In a specific example, the data processing apparatus further comprises a read command execution module.

The read command execution module is configured to: when a read command for the second cloud disk is received, obtain a target data identifier and a target snapshot version number from the read command; when metadata matching both the target data identifier and the target snapshot version number exists among all metadata of the second cloud disk, determine that metadata as first target metadata; determine the metadata with the largest data sequence number among the first target metadata as second target metadata; and perform a read operation according to the storage parameters in the second target metadata.

The read command execution module is further configured to: when metadata matching the target data identifier exists among all metadata of the second cloud disk but no metadata matches both the target data identifier and the target snapshot version number, decrement the target snapshot version number step by step until third target metadata is determined from all metadata of the second cloud disk, the third target metadata containing the target data identifier and the decremented target snapshot version number; determine the metadata with the largest data sequence number among the third target metadata as fourth target metadata; and perform a read operation according to the storage parameters in the fourth target metadata.

< electronic device embodiment >

In one embodiment of the present disclosure, an electronic device is provided. The electronic device may include the foregoing data processing apparatus for implementing the data processing method of any embodiment of the present disclosure.

In one embodiment of the present disclosure, as shown in fig. 4, an electronic device 300 is provided. The electronic device 300 may comprise a memory 32 and a processor 31, the memory 32 being configured to store computer instructions which, when executed by the processor 31, implement the data processing method of any embodiment of the present disclosure.

In an embodiment of the present disclosure, the electronic device may be an electronic product such as a desktop computer, a notebook computer, a server, or a workstation.

< computer-readable storage medium embodiment >

In one embodiment of the present disclosure, a computer-readable storage medium is provided having stored thereon computer instructions which, when executed by a processor, implement the data processing method of any of the embodiments of the present disclosure.

The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and storage medium embodiments are substantially similar to the method embodiments and are therefore described briefly; for relevant details, refer to the corresponding descriptions of the method embodiments.

The disclosed embodiments may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement aspects of embodiments of the disclosure.

The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the embodiments of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA) is personalized using state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the disclosed embodiments.

Various aspects of embodiments of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.

Embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the embodiments of the present disclosure is defined by the appended claims.

Claims

1. A data processing method, comprising:

2. The method of claim 1, wherein the first cloud disk comprises one or more data sets, and the same data identifier corresponds to the source data and the metadata belonging to the same data set;

3. The method of claim 1, wherein the metadata of the source data further comprises a data sequence number of the source data;

4. The method of claim 1, wherein the metadata of the source data further comprises a data sequence number of the source data;

5. The method of claim 4, wherein, upon receiving a write command for the second cloud disk, the method further comprises:

6. The method of claim 1, wherein the metadata of the source data further comprises a data sequence number of the source data;

7. The method of claim 6, wherein, upon receiving a read command for the second cloud disk, the method further comprises:

8. A data processing apparatus, comprising:

the processing module comprises a copying sub-module and a creating sub-module;

9. An electronic device, comprising:

the data processing apparatus of claim 8; or,

a processor and a memory for storing computer instructions which, when executed by the processor, implement the data processing method of any of claims 1-7.

10. A computer-readable storage medium, having stored thereon computer instructions, which, when executed by a processor, implement the data processing method of any one of claims 1-7.
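The cloning step summarized in the abstract (copying the metadata in the snapshot data, then creating in the second cloud disk a hard-link file associated with each piece of source data) can be illustrated with the minimal Python sketch below. It is not the patented implementation. Several simplifying assumptions apply. A cloud disk and a snapshot are each modelled as a local directory. The metadata is stored as a JSON list in a file named `meta.json`. Each piece of source data is a file named after its data identifier. All names (`SourceMetadata`, `copy_metadata`, `create_hard_links`, `clone_snapshot`, `demo`, and the `storage_params` contents) are hypothetical. The copy and link functions loosely mirror the copying sub-module and creating sub-module of claim 8.

```python
import json
import os
import shutil
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class SourceMetadata:
    """Metadata of one piece of source data (field types are assumptions)."""
    snapshot_version: int   # snapshot version number corresponding to the source data
    data_id: str            # data identifier of the source data
    storage_params: dict    # storage parameters (hypothetical contents)
    sequence_number: int    # data sequence number (dependent claims 3, 4 and 6)


# Hypothetical layout of a snapshot or cloud disk directory:
#   <dir>/meta.json         JSON list of SourceMetadata records
#   <dir>/data/<data_id>    source data, one file per data identifier
META_FILE = "meta.json"
DATA_DIR = "data"


def copy_metadata(snapshot_dir: Path, target_disk: Path) -> List[SourceMetadata]:
    """Copying step: duplicate the snapshot's metadata into the second cloud disk."""
    target_disk.mkdir(parents=True, exist_ok=True)
    shutil.copy2(snapshot_dir / META_FILE, target_disk / META_FILE)
    with open(target_disk / META_FILE, encoding="utf-8") as f:
        return [SourceMetadata(**record) for record in json.load(f)]


def create_hard_links(snapshot_dir: Path, target_disk: Path,
                      records: List[SourceMetadata]) -> None:
    """Creating step: hard-link each source data file instead of copying its bytes."""
    (target_disk / DATA_DIR).mkdir(parents=True, exist_ok=True)
    for record in records:
        src = snapshot_dir / DATA_DIR / record.data_id
        dst = target_disk / DATA_DIR / record.data_id
        os.link(src, dst)  # new directory entry for the same inode; no data is copied


def clone_snapshot(snapshot_dir: Path, target_disk: Path) -> List[SourceMetadata]:
    """Clone = copy the metadata, then create hard links to the source data."""
    records = copy_metadata(snapshot_dir, target_disk)
    create_hard_links(snapshot_dir, target_disk, records)
    return records


def demo() -> tuple:
    """Build a one-record snapshot in a temp dir, clone it, and report what is shared."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot_dir = Path(tmp) / "disk1-snap"
        (snapshot_dir / DATA_DIR).mkdir(parents=True)
        (snapshot_dir / DATA_DIR / "blk-0").write_bytes(b"hello")
        original = [SourceMetadata(1, "blk-0", {"offset": 0, "length": 5}, 0)]
        (snapshot_dir / META_FILE).write_text(
            json.dumps([asdict(r) for r in original]), encoding="utf-8")

        target_disk = Path(tmp) / "disk2"
        records = clone_snapshot(snapshot_dir, target_disk)
        src_stat = os.stat(snapshot_dir / DATA_DIR / "blk-0")
        dst_stat = os.stat(target_disk / DATA_DIR / "blk-0")
        return (records == original,                # metadata copied unchanged
                src_stat.st_ino == dst_stat.st_ino,  # both paths share one inode
                (target_disk / DATA_DIR / "blk-0").read_bytes())


if __name__ == "__main__":
    print(demo())
```

Because a hard link is only a second directory entry for the same file, the clone costs one metadata copy plus one link per data identifier, regardless of data size. By the same token, an in-place write through either path would be visible through the other, and `os.link` raises `OSError` when source and target lie on different file systems. How writes and reads on the second cloud disk are actually handled is the subject of claims 5 and 7, whose bodies are not reproduced here, so the sketch does not model them.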