WO2023040305A1 - Data backup system and apparatus

Info

Publication number: WO2023040305A1
Application number: PCT/CN2022/092467
Authority: WO
Inventors: 杜翔; 罗先强; 陈克云
Original assignee: 华为技术有限公司
Priority date: 2021-09-18
Filing date: 2022-05-12
Publication date: 2023-03-23
Also published as: CN115840662A

Abstract

The present application provides a data backup system and apparatus. The system comprises a data processing device and a storage device. The data processing device is configured to receive a write request sent from a first host, the write request carrying first data to be backed up to the storage device; to perform a deduplication and compression operation on the first data to obtain second data; and to write the second data into the storage device. In the method, the calculation operations, such as writing into a memory the data to be backed up and deduplicating the data to be backed up, are all performed by the data processing device, without consuming CPU resources of the first host, thereby reducing the impact on the production environment of the first host and increasing the CPU utilization of the first host.

Description

A data backup system and device

Cross References to Related Applications

This application claims priority to the Chinese patent application filed with the Intellectual Property Office of the People's Republic of China on September 18, 2021, with application number 202111101810.2 and the title "A Data Backup System and Device", the entire contents of which are incorporated by reference in this application.

Technical field

The present application relates to the field of computer technology, in particular to a data backup system and device.

Background technique

In the field of information data management, backup usually refers to the process of copying all or part of the data set in the file system or database system from the disk or storage array of the business host to other storage media.

In one backup mode, data can be backed up to a remote storage device through backup software deployed on the business host. Because data to be backed up usually has a high duplication rate, backup software providers usually integrate deduplication at the source side (that is, the business host side) to remove duplicate data from the backup data. This reduces the amount of data transmitted between the business host and the storage device and thereby increases the logical backup bandwidth.

However, the above deduplication operation consumes more CPU computing resources on the service host, which may have a greater impact on the service performance of the service host.

Contents of the invention

The present application provides a data backup system and device, which are used to reduce performance impact on service hosts on the basis of ensuring backup bandwidth.

In a first aspect, an embodiment of the present application provides a data backup system, and the system includes a data processing device and a storage device. In this system, the data processing device may be used to receive a write request sent by a service host (e.g., the first host), and the write request is used to request writing the data to be backed up (e.g., the first data) into the storage device. After receiving the write request, the data processing device may perform a deduplication and compression operation on the first data carried in the write request to obtain the data after the deduplication and compression operation (for example, called second data), and send the second data to the storage device. The storage device is used to receive the second data sent by the data processing device and store the second data. At this point, the backup of the first data is complete.

Through the above design, computing operations such as writing the data to be backed up (such as the first data) into the internal memory and deduplicating and compressing the data to be backed up all take place in the data processing device, without consuming the CPU resources of the first host, thereby reducing the impact on the production environment of the first host and increasing the CPU utilization of the first host.
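As a rough illustration of the first aspect, the following Python sketch models a data processing device that deduplicates and compresses the first data before handing the result to a storage device. It is a minimal sketch under stated assumptions, not the claimed implementation: the class names, the fixed 4096-byte block size, SHA-256 fingerprints, and zlib compression are all choices made here for illustration only.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size chosen for this sketch


class StorageDevice:
    """Stands in for the storage device; keeps compressed blocks by fingerprint."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> compressed block
        self.files = {}   # file name -> ordered list of fingerprints

    def has_block(self, fingerprint):
        return fingerprint in self.blocks

    def store(self, name, recipe, new_blocks):
        self.blocks.update(new_blocks)
        self.files[name] = recipe

    def read(self, name):
        return b"".join(zlib.decompress(self.blocks[fp]) for fp in self.files[name])


class DataProcessingDevice:
    """Deduplicates and compresses on behalf of the host."""

    def __init__(self, storage):
        self.storage = storage

    def handle_write(self, name, first_data):
        recipe, new_blocks = [], {}
        for i in range(0, len(first_data), BLOCK_SIZE):
            block = first_data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            recipe.append(fp)
            # Only blocks the storage device has not seen are compressed and sent.
            if not self.storage.has_block(fp) and fp not in new_blocks:
                new_blocks[fp] = zlib.compress(block)
        self.storage.store(name, recipe, new_blocks)
        return sum(len(b) for b in new_blocks.values())  # bytes actually sent
```

In this sketch, backing up the same content a second time sends no block data at all, because every fingerprint is already known to the storage device.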

In a possible implementation manner, the data processing device may be a network card or a data processing unit (data processing unit, DPU).

Through the above design, when the data processing device is a network card or a DPU, it can be integrated or installed in a service host in a pluggable manner, making deployment more convenient.

In a possible implementation manner, after the data processing device receives the write request, it stores the first data carried in the write request into the memory of the data processing device, and then returns a write request completion response to the first host; when performing the deduplication and compression operation on the first data, the data processing device is specifically configured to obtain the first data from the memory, and delete data blocks in the first data that duplicate data blocks already stored in the storage device.

Through the above design, the data processing device performs computing operations such as deduplication and compression on the data to be backed up, thereby reducing the consumption of CPU resources of the first host and improving the backup efficiency of the backup task on the first host.

In a possible implementation manner, after receiving the write request, the data processing device is further configured to: return the write request completion response to the first host after storing the first data in the memory of the data processing device; and acquire the first data from the memory and store it in the persistent storage medium of the data processing device. When performing the deduplication and compression operation on the first data, the data processing device is specifically configured to acquire the first data from the persistent storage medium, and delete data blocks in the first data that duplicate data blocks already stored in the storage device.

Through the above design, the data processing device can temporarily write the file data to be backed up to a local persistent storage medium, such as a disk. Because the file data has then been stored persistently, the logical data backup is complete. In this way, data backup can be completed within the data processing device. The backup process depends only on the computing power of the data processing device and on the read/write bandwidth and capacity of the disk, and is no longer affected by the bandwidth from the host to the storage device or by the processing capacity of the storage device. For large data backup scenarios, this can significantly improve backup performance and shorten the backup window. In addition, because the logical backup does not require the data to be deduplicated or compressed, nor sent to a storage device for storage, it involves no network communication overhead, which further shortens the backup window and improves backup efficiency.
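The staged write path described above can be sketched as follows. This is only an illustrative Python model: the class name StagingDpu, the "write complete" response string, and the use of a local directory as the persistent storage medium are assumptions made here, not details from the application.

```python
import os
import tempfile


class StagingDpu:
    """Buffers writes in memory, acknowledges them, then persists them locally."""

    def __init__(self, staging_dir):
        self.staging_dir = staging_dir  # stands in for the local persistent medium
        self.memory = {}                # file name -> bytes buffered in memory

    def handle_write(self, name, data):
        self.memory[name] = self.memory.get(name, b"") + data
        return "write complete"         # returned once the data is in memory

    def persist_locally(self):
        """Move buffered data to local disk; the logical backup is then complete."""
        for name, data in list(self.memory.items()):
            with open(os.path.join(self.staging_dir, name), "ab") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            del self.memory[name]

    def staged_files(self):
        return sorted(os.listdir(self.staging_dir))

    def read_staged(self, name):
        with open(os.path.join(self.staging_dir, name), "rb") as f:
            return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as staging_dir:
        dpu = StagingDpu(staging_dir)
        print(dpu.handle_write("a.bin", b"hello "))
        dpu.handle_write("a.bin", b"world")
        dpu.persist_locally()
        print(dpu.read_staged("a.bin"))
```

Deduplication, compression, and transfer to the storage device would run later against the staged files, so nothing in this path waits on the network.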

In a possible implementation manner, the data processing device further includes a first file system, the first file system is the same as the second file system of the storage device, and the write request sent by the first host is a write request based on the second file system, for example, a request to write the first data into the first file in the second file system; when storing the first data to the persistent storage medium of the data processing device, the data processing device is specifically configured to store the first data to the persistent storage medium of the data processing device through the first file system.

Through the above design, the data processing device stores and manages the data of the file to be backed up sent by the host through the local file system.

In a possible implementation manner, the data processing device is further configured to receive a file creation request, where the file creation request is used to request to create the first file in the second file system; and send the file creation request to the storage device;

The storage device is further configured to create the first file requested by the file creation request in the second file system of the storage device, and generate mapping address information of the first file, where the mapping address information is used to indicate that the data of the first file is located at the data processing device, or to indicate the access path of the data of the first file in the data processing device; and send a creation success response to the data processing device;

The data processing device is further configured to: receive the creation success response sent by the storage device, and create the first file in the first file system. The write request sent by the first host is used to request writing the first data into the first file in the second file system; when storing the first data to the persistent storage medium of the data processing device, the data processing device is specifically configured to write the first data into the first file in the first file system.

Through the above design, the storage device locates the data of a file according to the mapping address information of the file. As the entry point for data access, the storage device can therefore serve data access from more devices, which improves data access flexibility.

In a possible implementation manner, the data processing device is further configured to: delete the first file stored in the data processing device after storing the data of the first file in the storage device through a deduplication and compression operation; the storage device is used to modify the mapping address information of the first file to indicate that the data of the first file is stored in the storage device.

Through the above design, the data processing device deletes the data after storing the data in the storage device, which can improve the utilization rate of the storage medium.

In a possible implementation manner, the data processing device is further configured to receive a first read request sent by the second host, where the first read request is used to read at least part of the data of the first file, and forward the first read request to the storage device; the storage device is used to send a second read request to the data processing device when determining, according to the mapping address information of the first file, that the data of the first file is located in the data processing device, where the second read request is used to request reading at least part of the data of the first file; the data processing device is further configured to read at least part of the data of the first file from the data processing device according to the second read request.

Through the above design, the data processing device provides backup services for the host, that is, on the basis of storing the data to be backed up of the host to the local persistent storage medium of the data processing device, it can also provide data access services for the other devices, providing data access flexibility.
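The file creation, migration, and read designs above all revolve around the mapping address information kept by the storage device. The following Python sketch models that lifecycle. The marker values ON_DPU and ON_STORAGE, the class names, and the method names are hypothetical, and deduplication and compression are left out to keep the example short.

```python
ON_DPU = "dpu"          # hypothetical marker: file data still lives on the DPU
ON_STORAGE = "storage"  # hypothetical marker: file data has been migrated


class StorageDevice:
    def __init__(self):
        self.mapping = {}  # file path -> mapping address information
        self.data = {}     # file path -> content held by the storage device

    def create_file(self, path):
        self.mapping[path] = ON_DPU
        return "created"

    def commit(self, path, payload):
        self.data[path] = payload
        self.mapping[path] = ON_STORAGE

    def read(self, path, dpu):
        if self.mapping[path] == ON_DPU:
            return dpu.read_local(path)  # plays the role of the second read request
        return self.data[path]


class Dpu:
    def __init__(self, storage):
        self.storage = storage
        self.local_files = {}  # stands in for the first file system

    def create_file(self, path):
        if self.storage.create_file(path) == "created":
            self.local_files[path] = b""

    def write(self, path, data):
        self.local_files[path] += data

    def migrate(self, path):
        # Deduplication and compression are omitted in this sketch.
        self.storage.commit(path, self.local_files.pop(path))

    def read(self, path):
        return self.storage.read(path, self)  # forwards the first read request

    def read_local(self, path):
        return self.local_files[path]
```

A reader always enters through the same call; whether the bytes come from the data processing device or from the storage device is decided solely by the mapping entry.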

In the second aspect, the embodiment of the present application also provides a data processing device, the data processing device has the function of implementing the behavior in the method example of the first aspect above, and the beneficial effects can be referred to the description of the first aspect, which will not be repeated here. The functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware. The hardware or software includes one or more modules corresponding to the above functions. In a possible design, the structure of the data processing device includes a receiving module, a processing module, and a sending module. These modules can perform the corresponding functions in the method example of the first aspect above. For details, refer to the detailed description in the method example, and details are not repeated here.

In a third aspect, the present application also provides a computing device. The computing device includes a processor and a memory, and may also include a communication interface. The processor executes the program instructions in the memory to implement the method provided by the first aspect or any possible implementation of the first aspect. The memory is coupled with the processor, and stores the program instructions and data necessary during the data backup process. The communication interface is used to communicate with other devices, such as receiving a write request sent by the first host, or sending second data to the storage device.

In a fourth aspect, the present application provides a computer-readable storage medium. When the program stored in the computer-readable storage medium is executed by a computing device, the computing device executes the method provided by the first aspect or any possible implementation of the first aspect. The storage medium includes but is not limited to volatile memory, such as random access memory, and non-volatile memory, such as flash memory, a hard disk drive (HDD), and a solid state drive (SSD).

In a fifth aspect, the present application provides a computer program product. The computer program product includes computer instructions, and when the instructions are executed by a computing device, the computing device executes the method provided by the first aspect or any possible implementation of the first aspect. The computer program product may be a software installation package; when the method provided by the first aspect or any possible implementation of the first aspect needs to be used, the computer program product may be downloaded and executed on a computing device.

In a sixth aspect, the present application also provides a computer chip. The chip is connected to a memory, and the chip is used to read and execute the software program stored in the memory to implement the method described in the first aspect and each possible implementation of the first aspect.

Description of drawings

FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;

FIG. 2 is a schematic flow diagram of a data backup method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of another system architecture provided by an embodiment of the present application;

FIG. 4 is a schematic flow diagram of another data backup method provided in the embodiment of the present application;

FIG. 5 is a schematic flow diagram of creating a file provided by the embodiment of the present application;

FIG. 6 is a schematic flow chart of file data migration provided by the embodiment of the present application;

FIG. 7 is a schematic flowchart of a data access method provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.

Detailed description

In order to facilitate the understanding of the data backup method provided by the embodiment of the present application, concepts and terms involved in the embodiment of the present application are briefly described first.

1. A file system is a structured form of storing and organizing data files. All data in a computer consists of 0s and 1s, and a series of 0/1 combinations stored on a hardware medium cannot be distinguished or managed by users directly. Therefore, the concept of a "file" is used to organize this data: data used for the same purpose is composed into different types of files according to the structure required by different applications. Different suffixes are usually used to indicate different types, and each file is given a name that is easy to understand and remember. When there are many files, they are grouped according to some division method, and each group of files is placed in the same directory (or folder). In addition to files, a directory may also contain subdirectories (or subfolders), so all files and directories form a tree structure. This tree structure has a dedicated name: file system. There are many types of file systems; common ones include FAT/FAT32/NTFS on Windows and EXT2/EXT3/EXT4/XFS/Btrfs on Linux. To locate a file, start from the root node and descend to the file itself, joining the names of the directories, subdirectories, and file along the way with a special separator character ("\" for Windows/DOS, "/" for Unix-like systems). The resulting string is called a file path, such as "/etc/systemd/system.conf" on Linux or "C:\Windows\System32\taskmgr.exe" on Windows. A path is a unique identifier for accessing a specific file. For example, D:\data\file.exe on Windows is the path of a file: it denotes the file file.exe in the data directory on the D partition.

2. The ordinals "first", "second", and so on used in this application are only for convenience of description. They are not used to limit the scope of the embodiments of this application, nor to indicate a sequence. "And/or" describes the association relationship of associated objects and indicates that three relationships are possible; for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
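As a small illustration of the file path concept in item 1, the Python standard library module pathlib can split the example paths into their tree components. The variable names are chosen here for illustration only.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The two example paths from item 1, parsed without touching any real file system.
linux_path = PurePosixPath("/etc/systemd/system.conf")
windows_path = PureWindowsPath(r"D:\data\file.exe")

print(linux_path.parts)          # root, directories, and file name in order
print(linux_path.suffix)         # the suffix that hints at the file type
print(windows_path.drive)        # the partition
print(windows_path.parent.name)  # the directory that contains the file
print(windows_path.name)         # the file itself
```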

The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings.

FIG. 1 is a schematic structural diagram of a backup system provided by an embodiment of the present invention. Referring to Figure 1, the system includes a host 110 (Figure 1 only shows one host 110, but this embodiment of the application does not limit it), a data processing device 120, and a storage device 130 (Figure 1 only shows one storage device 130, but this embodiment of the present application does not limit it).

1) The host 110 may be a computing device deployed on the user side, and the computing device may be a physical machine or a virtual machine. Physical machines include but are not limited to desktop computers, servers (such as application servers, file servers, database servers, etc.), notebook computers, and mobile devices.

Host 110, as a business host that needs to back up data, is generally equipped with backup software. The host 110 backs up its data by running the backup software. The backup software is also provided with a backup policy. The backup policy may be a preset policy or a policy set by the user, and may include, for example, the backup start time, the data to be backed up, and the target storage device for the backup. Specifically, the host 110 runs the backup software and, according to the backup policy set in the backup software, sends the data to be backed up to the data processing device, as described in detail below.

2) The data processing device 120 is connected between the host 110 and the storage device 130, and is used to process the data sent by the host 110, for example, to perform deduplication and compression, and to send the processed data to the storage device 130. The specific process of data processing by the data processing device 120 will be described in detail below. The data processing device 120 may be a data processing unit (DPU), a smart network interface card (SmartNIC), or another component, which is not limited in this embodiment of the present application.

In terms of hardware, the data processing device 120 includes a processor 121, a memory 122, a front-end interface 123, and a back-end interface 124. The processor 121, the memory 122, the front-end interface 123, and the back-end interface 124 are connected through a bus 125.

The processor 121 may be implemented as a central processing unit (CPU), a hardware logic circuit, a processing core, an application-specific integrated circuit (ASIC) chip, an AI chip, or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a system on chip (SoC), or any combination thereof.

The processor 121 may be configured to process a data backup request or a data restoration request from the host 110. Exemplarily, when the processor 121 receives, through the front-end interface 123, a data backup request sent by the host 110, it temporarily saves the data to be backed up carried in the request to the memory 122. When the total amount of data in the memory 122 reaches a certain threshold, the processor 121 persists the data in the memory 122. The embodiments of this application provide two ways of persisting data. One is to send the data to the storage device 130 for persistent storage. In this case, the processor 121 performs deduplication and compression on the data to be persisted and then sends it to the storage device 130 for storage. The specific process of deduplication and compression is described in detail below. The other is to first persist the data to a local hard disk, then deduplicate and compress the data persisted to the hard disk, and send the deduplicated and compressed data to the storage device 130 for storage. The specific process is described in detail below. Only one processor 121 is shown in FIG. 1. In practical applications, there are usually multiple processors 121, and each processor 121 has one or more processor cores. This embodiment does not limit the number of processors or the number of processor cores.
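The threshold-triggered persistence described above can be sketched as a small write buffer. This is an illustrative Python model only: the class name WriteBuffer, the byte-count threshold, and the persist callback (which could stand for either of the two persistence paths) are assumptions made here.

```python
class WriteBuffer:
    """Accumulates backup data in memory and persists it once a threshold is hit."""

    def __init__(self, threshold_bytes, persist):
        self.threshold_bytes = threshold_bytes
        self.persist = persist  # callable that receives the flushed bytes
        self.pending = []
        self.pending_bytes = 0

    def add(self, data):
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.threshold_bytes:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.persist(b"".join(self.pending))
        self.pending = []
        self.pending_bytes = 0


if __name__ == "__main__":
    flushed = []
    buf = WriteBuffer(threshold_bytes=10, persist=flushed.append)
    buf.add(b"abcd")    # below the threshold, stays in memory
    buf.add(b"efghij")  # reaches the threshold, triggers persistence
    print(flushed)
```

Passing a different persist callable is enough to switch between sending data to the storage device and staging it on a local disk.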

The memory 122 refers to an internal memory that exchanges data directly with the processor 121. It can be read and written at any time at high speed, and serves as temporary data storage for the operating system or other running programs. The memory 122 includes at least two types of memory; for example, the memory 122 may be random access memory (RAM) or read-only memory (ROM). The random access memory is, for example, dynamic random access memory (DRAM) or storage class memory (SCM). DRAM is a semiconductor memory which, like most RAM, is a volatile memory device. SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory. Storage class memory provides faster read and write speeds than hard disks, but its access speed is slower than that of DRAM, and it is also cheaper than DRAM. However, DRAM and SCM are only illustrative examples in this embodiment of the present application, and the memory 122 may also include other random access memories, such as static random access memory (SRAM). The read-only memory may be, for example, programmable read-only memory (PROM) or erasable programmable read-only memory (EPROM). In addition, the memory 122 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM. In practical applications, multiple memories 122 and different types of memories 122 may be configured in the data processing device 120. This embodiment does not limit the quantity or type of the memory 122.

The front-end interface 123 is used to transmit data between the data processing device 120 and the host 110. Exemplarily, the front-end interface 123 may be a Peripheral Component Interconnect Express (PCIe) interface, and the data processing device 120 and the host 110 are connected through a PCIe bus. The front-end interface 123 can also be another type of interface, such as a non-volatile memory express (NVMe) interface, which is not limited in this application; any method that enables communication between the two is applicable to the embodiments of this application.

The back-end interface 124 is used to transmit data between the data processing device 120 and the storage device 130. The back-end interface may be a network card connected to a network, so that the data processing device 120 and the storage device 130 can communicate through the network. The network can use wired or wireless communication. Exemplarily, a network generally refers to any telecommunications or computer network, including, for example, an intranet, a wide area network (WAN), a local area network (LAN), a personal area network (PAN), the Internet, or a wireless network (such as Wi-Fi or 5th Generation (5G) communication technology). Specifically, the data processing device 120 may communicate with the storage device 130 using various network protocols, such as the TCP/IP protocol, the UDP/IP protocol, and the RDMA protocol. In addition to communicating through the above network, the data processing device 120 may also communicate with the storage device 130 through a fiber optic switch. Alternatively, the fiber optic switch can be replaced with an Ethernet switch, an InfiniBand switch, an RDMA over Converged Ethernet (RoCE) switch, and the like.

The bus 125 includes but is not limited to: a PCIe bus, a double data rate (DDR) bus, an interconnection bus supporting multiple protocols (hereinafter referred to as a multi-protocol interconnection bus, which will be described in detail below), a serial advanced technology attachment (SATA) bus, a serial attached SCSI (SAS) bus, a controller area network (CAN) bus, a Compute Express Link (CXL) bus, and the like.

3) The storage device 130 is configured to provide data backup services for the host 110. The storage device 130 may be, but is not limited to, a storage area network (SAN) device or a network attached storage (NAS) device. If the storage device 130 is a NAS device, the NAS device may be used to provide a file-level sharing service for the host. In addition, FIG. 1 only shows one storage device 130. In practical applications, the storage device 130 in the embodiment of the present application can be a storage device in a centralized storage system, or any storage device in a distributed storage system.

Next, taking the system shown in FIG. 1 as an example, the data backup method provided by the embodiment of the present application will be described. For ease of description, the following embodiments of the present application will be described by taking the data processing device 120 as a DPU as an example.

Referring to FIG. 2, FIG. 2 is a schematic flow chart of a data backup method provided in the embodiment of the present application. As shown in FIG. 2, the method includes the following steps:

Step 201: The host 110 sends a write request to the DPU 120 in response to a backup instruction, for backing up the data indicated by the backup instruction to the storage device 130. The write request carries the data that needs to be backed up to the storage device 130. The backup instruction may be generated in response to a user's backup operation, or may be automatically generated by the host 110 according to a preset backup policy, for example, a policy that backs up a specified file system, file, or database every hour.

Taking file backup as an example, when the host 110 performs the backup, the backup software calls the open function to open the file to be backed up in the file system, and then triggers the fwrite system call to transfer the data of the file. Exemplarily, the fwrite system call carries information such as the data of the specified file (or the storage address of the data) and the path name of the file, so as to request that the data of the specified file be written into the corresponding file in the file system of the storage device 130, thereby completing the data backup of the file.

It should be noted that, in one embodiment, the write request may be the fwrite system call itself. For example, when the DPU 120 is connected to the host 110 through a PCIe interface using the virtio-fs paravirtualization technology, the operating systems of the DPU 120 and the host 110 can communicate directly. In this case, the DPU 120 can recognize and receive the fwrite system call sent by the host 110. In another embodiment, the write request may also be a message in another protocol frame format generated based on the fwrite system call. For example, when the DPU 120 is connected to the host 110 in another way, such as through an NVMe interface, the write request may be a remote procedure call (RPC) generated based on the fwrite system call. In addition, fwrite here refers to writing file data, and generating a write request based on fwrite is only an example. The embodiment of the present application does not limit the type or generation method of the write request. For example, the write request can also be generated based on the write system call.
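To make the idea of wrapping a file write in a message of another frame format more concrete, the following Python sketch encodes a write request as a length-prefixed message and decodes it again. The frame layout (a 4-byte big-endian header length, a JSON header, then the raw data) and the function names are invented here purely for illustration; they are not the format used by virtio-fs, NVMe, or the application.

```python
import json
import struct


def encode_write_request(path, offset, data):
    """Pack a file write into one message: header length, JSON header, payload."""
    header = json.dumps(
        {"op": "write", "path": path, "offset": offset, "length": len(data)}
    ).encode("utf-8")
    return struct.pack("!I", len(header)) + header + data


def decode_write_request(message):
    """Recover the path, offset, and payload on the receiving side."""
    (header_len,) = struct.unpack("!I", message[:4])
    header = json.loads(message[4:4 + header_len].decode("utf-8"))
    start = 4 + header_len
    data = message[start:start + header["length"]]
    return header["path"], header["offset"], data


if __name__ == "__main__":
    msg = encode_write_request("/backup/db.dump", 0, b"\x00\x01payload")
    print(decode_write_request(msg))
```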

Step 202: The DPU 120 stores the data in the write request into the memory 122.

In step 203, the DPU 120 sends a request success response to the host 110, which is used to indicate that the DPU 120 has received the write request. It should be noted that step 203 is an optional step and is not required to be executed, so it is shown in a dashed box in FIG. 2 .

In step 204, the DPU 120 obtains data to be backed up from the memory 122 (for example, first data), performs deduplication and compression operations on the first data, and records the deduped and compressed data as second data.

After receiving the write requests from the host 110, the DPU 120 may temporarily save the data in the write requests in the memory 122, and then the DPU 120 acquires the first data from the memory 122. The first data here may be a continuous piece of data with a preset length, or variable-length data, which is not limited in this embodiment of the present application.

After the DPU 120 acquires the first data, it performs a deduplication and compression operation on the first data. The deduplication and compression operation refers to data deduplication and/or data compression. Data deduplication refers to the use of algorithms to eliminate duplicate data, thereby reducing the storage space occupied by data. In this application, if duplicate data is detected during backup, it is discarded, and a pointer is created that points to the copy of the data that has already been backed up. This reduces the amount of data transmitted between the DPU 120 and the storage device 130 and reduces the network load. Specifically, according to the deduplication granularity, deduplication methods include at least file-level deduplication and sub-file-level deduplication (also called block-level deduplication). The two methods are described below:

(1) File-level deduplication, also known as single-instance storage (SIS), performs deduplication in units of files: it detects and removes duplicate copies of a file. Only one copy of the file is stored, and all other copies are replaced with pointers to that copy. File-level deduplication is simple and fast, but it cannot eliminate duplicate content within files. For example, if two 10 MB PowerPoint presentation files differ only in the title page, they will not be considered duplicate files, and the two files will be stored separately.

(2) Sub-file-level deduplication refers to decomposing a file/object into data blocks of fixed or variable size and performing deduplication in units of data blocks. Sub-file-level deduplication removes duplicate data between files. There are two ways to implement sub-file-level deduplication: fixed-length blocks and variable-length segments. Fixed-length block deduplication divides files into fixed-length blocks and uses a hash algorithm to find duplicate data. Fixed-length blocks are simple, but may miss a lot of duplicate data, because similar data may have different block boundaries. For example, if a person's name is added to the title page of a document, the entire document is shifted and all blocks change, making it impossible to detect the duplicate data. In variable-length segment deduplication, if there is a change in one segment, only the boundaries of that segment are adjusted, leaving the remaining segments unchanged. Compared with the fixed-length block method, this method improves the ability to identify duplicate data segments.
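The boundary-shift problem of fixed-length blocks can be illustrated with a small Python sketch. The 8-byte block size, the helper names, and the SHA-256 fingerprint are assumptions chosen for illustration only.

```python
import hashlib


def fixed_blocks(data: bytes, size: int = 8):
    """Split data into fixed-length blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprints(blocks):
    """Return the set of SHA-256 fingerprints of the given blocks."""
    return {hashlib.sha256(b).hexdigest() for b in blocks}


original = bytes(range(64))     # 64 distinct bytes -> 8 distinct blocks
shifted = b"\xff" + original    # one byte inserted at the front
appended = original + b"\xff"   # one byte added at the end

fp_original = fingerprints(fixed_blocks(original))
shared_after_insert = fp_original & fingerprints(fixed_blocks(shifted))
shared_after_append = fp_original & fingerprints(fixed_blocks(appended))
```

Inserting a single byte at the front moves every block boundary, so no block of the shifted data matches a block of the original, whereas appending at the end leaves all original blocks detectable.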

Taking variable-size block-level deduplication as an example, the deduplication process is as follows. First, the boundaries of each block in the first data are determined according to a preset algorithm (such as the Content-Defined Chunking (CDC) algorithm, which is not limited in this embodiment of the present application), thereby dividing the first data into multiple blocks, whose sizes may differ. Next, a hash value of each data block is calculated. The hash value is similar to fingerprint information of the data block, and data blocks with the same content have the same fingerprint information, so whether the contents of data blocks are the same can be confirmed by matching fingerprints. Then, fingerprint matching is performed against the data blocks that have been stored before. A data block that has been stored before is not stored again; only its hash value is recorded as index information, and the index information of the data block is mapped to the specific storage location. A new data block that has not been stored before is first stored physically and then recorded using the hash value index. This process guarantees that the same data block is stored on the physical medium only once, thereby achieving the purpose of deduplicating data.
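The process above can be sketched as follows. This is a simplified illustration, not the chunking algorithm of this application: it uses a gear-style rolling hash with no minimum or maximum chunk size, and the names `cdc_chunks` and `backup`, the 32-byte window, the boundary probability, and SHA-256 fingerprints are all assumptions.

```python
import hashlib

WINDOW = 32  # a 32-bit gear hash "forgets" bytes older than 32 positions
BITS = 6     # boundary probability 1/64, so chunks average roughly 64 bytes
# Deterministic pseudo-random value per byte (the "gear" table), derived from SHA-256.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]


def cdc_chunks(data: bytes):
    """Split data at content-defined boundaries using a rolling gear hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        # Test only once the hash covers a full window, so that the decision
        # depends solely on the last WINDOW bytes of content.
        if i + 1 >= WINDOW and (h >> (32 - BITS)) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def backup(data: bytes, index: dict):
    """Store unseen chunks in `index`; return the fingerprint recipe and the count of new chunks."""
    recipe, new = [], 0
    for chunk in cdc_chunks(data):
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:      # new block: store it physically
            index[fp] = chunk
            new += 1
        recipe.append(fp)        # repeated block: only the index entry is recorded
    return recipe, new


original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(256))
edited = b"A person's name" + original   # text inserted at the front shifts all offsets

index = {}
recipe1, new1 = backup(original, index)
recipe2, new2 = backup(edited, index)
restored = b"".join(index[fp] for fp in recipe2)
```

Because each boundary depends only on the preceding 32 bytes of content, every chunk of the original after its first boundary reappears unchanged in the edited data, so the second backup stores only the chunks near the insertion. Practical CDC implementations usually also enforce minimum and maximum chunk sizes.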

For example, the data deduplication process in this application may include:

The DPU 120 divides the first data into multiple data blocks based on the CDC algorithm, for example, block 1, block 2, block 3 and block 4. The DPU 120 then calculates the fingerprint (that is, the hash value) of each data block and sends the fingerprint of each data block to the storage device 130. The storage device 130 traverses its local fingerprint library, which includes the fingerprints of the data blocks stored by the storage device 130, and queries whether the fingerprint library contains the fingerprint of block 1 (fp1), the fingerprint of block 2 (fp2), the fingerprint of block 3 (fp3) and the fingerprint of block 4 (fp4) of the first data. If a fingerprint exists in the library, the corresponding data block is a repeated data block. For example, if fp1 and fp3 exist in the fingerprint library and fp2 and fp4 do not, then block 1 and block 3 are repeated data blocks, and block 2 and block 4 are non-repeated data blocks. Afterwards, the storage device 130 sends the query result to the DPU 120. The query result indicates whether there is a repeated data block in the first data, identification information of the repeated data block, and the like.

Correspondingly, the DPU 120 determines the repeated data blocks in the first data according to the query result and generates metadata for each repeated data block. The metadata includes but is not limited to the fingerprint of the data block, the offset of the data block in the first data, and the length of the data block; the metadata is not limited in this embodiment of the present application. If the first data is only deduplicated, without other data processing, the DPU 120 can send the metadata of the repeated data blocks and the data of the non-repeated data blocks to the storage device 130. In the above example, the deduplicated data includes fp1, fp3, the data of block 2, and the data of block 4. Deduplication reduces the amount of data transmitted, the resource overhead of backup and the network burden, and increases the logical backup bandwidth.
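The fingerprint query and the construction of the deduplicated data in this example can be sketched as follows. The function names, the dictionary layout of each entry, and the SHA-256 fingerprint are hypothetical; the application does not prescribe a message format.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def query_fingerprints(library: set, fps: list) -> list:
    """Storage-device side: report, per fingerprint, whether it is already stored."""
    return [fp in library for fp in fps]


def build_second_data(blocks: list, is_duplicate: list) -> list:
    """DPU side: metadata only for repeated blocks, metadata plus data otherwise."""
    second, offset = [], 0
    for block, dup in zip(blocks, is_duplicate):
        entry = {"fp": fingerprint(block), "offset": offset, "length": len(block)}
        if not dup:
            entry["data"] = block
        second.append(entry)
        offset += len(block)
    return second


blocks = [b"block-1", b"block-2", b"block-3", b"block-4"]
library = {fingerprint(b"block-1"), fingerprint(b"block-3")}  # fp1 and fp3 already stored

result = query_fingerprints(library, [fingerprint(b) for b in blocks])
second_data = build_second_data(blocks, result)
payload_bytes = sum(len(e.get("data", b"")) for e in second_data)
```

Only block 2 and block 4 carry data, so 14 of the 28 bytes of block content are transmitted in this toy case.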

The present application can also use a compression algorithm to compress the first data. The compression algorithm may be, for example, the Shannon-Fano algorithm, Huffman coding, arithmetic coding, or LZ77/LZ78 coding, which is not limited in this application. Any existing algorithm capable of compressing data, and any compression algorithm that may be applied in the future, is applicable to the embodiments of the present application. In addition, the compression algorithm may be specified by the user, or the system may compress adaptively according to a preset policy, which is not limited in this embodiment of the present application.
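As one possible illustration of the compression step, the following sketch uses the zlib module of the Python standard library, whose DEFLATE format combines LZ77-style matching with Huffman coding. The choice of zlib, the function name `compress_block`, and the sample data are assumptions; any other compression algorithm could be substituted.

```python
import zlib


def compress_block(data: bytes, level: int = 6) -> bytes:
    """Compress one piece of data; `level` trades speed against compression ratio."""
    return zlib.compress(data, level)


first_data = b"backup record 0001;" * 200   # highly repetitive sample data
second_data = compress_block(first_data)
restored = zlib.decompress(second_data)
```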

The foregoing data processing manners may be used alone or in combination. For example, the DPU 120 may perform only data deduplication on the first data. Alternatively, the DPU 120 may perform only data compression on the first data. For another example, the DPU 120 may deduplicate the data first and then compress the deduplicated data, which is not limited in this embodiment of the present application.

Step 205 , the DPU 120 sends the second data to the storage device 130 .

Step 206, the storage device 130 stores the second data.

In the process of storing the second data, the storage device 130 writes the data of the non-repeated data blocks into the storage medium and generates metadata for the data; the metadata may include the storage location of the data in the storage device 130, the fingerprint, and so on. For a repeated data block, it is only necessary to record the metadata of the repeated data block and make the metadata point to the data block that is already stored. At this point, the backup of the first data is complete.
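The storing behavior described above can be modeled with a short sketch. The class `StorageDevice`, its fields, the entry layout, and the placeholder fingerprints "fp1" and "fp2" are hypothetical and serve only to illustrate how metadata of a repeated block points to the stored copy.

```python
class StorageDevice:
    """Hypothetical model of storage device 130: a byte log plus a fingerprint library."""

    def __init__(self):
        self.medium = bytearray()  # stands in for the storage medium
        self.library = {}          # fingerprint -> (location, length) of the stored block
        self.file_map = []         # metadata of the current backup: fingerprints in order

    def store(self, second_data: list) -> None:
        for entry in second_data:
            fp = entry["fp"]
            if "data" in entry and fp not in self.library:
                # Non-repeated block: write the data and record where it lives.
                self.library[fp] = (len(self.medium), len(entry["data"]))
                self.medium += entry["data"]
            # Repeated block: only metadata is recorded; it points to the stored copy.
            self.file_map.append(fp)

    def read_back(self) -> bytes:
        out = bytearray()
        for fp in self.file_map:
            loc, length = self.library[fp]
            out += self.medium[loc:loc + length]
        return bytes(out)


device = StorageDevice()
device.store([{"fp": "fp1", "data": b"AAAA"}])  # an earlier backup already stored this block
device.file_map.clear()
device.store([
    {"fp": "fp1"},                    # repeated block: metadata only
    {"fp": "fp2", "data": b"BBBB"},   # non-repeated block: data included
    {"fp": "fp1"},
])
```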

It should be noted that the DPU 120 may send each second data to the storage device 130 separately, or may aggregate multiple second data into a data block of a specified size before sending it to the storage device 130. That is, the above steps are repeated to obtain the second data corresponding to each of multiple first data, and these second data are aggregated and sent to the storage device 130 together, thereby reducing the number of write IOs.
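A minimal sketch of such aggregation is given below. The class `WriteAggregator`, the 4096-byte target size, and the `send` callback are assumptions for illustration; the application does not specify the aggregation size or interface.

```python
class WriteAggregator:
    """Hypothetical buffer that batches second data into writes of a target size."""

    def __init__(self, target_size: int, send):
        self.target_size = target_size
        self.send = send             # callable that performs one write IO
        self.pending = bytearray()

    def add(self, second_data: bytes) -> None:
        self.pending += second_data
        while len(self.pending) >= self.target_size:
            self.send(bytes(self.pending[:self.target_size]))
            del self.pending[:self.target_size]

    def flush(self) -> None:
        """Send whatever is left, for example at the end of a backup."""
        if self.pending:
            self.send(bytes(self.pending))
            self.pending.clear()


writes = []
agg = WriteAggregator(target_size=4096, send=writes.append)
for _ in range(10):
    agg.add(b"x" * 1000)   # ten small pieces of second data
agg.flush()
```

In this sketch, ten pieces of 1000 bytes result in three write IOs instead of ten.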

In the above method, computing operations such as writing the data to be backed up into the memory 122 and deduplicating the data to be backed up all take place in the DPU 120. This does not consume the CPU resources of the host 110, reduces the impact on the production environment of the host 110, and improves the CPU utilization of the host 110.

The embodiment of the present application also provides another data processing apparatus 220. Fig. 3 shows a schematic diagram of the structure of the data processing device 220. On the basis of Fig. 1, a hard disk 126 is added in Fig. 3. For the functions of the other components, please refer to the foregoing introduction of the relevant components in the data processing device 120.

The hard disk 126 may also be referred to as auxiliary memory. The hard disk 126 may be a non-volatile memory, such as a read-only memory (ROM), a hard disk drive (HDD) or a solid state disk (SSD). Unlike the memory 122, the hard disk 126 reads and writes data more slowly than the memory 122 and is usually used for persistently storing data. In this application, the DPU 220 may be used to provide a backup service for the host 110, and the hard disk 126 may be used to store the backup data sent by the host 110. The size of the hard disk 126 provided in the DPU 220 can be determined according to the amount of data that the host 110 needs to back up each time. For example, if the host 110 generates 1 TB of data in one backup cycle, the hard disk 126 may be at least 1 TB in size.

Next, in conjunction with FIG. 4, another data backup method provided by the embodiment of the present application is described by taking the system shown in FIG. 3 as an example. In the following, the data processing device 220 being the DPU 220 is taken as an example. The implementation of the method in this system can be divided into two processes. In the first process, a file system/file is created for the DPU 220 (step 401 to step 406). In the second process, the DPU 220 backs up the files of the host 110 to the storage device 130 (step 407 to step 413).

In step 401, the DPU 220 creates a file system.

In this method, the DPU 220 "formats" the hard disk 126 to establish a local file system for managing the local storage space (including the hard disk 126). The file system may be of any type (such as ext3 or zfs), which is not limited in this embodiment of the present application. It should be noted that the type of the local file system of the DPU 220 and the type of the file system of the first host may be the same or different. Similarly, the type of the local file system of the DPU 220 and the type of the file system on the storage device 130 may be the same or different, which is not limited in the embodiments of the present application. In addition, the embodiment of the present application does not limit the timing at which the DPU 220 creates the local file system.

Step 402 : the host 110 sends a create request to the DPU 220 to request to create an object under the file system of the secondary storage device 130 . Objects here refer to directories or files in the file system.

In one embodiment, the host 110 can mount the file system of the storage device 130 to a local directory of the host 110, after which the host 110 can perform operations such as creating files and directories on the file system of the storage device 130. Taking creating a file as an example, the host 110 may send a creation request for requesting to create a file under the file system of the storage device 130, so that the storage device 130 creates the file in its local file system.

For example, as shown in (a) of FIG. 5, the host 110 mounts the file system whose root directory is /FS0/ in the storage device 130 to the /mnt/ directory of the host 110. Exemplarily, the application program of the host 110 calls the open function to request creation of an object (directory or file) under /mnt/FS0/. The open system call carries the specified path name, object name and object type. For example, the open request is open{"mnt/FS0/vm0.vmdk", O_CREAT}, which means creating vm0.vmdk under the path mnt/FS0/. It should be noted that: (1) the open function above is only an example, and the embodiment of the present application does not limit the type or generation method of the creation request; (2) there is no strict time sequence between step 401 and step 402: they can be executed at the same time, or step 401 can be executed before or after step 402, which is not limited.

Step 403: DPU 220 sends the creation request to storage device 130 .

Step 404: the storage device 130 creates an object under the specified path in response to the creation request.

After receiving the creation request, the storage device 130 creates the file under the corresponding path of the file system; in the foregoing example, the storage device 130 creates vm0.vmdk. If the creation succeeds, the storage device 130 sends a successful creation response to the DPU 220. If the creation fails, for example because a file with the same name already exists in the local file system of the storage device 130, the storage device 130 returns a creation failure response to the DPU 220. The following description takes successful creation as an example.
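The creation request and its two possible outcomes can be illustrated with the `os.open` function of the Python standard library, which wraps the open system call. In this sketch a temporary directory stands in for the mounted file system, and the use of `O_EXCL` to make creation fail when a file of the same name exists is an assumption; the application does not specify how the failure is detected.

```python
import os
import tempfile

mount_point = tempfile.mkdtemp()               # stands in for the /mnt/ directory
os.makedirs(os.path.join(mount_point, "FS0"))  # stands in for the mounted FS0/ root
path = os.path.join(mount_point, "FS0", "vm0.vmdk")

# O_CREAT asks for the file to be created if it does not exist; O_EXCL makes the
# call fail instead when a file of the same name is already present.
fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
os.close(fd)
first_create_succeeded = os.path.isfile(path)

try:
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    second_create_failed = False
except FileExistsError:
    second_create_failed = True   # corresponds to returning a creation failure response
```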

It is worth noting that, if the creation on the storage device 130 succeeds, the DPU 220 also creates the same file system or file, and after receiving a write request for the file sent by the host 110, the DPU 220 temporarily writes the data of the file into the corresponding file of the local file system of the DPU 220. That is to say, the file on the storage device 130 does not actually store data during this period. To this end, in an implementation manner, the storage device 130 generates file attribute information for the created file. The file attribute information indicates the file attribute, and the file attribute includes at least two types: normal file (regular) and stub file (stub). A normal file means that the data of the file is stored locally in the storage device 130. A stub file means that the data of the file is not stored locally in the storage device 130, but is stored at the mapping address of the stub file. The mapping address indicates the actual storage location of the file data. Exemplarily, the mapping address includes a device identifier, which uniquely indicates a device. Optionally, it may also include a file path, which is not limited in this embodiment of the present application, as long as it can indicate the data source. For example, referring to (b) of FIG. 5, the mapping address of FS0/vm0.vmdk on the storage device 130 is DPU220, or the mapping address is DPU220:/FS0/vm0.vmdk. When the storage device 130 receives an access request for a stub file, the storage device 130 obtains the data of the file from the mapping address of the stub file. The data access method in this scenario is described in detail below.
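The distinction between a normal file and a stub file can be sketched with a small data structure. The names `FileEntry` and `read_file`, the field layout, and the dictionary standing in for the local file system of the DPU 220 are hypothetical; only the attribute values "regular" and "stub" and the example mapping address come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FileEntry:
    attribute: str                          # "regular" or "stub"
    data: bytes = b""                       # locally stored data (normal files)
    mapping_address: Optional[str] = None   # e.g. "DPU220:/FS0/vm0.vmdk" (stub files)


def read_file(entry: FileEntry, fetch_remote: Callable[[str], bytes]) -> bytes:
    """Return file data, following the mapping address when the file is a stub."""
    if entry.attribute == "stub":
        return fetch_remote(entry.mapping_address)
    return entry.data


# Stand-in for the data held in the local file system of the DPU 220.
dpu_local_files: Dict[str, bytes] = {"DPU220:/FS0/vm0.vmdk": b"vm0 backup data"}

stub = FileEntry(attribute="stub", mapping_address="DPU220:/FS0/vm0.vmdk")
regular = FileEntry(attribute="regular", data=b"already migrated")

stub_data = read_file(stub, dpu_local_files.__getitem__)
regular_data = read_file(regular, dpu_local_files.__getitem__)
```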

In step 405, the storage device 130 sends a response to the DPU 220 indicating that the creation is successful.

In step 406, the DPU 220 creates the same object under the specified path corresponding to the local file system of the DPU 220 based on the aforementioned creation request.

Continuing to refer to (b) of FIG. 5, after the DPU 220 receives the successful creation response sent by the storage device 130, the DPU 220 responds to the above open{"mnt/FS0/vm0.vmdk", O_CREAT} request by creating the same file system in a certain directory of its local file system. As shown in (b) of FIG. 5, the DPU 220 creates an FS0 directory under the data/ directory and creates vm0.vmdk in the FS0 directory. At this point, a new file has been created. One or more files or directories can be created through the above method; for example, vm1.vmdk in (c) of FIG. 5 can also be created based on the above method.

Through the above process, the DPU 220 will create the same file system (such as FS0:/) and files as the host 110 and the storage device 130 . Subsequently, the DPU 220 may store the backup data of the same file in the host 110 into a corresponding file in the local file system.

Step 407 , the host 110 sends a write request to the DPU 220 in response to the backup command, for backing up the data indicated by the backup command to the storage device 130 .

Step 408 , the DPU 220 stores the data in the write request into the memory 122 .

Step 409 , the DPU 220 sends a request success response to the host 110 .

Wherein, Step 407 to Step 409 are the same as Step 201 to Step 203 respectively, and will not be repeated here.

Step 410 , the DPU 220 stores the data to be backed up in the memory 122 to the hard disk 126 .

Specifically, the DPU 220 writes the data to be backed up into the corresponding files in the local file system for persistent storage. Taking the scenario shown in FIG. 5 as an example, assume that in step 407 the backup software of the host 110 triggers an fwrite call to request writing the data of vm0.vmdk into vm0.vmdk of the storage device 130. Correspondingly, in response to the fwrite call, the DPU 220 first writes the data of vm0.vmdk into the local memory 122, then obtains at least part of the data of vm0.vmdk from the memory 122 and writes it into vm0.vmdk of the local file system of the DPU 220, to complete the persistent storage of vm0.vmdk.

In an optional implementation manner, the DPU 220 may first determine whether the vm0.vmdk file exists in the local file system, and if so, write the data of vm0.vmdk into vm0.vmdk of the local file system. If it does not exist, the file can be created in the local file system through the above steps. Alternatively, if the vm0.vmdk file does not exist in the local file system, or there is no remaining storage space in the hard disk 126, the DPU 220 may send the data of the vm0.vmdk file to the storage device 130.

The DPU 220 can trigger writing the data in the memory 122 into the hard disk 126 in many ways. In one embodiment, the DPU 220 triggers the write when a preset trigger condition is met, for example, when the amount of data in the memory 122 reaches a preset threshold or when a preset time is reached (for example, the DPU 220 periodically writes the data in the memory 122 to the hard disk 126).
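Such a trigger condition can be sketched as follows. The class `FlushPolicy`, its parameter names, and the concrete limits are assumptions; a fake clock is used so that the example is deterministic.

```python
import time


class FlushPolicy:
    """Hypothetical trigger: flush when buffered bytes or elapsed time reach a limit."""

    def __init__(self, max_bytes: int, max_seconds: float, now=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.now = now
        self.last_flush = now()

    def should_flush(self, buffered_bytes: int) -> bool:
        return (buffered_bytes >= self.max_bytes
                or self.now() - self.last_flush >= self.max_seconds)

    def mark_flushed(self) -> None:
        self.last_flush = self.now()


clock = [0.0]                         # fake clock, advanced manually below
policy = FlushPolicy(max_bytes=1024, max_seconds=5.0, now=lambda: clock[0])

by_size = policy.should_flush(2048)   # data amount reaches the preset threshold
idle = policy.should_flush(10)        # neither condition is met
clock[0] = 6.0
by_time = policy.should_flush(10)     # the preset time is reached
```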

In another embodiment, the DPU 220 triggers writing the data in the memory 122 into the hard disk 126 according to an instruction of the host 110. For example, the host 110 sends instruction information (for example, called first instruction information) to the DPU 220, and the first instruction information is used to instruct the DPU 220 to persistently store the data in the memory 122. Exemplarily, the backup software of the host 110 may trigger an fsync call based on a preset policy, and the fsync call is used to instruct writing data into a persistent storage medium. The preset policy may be, but is not limited to: 1) the fsync call is triggered periodically; 2) taking the file as a unit, an fsync call is triggered for each file; 3) the fsync call is triggered a preset period of time after the write request is sent, which is not limited in this embodiment of the present application. Correspondingly, after receiving the fsync call, the DPU 220 writes the data in the memory 122 into the hard disk 126, and then returns an fsync call success response to the host. Specifically, the DPU 220 may write all the data currently in the memory 122 to the hard disk 126, or may write only the data belonging to a specified file to the hard disk 126. For example, the backup software may send an fsync call carrying the file handle corresponding to vm0.vmdk to the DPU 220 to instruct the DPU 220 to write all data of vm0.vmdk in the memory 122 into the hard disk 126. For the method by which the DPU 220 obtains the data of other files (such as vm1.vmdk) and writes the data of vm1.vmdk into the local file system, please refer to the above description; details are not repeated here.
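For reference, the effect of an fsync call on a single file can be illustrated with the Python standard library. The helper name `persist` and the temporary directory are assumptions; the sketch only shows the local write-then-fsync sequence, not the messaging between the host 110 and the DPU 220.

```python
import os
import tempfile


def persist(path: str, buffered: bytes) -> None:
    """Append buffered data to a file and force it to the storage medium."""
    with open(path, "ab") as f:
        f.write(buffered)      # data may still sit in user-space and OS buffers
        f.flush()              # push the user-space buffer to the operating system
        os.fsync(f.fileno())   # ask the OS to write the file's data to the device


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "vm0.vmdk")
persist(target, b"part-1;")
persist(target, b"part-2;")
with open(target, "rb") as f:
    on_disk = f.read()
```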

Step 411 , the DPU 220 obtains the data to be backed up from the hard disk 126 (for example, the first data), and performs deduplication and compression operations on the first data to obtain deduplication and compressed data (for example, the second data).

Exemplarily, when the DPU 220 writes the data into the corresponding file of the local file system, the background sequentially reads the data in the local file system from the file header for aggregation, and deduplicates and compresses a piece of aggregated data such as the first data, to get the second data. Wherein, the specific manner of performing the deduplication and compression operation on the first data by the DPU 220 may refer to the foregoing description, which will not be repeated here.

Step 412 , the DPU 220 sends the second data to the storage device 130 .

Step 413 , the storage device 130 writes the data into a corresponding file in the local file system of the storage device 130 .

It should be noted that the data sent by the DPU 220 to the storage device 130 each time may be partial data of a file; that is, the DPU 220 and the storage device 130 may repeat steps 411 to 413 multiple times to complete the migration of the data of a complete file. In this process, as an optional implementation, the DPU 220 can also record the offset position (offset) of the migrated data in the file, in other words the offset position of the data to be read next time, as the cursor of the migration task. For example, if the data size of the file vm0.vmdk is 100M, the DPU reads the 0-60M data of the file from the hard disk 126 the first time and records the offset position 60M, so that the next read can start from the 60M position of the file. Correspondingly, the DPU 220 may notify the storage device 130 of the offset position of the migrated data in the file.
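The migration cursor can be sketched as follows, scaled down so that a 100-byte in-memory file stands in for the 100M file of the example. The function name `migrate_next` and the `send` callback are hypothetical.

```python
import io


def migrate_next(source, send, cursor: int, part_size: int) -> int:
    """Send the next part of a file starting at `cursor`; return the new cursor."""
    source.seek(cursor)
    part = source.read(part_size)
    if part:
        send(cursor, part)      # the offset tells the receiver where the part belongs
    return cursor + len(part)   # offset of the data to be read next time


data = bytes(range(100))
local_file = io.BytesIO(data)   # stands in for vm0.vmdk on the hard disk 126
received = bytearray(len(data))  # stands in for the file on the storage device 130


def send(offset: int, part: bytes) -> None:
    received[offset:offset + len(part)] = part


first_cursor = migrate_next(local_file, send, 0, 60)            # migrates bytes 0-59
cursor = migrate_next(local_file, send, first_cursor, 60)       # resumes at offset 60
```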

After the DPU 220 has migrated all the data of the file to the storage device 130, it may send indication information (such as second indication information) to the storage device 130 to indicate that the file data has been completely migrated. After receiving the second indication information, the storage device 130 can modify the file attribute of the file to "regular", that is, a normal file, indicating that the data of the file no longer points to the local data of the DPU 220. Exemplarily, the indication information may be carried in the same data packet as the data of the file. For example, a data packet includes a header and a payload, and the header includes but is not limited to one or more of the following: the offset position of the migrated data, and the second indication information. The second indication information may include 1 bit, and different values of this bit indicate whether the data of the file has been completely sent to the storage device 130; for example, a value of 0 means that the data has been completely sent, and a value of 1 means that it has not been completely sent.
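One possible encoding of such a header is sketched below with the struct module of the Python standard library. The layout (an 8-byte offset followed by one byte whose lowest bit carries the second indication information) and the function names are assumptions; the application only states that the header may carry the offset and a 1-bit indication.

```python
import struct

HEADER = struct.Struct("!QB")   # assumed layout: 8-byte offset, 1 byte holding the flag bit
COMPLETE, INCOMPLETE = 0, 1     # bit values as in the example in the text


def pack_packet(offset: int, flag: int, payload: bytes) -> bytes:
    return HEADER.pack(offset, flag & 1) + payload


def unpack_packet(packet: bytes):
    offset, flags = HEADER.unpack_from(packet)
    return offset, flags & 1, packet[HEADER.size:]


packet = pack_packet(60 * 1024 * 1024, COMPLETE, b"last part")
offset, flag, payload = unpack_packet(packet)
# The receiver may switch the file attribute once the data is completely sent.
file_attribute = "regular" if flag == COMPLETE else "stub"
```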

Optionally, after the execution is successful, the storage device 130 sends a successful execution response to the DPU 220, and the DPU 220 can delete the data of the file locally after receiving the response. As shown in (a) of FIG. 6, the DPU 220 sends the data of vm0.vmdk to the storage device 130, and the storage device 130 writes the data of vm0.vmdk into vm0.vmdk. After all writing is completed, referring to (b) of FIG. 6, the storage device 130 modifies the file attribute of vm0.vmdk to a normal file and sends a successful execution response to the DPU 220, and then the DPU 220 can delete vm0.vmdk from the local file system.

In the above manner, the DPU 220 can first temporarily write the file data to be backed up into the local file system, that is, into the hard disk 126. Since the file data has been stored persistently, the logical data backup is completed. In this way, data backup can be completed in the DPU 220. The backup process depends only on the computing power of the DPU 220 and the read-write bandwidth and size of the disk, and is no longer limited by the bandwidth from the host to the storage device 130 or the processing capability of the storage device 130. For big-data backup scenarios, this can significantly improve backup performance and shorten the backup window. In addition, since at this stage the file data does not need to be deduplicated or sent to the storage device 130 for storage, no network communication overhead is involved, which can significantly shorten the backup window and improve backup efficiency.

Based on the data processing method shown in FIG. 4 above, the embodiment of the present application may also provide a data access method.

FIG. 7 shows a system provided by the embodiment of the present application. As shown in FIG. 7, the system includes host 0, host 1, DPU0, DPU1, host 2, and storage device 130, where host 0, DPU0, and storage device 130 are respectively the host 110, DPU 220, and storage device 130 in FIG. 3, and host 1 is connected to DPU1.

Fig. 7 shows a schematic flow diagram corresponding to the data access method applied to the system, the flow includes:

1) The storage device 130 receives a data access request (for example, referred to as a first access request) sent by the host 1 through the DPU1.

Assume that the data access request is used to request access to data of FS0/vm1.vmdk. As shown in FIG. 7 , the file attribute of the file vm1.vmdk in the file system of the storage device 130 is a stub file, and the mapping address is DPU0.

Exemplarily, the first access request may access some data of vm1.vmdk, and the storage device 130 can judge whether this part of data is stored locally in the storage device 130 according to the recorded offset position of the migrated data of vm1.vmdk, if stored locally , then the storage device 130 can directly send the part of data to DPU1. Otherwise, storage device 130 retrieves the data from DPU0.
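The decision described above can be sketched as follows. The function `serve_read` and its callbacks are hypothetical, and the rule that a request is served locally only when it lies entirely below the recorded offset is an assumption made for simplicity.

```python
def serve_read(offset: int, length: int, migrated_upto: int,
               read_local, read_from_dpu) -> bytes:
    """Serve a read locally when the range is fully migrated, else from the DPU."""
    if offset + length <= migrated_upto:
        return read_local(offset, length)
    return read_from_dpu(offset, length)


file_data = bytes(range(100))
migrated_upto = 60                       # recorded offset position of the migrated data
local_copy = file_data[:migrated_upto]   # the part already held by storage device 130
calls = []


def read_local(off: int, n: int) -> bytes:
    calls.append("local")
    return local_copy[off:off + n]


def read_from_dpu(off: int, n: int) -> bytes:
    calls.append("dpu")
    return file_data[off:off + n]        # stands in for fetching from DPU0


local_hit = serve_read(10, 20, migrated_upto, read_local, read_from_dpu)
remote_hit = serve_read(50, 20, migrated_upto, read_local, read_from_dpu)
```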

2) The storage device 130 sends a data access request (such as a second access request) to DPU0. Correspondingly, DPU0 receives the second data access request.

The second access request is used to request access to data of FS0/vm1.vmdk.

It should be noted that the first access request and the second access request may be the same or different, which is not limited in this embodiment of the present application. For example, when DPU0 can recognize the first access request, the storage device 130 can directly forward the first access request to DPU0, and at this time, the second access request is the first access request. Conversely, when DPU0 cannot recognize the first access request, storage device 130 generates a second access request that DPU0 can recognize based on the first access request, and at this time the second access request is different from the first access request.

3) DPU0 obtains the data of vm1.vmdk from the local file system of DPU0 in response to the second access request, and sends the data to the storage device 130 . Correspondingly, the storage device 130 receives the data sent by DPU0.

In an optional implementation manner, when the storage device 130 receives the data of vm1.vmdk sent by the DPU0, the storage device 130 may write the data of vm1.vmdk into the vm1.vmdk of the local file system of the storage node.

4) The storage node sends the data of vm1.vmdk to host 1.

This application does not limit the number of devices that communicate with the storage node and the DPU. For example, the storage device 130 can also receive a data access request sent by the host 2, obtain the data requested by the host 2 through the above method, and return it to the host 2, which will not be repeated here.

The foregoing method can provide flexibility in data access during the data backup process.

Based on the same inventive concept as the method embodiment, the embodiment of the present application further provides a data processing device, the data processing device is configured to execute the method executed by the DPU in the above method embodiment. As shown in FIG. 8 , the data processing device 800 includes a receiving module 801 , a processing module 802 and a sending module 803 ; specifically, in the data processing device 800 , the modules are connected through a communication channel.

The receiving module 801 is configured to receive a write request sent by the first host, where the write request carries the first data that needs to be backed up to the storage device 130. For the specific implementation, please refer to the description of step 201 in FIG. 2 or the description of step 407 in FIG. 4, which will not be repeated here.

The processing module 802 is configured to perform deduplication and compression operations on the first data to obtain second data. For the specific implementation, please refer to the description of step 204 in FIG. 2 or the description of step 411 in FIG. 4, which will not be repeated here.

A sending module 803, configured to send the second data to the storage device 130. For a specific implementation, please refer to the description of step 205 in FIG. 2 , or refer to the description of step 412 in FIG. 4 , which will not be repeated here.

As a possible implementation manner, the data processing device 800 is a network card or a DPU.

As a possible implementation manner, after the receiving module 801 receives the write request, the processing module 802 is also configured to store the first data in the memory 122 of the data processing device. For the specific implementation, please refer to the description of step 202 in FIG. 2 or the description of step 408 in FIG. 4, which will not be repeated here.

The sending module is further configured to return a write request completion response to the first host; for a specific implementation, please refer to the description of step 203 in FIG. 2 , or refer to the description of step 409 in FIG. 4 , which will not be repeated here.

As a possible implementation manner, after the receiving module 801 receives the write request, the processing module 802 is further configured to store the first data in the memory 122 of the data processing device. For the specific implementation, please refer to the description of step 408 in FIG. 4, which will not be repeated here. The processing module 802 is further configured to obtain the first data from the memory 122 and store it in the persistent storage medium of the data processing device. For the specific implementation, please refer to the description of step 410, which will not be repeated here.

As a possible implementation manner, the data processing apparatus 800 further includes a first file system, where the first file system is the same as the second file system of the storage device 130. The receiving module 801 is also configured to receive a file creation request, where the file creation request is used to request creation of a first file under the file system of the storage device 130. For the specific implementation, please refer to the description of step 402, which will not be repeated here.

The sending module 803 is further configured to: send the file creation request to the storage device; for a specific implementation, please refer to the description of step 403 in FIG. 4 , which will not be repeated here.

The receiving module 801 is also configured to: receive a successful creation response sent by the storage device 130; for a specific implementation, please refer to the description of step 405 in FIG. 4 , which will not be repeated here.

The processing module 802 is further configured to: create the first file in the first file system; for a specific implementation, please refer to the description of step 406 in FIG. 4 , which will not be repeated here.

If the write request sent by the first host is used to request to store the first data in the first file of the second file system;

Then, when storing the first data in the persistent storage medium of the data processing device, the processing module 802 is specifically configured to write the first data into the first file of the first file system. For the specific implementation manner, please refer to the description of step 411 in FIG. 4, which will not be repeated here.

As a possible implementation manner, the processing module 802 is further configured to: delete the first file stored in the device after storing the data of the first file in the storage device through a deduplication and compression operation.

As a possible implementation, the receiving module 801 is also configured to receive a read request sent by the storage device 130, where the read request is used to request reading at least part of the data of the first file. For the specific implementation, please refer to the description of step 2 in FIG. 7, which will not be repeated here. The sending module is further configured to send the at least part of the data of the first file to the storage device 130. For the specific implementation, please refer to the description of step 3 in FIG. 7, which will not be repeated here.

The embodiment of the present application also provides a computer storage medium that stores computer instructions. When the computer instructions are run on the storage device, the storage device executes the above-mentioned related method steps to implement the method executed by the DPU 120 in the above-mentioned embodiment (refer to the description of steps 201 to 205 in FIG. 2, which will not be repeated here), or to implement the method executed by the DPU 220 in the above embodiment (refer to the description of steps 401 to 403 and steps 405 to 412 in FIG. 4, which will not be repeated here).

An embodiment of the present application also provides a computer program product. When the computer program product is run on a computer, it causes the computer to execute the above-mentioned related steps to implement the method performed by the DPU 120 in the above embodiment (refer to the description of steps 201 to 205 in FIG. 2, which is not repeated here), or to implement the method performed by the DPU 220 in the above embodiment (refer to the description of steps 401 to 403 and steps 405 to 412 in FIG. 4, which is not repeated here).

In addition, an embodiment of the present application also provides a device, which may specifically be a chip, a component or a module. The device may include a processor and a memory that are connected to each other, where the memory is used to store computer-executable instructions. When the device is running, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the method performed by the DPU 120 in the above method embodiments (refer to the description of steps 201 to 205 in FIG. 2, which is not repeated here), or executes the above-mentioned related method steps to implement the method performed by the DPU 220 in the above embodiment (refer to the description of steps 401 to 403 and steps 405 to 412 in FIG. 4, which is not repeated here).

The data processing device, computer storage medium, computer program product and chip provided in the embodiments of the present application are all used to execute the method corresponding to the DPU 120, the DPU 220 or the storage device 130 provided above. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above; details are not repeated here.

From the description of the above embodiments, those skilled in the art can understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions can be assigned to different functional modules as required; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. The division of modules or units is only a logical function division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

A unit described as a separate component may or may not be physically separated, and a component shown as a unit may be one physical unit or multiple physical units, which may be located in one place or distributed to multiple different places. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit (or module) in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

If an integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Optionally, the computer-executed instructions in the embodiments of the present application may also be referred to as application program codes, which is not specifically limited in the embodiments of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)).

The various illustrative logic units and circuits described in the embodiments of the present application can implement or operate the described functions by means of a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of the above. The general-purpose processor may be a microprocessor; optionally, the general-purpose processor may also be any conventional processor, controller, microcontroller or state machine. A processor may also be implemented by a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors combined with a digital signal processor core, or any other similar configuration.

The steps of the method or algorithm described in the embodiments of the present application may be directly embedded in hardware, a software unit executed by a processor, or a combination of both. The software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM or any other storage medium in the art. Exemplarily, the storage medium can be connected to the processor, so that the processor can read information from the storage medium, and can write information to the storage medium. Optionally, the storage medium can also be integrated into the processor. The processor and storage medium can be provided in an ASIC.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although the application has been described in conjunction with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely illustrative of the application as defined by the appended claims and are deemed to cover any and all modifications, variations, combinations or equivalents within the scope of this application. Apparently, those skilled in the art can make various changes and modifications to the present application without departing from the scope of the present application. In this way, if these modifications and variations of the application fall within the scope of the claims of the application and their equivalent technologies, the application also intends to include these modifications and variations.

Claims

A data backup system, characterized in that the system includes a data processing device and a storage device,

The data processing device is configured to receive a write request sent by a first host, where the write request carries first data that needs to be backed up to the storage device;

The data processing device is configured to perform deduplication and compression operations on the first data to obtain second data;

The data processing device is used for writing the second data into the storage device.
The system according to claim 1, wherein the data processing device is a network card or a data processing unit (DPU).
The system according to claim 1 or 2, wherein after receiving the write request, the data processing device is further configured to store the first data in the memory of the data processing device, and return a write request completion response to the first host;

When performing a deduplication operation on the first data, the data processing device is specifically configured to:

Acquire the first data from the memory, and delete from the first data any data block that duplicates a data block already stored in the storage device.
The system according to claim 1 or 2, wherein the data processing device further includes a persistent storage medium; after receiving the write request, the data processing device is further configured to store the first data in the memory of the data processing device, and return a write request completion response to the first host;

The data processing device is further configured to obtain the first data from the memory and store it in the persistent storage medium of the data processing device;

When performing deduplication and compression operations on the first data, the data processing device is specifically configured to:

Acquire the first data from the persistent storage medium, and delete from the first data any data block that duplicates a data block already stored in the storage device, so as to generate the second data.
The system according to claim 4, wherein the data processing device further comprises a first file system, the first file system is the same as the second file system of the storage device, and the write request is a write request based on the second file system;

When storing the first data to the persistent storage medium of the data processing device, the data processing device is specifically configured to store the first data to the persistent storage medium of the data processing device through the first file system.
The system according to claim 4 or 5, wherein the data processing device is further configured to: receive a file creation request; forward the file creation request to the storage device;

The storage device is further configured to: create the first file requested by the file creation request in the second file system of the storage device, and generate mapping address information of the first file, the mapping address information indicating that the data of the first file is located in the data processing device, or indicating the access path of the data of the first file in the data processing device; and send a creation success response to the data processing device;

The data processing device is further configured to: receive the creation success response sent by the storage device, and create the first file in the first file system;

The write request is used to request to store the first data to a first file of the second file system;

When storing the first data to the persistent storage medium of the data processing device, the data processing device is specifically configured to: write the first data into the first file.
The system according to claim 6, wherein the data processing device is further used for:

After storing the data of the first file in the storage device through deduplication and compression operation, deleting the first file stored in the data processing device;

The storage device is configured to: modify the mapping address information of the first file to indicate that the data of the first file is stored in the storage device.
The system of claim 6, wherein,

The storage device is further configured to: receive a first read request sent by a second host, the first read request being used to read at least part of the data of the first file; and, upon determining that the at least part of the data of the first file is located in the data processing device, send to the data processing device a second read request for reading the at least part of the data of the first file;

The data processing device is further configured to: read the at least part of the data of the first file from the data processing device according to the second read request, and send the read data to the storage device.
A data processing device, characterized in that the device comprises:

A receiving module, configured to receive a write request sent by a first host, where the write request carries first data that needs to be backed up to a storage device;

A processing module, configured to perform deduplication and compression operations on the first data to obtain second data;

A sending module, configured to send the second data to the storage device.
The device according to claim 9, characterized in that the device is a network card or a data processing unit (DPU).
The device according to claim 9 or 10, wherein after the receiving module receives the write request, the processing module is further configured to store the first data in the memory of the data processing device;

The sending module is further configured to: return a write request completion response to the first host;

When performing a deduplication operation on the first data, the processing module is specifically configured to:

Acquire the first data from the memory, and delete from the first data any data block that duplicates a data block already stored in the storage device.
The device according to claim 9 or 10, wherein after the receiving module receives the write request, the processing module is further configured to store the first data in the memory of the data processing device; the sending module is further configured to: return a write request completion response to the first host;

The processing module is further configured to: obtain the first data from the memory and store it in a persistent storage medium of the data processing device;

When performing deduplication and compression operations on the first data, the processing module is specifically configured to:

Acquire the first data from the persistent storage medium, and delete from the first data any data block that duplicates a data block already stored in the storage device.
The device according to claim 12, further comprising a first file system, wherein the first file system is the same as the second file system of the storage device, and the write request is a write request based on the second file system;

When storing the first data to the persistent storage medium of the data processing device, the processing module is specifically configured to: store the first data to the persistent storage medium of the data processing device through the first file system.
The device according to claim 12 or 13, wherein the receiving module is further configured to: receive a file creation request;

The sending module is further configured to: send the file creation request to the storage device;

The receiving module is further configured to: receive a creation success response sent by the storage device;

The processing module is further configured to: create the first file in the first file system;

The write request is used to request to store the first data to a first file of the second file system;

When storing the first data to the persistent storage medium of the data processing device, the processing module is specifically configured to: write the first data into the first file of the first file system.
The device according to claim 14, wherein the processing module is further configured to: delete the first file stored in the device after storing the data of the first file in the storage device through a deduplication and compression operation.
The device according to claim 14, wherein,

The receiving module is further configured to receive a read request sent by the storage device, where the read request is used to request to read at least part of the data of the first file;

The sending module is further configured to send at least part of the data of the first file to the storage device.
A computing device, characterized in that the computing device includes a processor and a memory;

The memory is used to store computer program instructions;

The processor is used to invoke the computer program instructions in the memory to perform the functions of the data processing device according to any one of claims 9 to 16.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code, and the program code includes instructions for performing the functions of the data processing device according to any one of claims 9 to 16.