CN109086172B - Data processing method and related device - Google Patents

Data processing method and related device

Info

Publication number
CN109086172B
Authority
CN
China
Prior art keywords
target data
storage device
identifier
data packet
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811108304.4A
Other languages
Chinese (zh)
Other versions
CN109086172A (en
Inventor
何孝金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201811108304.4A
Publication of CN109086172A
Application granted
Publication of CN109086172B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1464: Management of the backup or restore process for networked environments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present application discloses a data processing method, comprising: a first storage device reads target data; the first storage device detects whether the target data has a corresponding first identifier and a corresponding second identifier, wherein the second identifier marks that the target data has been compressed; and if the target data has the corresponding first identifier and the corresponding second identifier, the first storage device sends a first target data packet to a second storage device. An embodiment of the present application also discloses a data processing apparatus. The embodiments reduce the data processing burden on the main storage device and the backup storage device.

Description

Data processing method and related device
Technical Field
The present application relates to the field of data storage, and in particular, to a data processing method and related apparatus.
Background
The remote copy technology is a remote data backup technology based on a storage device, and is generally divided into synchronous remote copy and asynchronous remote copy. The main principle of synchronous remote copy is that data needs to be written to a main storage device and a backup storage device at the same time, and the main principle of asynchronous remote copy is that data is written to the main storage device first and then copied from the main storage device to the backup storage device.
In the big data era, storing massive amounts of data occupies a large amount of storage space. Deduplication and compression are currently the core technologies for reducing the space that stored data occupies, and they have become essential features of All Flash Arrays (AFAs), where storage space is expensive. Deduplication generally works as follows: a hash value is calculated for newly written data and compared with the hash values already stored; if an identical hash value is found, the location of the data corresponding to that hash value is recorded and the current data is not written to the storage device.
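The hash-based deduplication described above can be illustrated with a minimal Python sketch. The class name `DedupStore`, its methods, and the choice of SHA-256 are assumptions made for illustration only; they are not taken from the patent.

```python
import hashlib


class DedupStore:
    """Toy store that keeps one physical copy per distinct hash value (illustrative only)."""

    def __init__(self):
        self.blocks = {}    # hash value -> stored data
        self.addr_map = {}  # logical address -> hash value of the data at that address

    def write(self, address, data):
        """Return True if the data was physically written, False if it was deduplicated."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = data
        # In both cases, record where the data for this address can be found.
        self.addr_map[address] = digest
        return is_new

    def read(self, address):
        return self.blocks[self.addr_map[address]]
```

Writing the same bytes to two addresses stores them once; the second write only records a mapping to the existing copy.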
However, existing remote copy technology does not sense whether target data has undergone deduplication and compression when copying it between a main storage device and a backup storage device. As a result, even if the target data has already been deduplicated and compressed, the main storage device decompresses it before sending it to the backup storage device, and the backup storage device then deduplicates and compresses the decompressed data again. This increases the data processing burden on both devices and the amount of data transmitted between them, so the Recovery Point Objective (RPO) is high during asynchronous remote copy.
Disclosure of Invention
The embodiment of the application provides a data processing method which is used for remote data backup of a storage device.
In view of the above, a first aspect of the present application provides a data processing method, including:
a first storage device reads target data;
the first storage device detects whether the target data has a corresponding first identifier and a corresponding second identifier, wherein the second identifier is used for marking that the target data is compressed;
if the target data has the corresponding first identifier and the corresponding second identifier, the first storage device sends a first target data packet to a second storage device, so that the second storage device processes the target data according to the first target data packet;
the first target data packet at least carries the target data, the first identifier and the second identifier, the first identifier is used for indicating the second storage device to perform deduplication processing on the target data according to the first identifier, and the second identifier is used for indicating the second storage device to perform write processing on the target data.
With reference to the first aspect of the embodiment of the present application, in a first possible implementation manner of the first aspect, after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, the method further includes:
if the target data does not have the corresponding first identifier and the corresponding second identifier, the first storage device sends a second target data packet to the second storage device, so that the second storage device processes the target data according to the second target data packet;
the second target data packet at least carries the target data and a third identifier, where the third identifier is used to instruct the second storage device to perform compression processing on the target data.
With reference to the first possible implementation manner of the first aspect of the embodiment of the present application, in a second possible implementation manner of the first aspect, after the first storage device reads the target data, the method further includes:
if the target data is stored in the cache region of the first storage device, the first storage device sends the second target data packet to the second storage device, so that the second storage device processes the target data according to the second target data packet;
wherein the second target data packet at least carries the target data and the third identifier.
With reference to the first aspect of the embodiment of the present application, in a third possible implementation manner of the first aspect, after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, the method further includes:
if the target data has the corresponding first identifier and does not have the corresponding second identifier, the first storage device sends a third target data packet to the second storage device, so that the second storage device processes the target data according to the third target data packet;
wherein the third target data packet at least carries the target data, the first identifier and the third identifier.
With reference to the first aspect of the embodiment of the present application, in a fourth possible implementation manner of the first aspect, after the detecting, by the first storage device, whether the target data has the corresponding first identifier and the corresponding second identifier, the method further includes:
if the target data does not have the corresponding first identifier and has the corresponding second identifier, the first storage device sends a fourth target data packet to the second storage device, so that the second storage device processes the target data according to the fourth target data packet;
wherein, the fourth target data packet at least carries the target data and the second identifier.
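The four packet types of the first aspect can be summarized in a short Python sketch. The `TargetPacket` structure, the `build_packet` function, and the field names are hypothetical; the patent does not define a packet layout. Giving the cache case precedence over the identifier checks is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetPacket:
    kind: str                       # "first", "second", "third" or "fourth"
    data: bytes
    first_id: Optional[str] = None  # deduplication fingerprint, if any
    second_id: bool = False         # marks the data as already compressed
    third_id: bool = False          # instructs the receiver to compress the data


def build_packet(data, fingerprint=None, compressed=False, in_cache=False):
    """Choose the packet type from the identifiers found for the target data."""
    if in_cache:
        # Assumption: data still in the cache region is sent as a second packet
        # regardless of any identifiers.
        return TargetPacket("second", data, third_id=True)
    if fingerprint is not None and compressed:
        return TargetPacket("first", data, first_id=fingerprint, second_id=True)
    if fingerprint is not None:
        return TargetPacket("third", data, first_id=fingerprint, third_id=True)
    if compressed:
        return TargetPacket("fourth", data, second_id=True)
    return TargetPacket("second", data, third_id=True)
```

For example, `build_packet(b"x", fingerprint="ab12", compressed=True)` yields a first target data packet, while `build_packet(b"x")` yields a second target data packet carrying only the data and the third identifier.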
A second aspect of the present application provides a data processing apparatus comprising:
a reading module, configured to read target data;
a detection module, configured to detect whether the target data has a corresponding first identifier and a corresponding second identifier, wherein the second identifier is used for marking that the target data is compressed;
a sending module, configured to send a first target data packet to a second storage device if the target data has the corresponding first identifier and the corresponding second identifier, so that the second storage device processes the target data according to the first target data packet;
the first target data packet at least carries the target data, the first identifier and the second identifier, the first identifier is used for indicating the second storage device to perform deduplication processing on the target data according to the first identifier, and the second identifier is used for indicating the second storage device to perform write processing on the target data.
With reference to the second aspect of the embodiments of the present application, in a first possible implementation manner of the second aspect, a data processing apparatus is provided, which includes:
the sending module is further configured to send a second target data packet to the second storage device if the target data does not have the corresponding first identifier and the corresponding second identifier, so that the second storage device processes the target data according to the second target data packet;
the second target data packet at least carries the target data and a third identifier, where the third identifier is used to instruct the second storage device to compress the target data.
With reference to the first possible implementation manner of the second aspect of the embodiment of the present application, in a second possible implementation manner of the second aspect, a data processing apparatus is provided, which includes:
the sending module is further configured to send the second target data packet to the second storage device if the target data is stored in the cache area of the first storage device, so that the second storage device processes the target data according to the second target data packet;
wherein the second target data packet at least carries the target data and the third identifier.
With reference to the second aspect of the embodiment of the present application, in a third possible implementation manner of the second aspect, a data processing apparatus is provided, including:
the sending module is further configured to send a third target data packet to the second storage device if the target data has the corresponding first identifier and does not have the corresponding second identifier, so that the second storage device processes the target data according to the third target data packet;
wherein the third target data packet at least carries the target data, the first identifier and the third identifier.
With reference to the second aspect of the embodiment of the present application, in a fourth possible implementation manner of the second aspect, a data processing apparatus is provided, including:
the sending module is further configured to send a fourth target data packet to the second storage device if the target data does not have the corresponding first identifier and has the corresponding second identifier, so that the second storage device processes the target data according to the fourth target data packet;
wherein, the fourth target data packet at least carries the target data and the second identifier.
According to the technical scheme, the embodiment of the application has the following advantages:
An embodiment of the present application provides a data processing method for remote data backup of storage devices. The method reduces the data processing burden on the main storage device and the backup storage device, reduces the amount of data transmitted between them, and lowers the recovery point objective (RPO) in asynchronous remote copy.
Drawings
FIG. 1 is a schematic diagram of a network framework of a storage device according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of data processing in an application scenario of the present application;
FIG. 3 is a schematic diagram of an embodiment of a method for data processing in an embodiment of the present application;
FIG. 4 is a schematic diagram of an embodiment of a data processing apparatus in the embodiment of the present application.
Detailed Description
An embodiment of the present application provides a data processing method for remote data backup of storage devices. The method reduces the data processing burden on the main storage device and the backup storage device, reduces the amount of data transmitted between them, and lowers the recovery point objective (RPO) in asynchronous remote copy.
The terms "first," "second," "third," "fourth," and the like in the description, the claims, and the drawings of the present application, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments described herein may be practiced in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, refer to FIG. 1, which is a schematic diagram of a network framework of storage devices in an embodiment of the present application. Although FIG. 1 shows one main storage device and one backup storage device, the type and number of main storage devices and backup storage devices depend on the actual situation and are not limited in practical applications. The main storage device and the backup storage device may each be a single storage device or a storage array composed of a plurality of storage devices. A storage device may be a Solid State Drive (SSD), a Hybrid Hard Drive (HHD), a mechanical Hard Disk Drive (HDD), an optical disk library, a magnetic tape library, or the like; a storage array may be composed of one or more of these device types, which is not limited herein. Data communication between the main storage device and the backup storage device may use the Transmission Control Protocol/Internet Protocol (TCP/IP).
The present application is applicable to remote copy of data, which is generally divided into synchronous remote copy and asynchronous remote copy. Synchronous remote copy uses remote mirroring software to copy data from the main storage device to the backup storage device as a synchronous mirror; each input/output (I/O) transaction of the main storage device is released only after confirmation that the remote copy has completed. Synchronous mirroring keeps the remote copy consistent with the local data to be replicated at all times. When the main storage device fails, the mirrored remote copy ensures that, after the user's application is switched to the backup storage device, the service continues without data loss. Asynchronous remote copy completes the I/O operations of the main storage device before the backup storage device is updated, so that the I/O operations of the main storage device are not affected by those of the backup storage device. Because remote replication runs as a background synchronization, the impact on local system performance is small, the transmission distance can be long (over 1000 kilometers), and the network bandwidth requirement is low.
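The difference between the two modes can be shown with a toy in-memory model. The `Replicator` class and its methods are illustrative assumptions only; real remote copy involves network transfer, acknowledgements, and failure handling that are omitted here.

```python
from collections import deque


class Replicator:
    """Toy model of synchronous versus asynchronous remote copy (illustrative only)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}       # address -> data on the main storage device
        self.backup = {}        # address -> data on the backup storage device
        self.pending = deque()  # addresses not yet copied (asynchronous mode)

    def write(self, address, data):
        self.primary[address] = data
        if self.synchronous:
            # The I/O is acknowledged only after the remote copy exists.
            self.backup[address] = data
        else:
            # The I/O is acknowledged immediately; the copy happens later.
            self.pending.append(address)
        return "ack"

    def background_sync(self):
        """Copy every pending address to the backup (asynchronous mode)."""
        while self.pending:
            address = self.pending.popleft()
            self.backup[address] = self.primary[address]
```

In asynchronous mode, the data that would be lost if the main device failed is whatever remains in `pending`, which is what the recovery point objective bounds.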
The following description is given from the perspective of the main storage device and the backup storage device. For ease of understanding, an application scenario of the data processing method is described below with reference to FIG. 2, which is a schematic flow chart of data processing in an application scenario of the present application. As shown in the figure, specifically:
In step S1, during remote copy, when data needs to be synchronized, the main storage device generates a difference bitmap. The difference bitmap marks data on the main storage device that differs from the backup storage device; this is usually data newly written to the main storage device, which must also be written to the backup storage device to complete the synchronization. The main storage device finds the logical volume address of the data to be copied according to the difference bitmap. In practice, data is usually stored in a storage device in the form of data blocks, and the data block containing the target data is referred to as the target data block. Because data is handled through logical volume addresses at the software level, the data processing method provided by the present application first obtains the logical volume address of the target data block in the main storage device.
In step S2, after the logical volume address of the target data block in the main storage device is obtained, the target data block can be read according to that address. When data undergoes deduplication, fingerprint information is first calculated from the data. The main storage device uses the fingerprint information to check whether identical fingerprint information is already stored on the main storage device. If so, the address of the data corresponding to the identical stored fingerprint information is recorded, the data currently to be written is not written to the storage device, and a mapping is established to the recorded address, which completes the deduplication. The fingerprint information is usually stored in the block header of the data block holding the data. When data is compressed, an identifier is generated to mark that the data has been compressed; this identifier is also usually stored in the block header of the data block. The main storage device therefore determines whether the data has been deduplicated and compressed by checking whether the target data block contains fingerprint information and the identifier corresponding to compression.
If both the fingerprint information and the identifier corresponding to compression exist in the target data block, the data is determined to have been deduplicated and compressed, and step S4 is executed. If neither exists in the target data block, the data is determined not to have been deduplicated or compressed, and step S3 is executed.
In step S3, after finding that there is no fingerprint information and no identifier corresponding to compression processing in the target data block, the main storage device directly sends the corresponding data in the target data block to the backup storage device, and the backup storage device further processes the received data according to its own service processing requirements.
In step S4, after finding the fingerprint information and the identifier corresponding to compression in the target data block and thereby determining that the data has been deduplicated and compressed, the main storage device uses the address of the target data block to query the deduplication fingerprint information of the target data stored in the block header. If multiple data blocks need to be copied in the same batch, step S5 is executed; if only a single data block needs to be copied, step S6 is executed.
In step S5, after the fingerprint information corresponding to the target data has been queried and multiple data blocks need to be copied in the same batch, the main storage device may compare the fingerprint information of the other data blocks to be copied with this fingerprint. If an identical fingerprint exists, one data block is retained and information about the other data blocks with that fingerprint is recorded, which completes deduplication within the batch.
In step S6, after the deduplication fingerprint information corresponding to the target data has been found, the main storage device reads the target data corresponding to that fingerprint information. Because the target data has already been compressed, the data read is in compressed form.
In step S7, after the compressed target data is read, the main storage device sends the logical volume address of the target data block, the deduplication fingerprint, and the read compressed data to the backup storage device in the form of a data packet, and the backup storage device further processes the received data according to its own service processing requirements.
In this scheme, before sending target data to the backup storage device, the main storage device queries the target data block in which the target data is located and checks whether it contains deduplication fingerprint information and the compression identifier. If so, it sends the deduplication fingerprint information, the compression identifier, and the compressed target data to the backup storage device as a data packet. The backup storage device can then perform deduplication directly with the received fingerprint information, without recalculating it from the target data. The backup storage device can also determine from the received compression identifier that the target data has already been compressed, so it can write the target data directly without compressing it again. This reduces the data processing burden on the main storage device and the backup storage device, reduces the amount of data transmitted between them, and lowers the recovery point objective (RPO) in asynchronous remote copy.
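Steps S1 to S7 on the main storage device can be sketched in Python as follows. The `Block` structure, the `plan_copy` function, and the dictionary-based packets are hypothetical. In particular, the `same_as` field used to record in-batch duplicates is an assumption, since the text does not specify how the recorded information is conveyed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    payload: bytes                     # stored bytes (compressed if flagged)
    fingerprint: Optional[str] = None  # deduplication fingerprint in the block header
    compressed: bool = False           # compression identifier in the block header


def plan_copy(diff_bitmap, volume):
    """Return the packets the main storage device would send for one sync pass."""
    packets = []
    queued = {}  # fingerprint -> logical volume address of the block already queued
    # S1: the difference bitmap marks addresses that differ from the backup.
    for address, dirty in enumerate(diff_bitmap):
        if not dirty:
            continue
        block = volume[address]  # S2: read the target data block by address
        has_fp = block.fingerprint is not None
        if not has_fp and not block.compressed:
            # S3: neither identifier is present, so the data is sent as-is.
            packets.append({"address": address, "data": block.payload})
        elif has_fp and block.compressed:
            if block.fingerprint in queued:
                # S5: same fingerprint already in this batch, record it only.
                packets.append({"address": address,
                                "fingerprint": block.fingerprint,
                                "same_as": queued[block.fingerprint]})
            else:
                queued[block.fingerprint] = address
                # S6/S7: send address, fingerprint and the still-compressed data.
                packets.append({"address": address,
                                "fingerprint": block.fingerprint,
                                "compressed": True,
                                "data": block.payload})
        else:
            # Only one identifier present: not covered by the S1-S7 flow.
            raise ValueError(f"block {address}: mixed case not covered by this flow")
    return packets
```

The compressed payload is forwarded unchanged, so the main storage device never decompresses it and the backup storage device never has to recompute the fingerprint.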
Referring to fig. 3, fig. 3 is a schematic diagram of an embodiment of a data processing method in an embodiment of the present application, where the embodiment of the data processing method in the embodiment of the present application includes:
101. A first storage device reads target data;
In this embodiment, the first storage device determines the target data by obtaining a difference bitmap or a difference identifier that records the data differing between the first storage device and the second storage device, and reads the target data according to the logical volume address where the target data is located. The logical volume address is a location code that represents the position of the data with an Arabic numeral, for example: the logical volume address corresponding to data 1 is 1, the logical volume address corresponding to data 2 is 2, and so on.
102. The first storage device detects whether the target data has a corresponding first identifier and a corresponding second identifier;
In this embodiment, when the target data undergoes deduplication, the storage device performs a hash operation on the target data to generate a corresponding hash value. This hash value is the fingerprint information of the data and is referred to as the first identifier in this embodiment. The hash operation requires a hash algorithm; hash algorithms applicable in this application include, but are not limited to, the xxHash, MD, SHA-1, SHA-2, and MD5 hash algorithms. After the target data is compressed, the storage device generates an identifier marking that the target data has been compressed, which is referred to as the second identifier in this embodiment. The first identifier, the second identifier, and the logical volume address of the target data are metadata of the target data. The metadata is usually stored in the block header of the data block in which the target data is stored, but may be stored elsewhere depending on the storage vendor, for example in a non-volatile memory in the storage device, which is not limited herein. The first storage device detects whether the first identifier and the second identifier corresponding to the target data exist in the storage device.
103. If the target data has the corresponding first identifier and the corresponding second identifier, the first storage device sends a first target data packet to the second storage device;
In this embodiment, after the first storage device detects that the first identifier and the second identifier corresponding to the target data exist in the storage device, it builds a first target data packet from the obtained first identifier, the obtained second identifier, and the target data; the first target data packet also includes the logical volume address of the target data. The first storage device sends the first target data packet to the second storage device over TCP/IP. After receiving the first target data packet, the second storage device uses the first identifier in the packet to check whether fingerprint information identical to the first identifier already exists in the second storage device. If so, the target data in the packet is not written; instead, the logical volume address of the target data is recorded and mapped to the address of the existing data with the identical fingerprint information, which completes deduplication of the target data on the second storage device. The second storage device can determine from the second identifier in the packet that the received target data is already compressed, so it does not need to compress the target data.
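The metadata discussed for step 102 can be illustrated with the Python standard library. The `make_block` function and the header field names are hypothetical, SHA-256 stands in for the SHA-2 family named above, and zlib stands in for an unspecified compression method. Computing the fingerprint over the uncompressed data is also an assumption; the text does not say which form is hashed.

```python
import hashlib
import zlib


def make_block(address, raw):
    """Build illustrative block metadata and payload for one piece of target data."""
    # First identifier: fingerprint (assumed here to be taken over the raw data).
    fingerprint = hashlib.sha256(raw).hexdigest()
    # Compress the payload and set the second identifier accordingly.
    payload = zlib.compress(raw)
    header = {
        "address": address,       # logical volume address
        "first_id": fingerprint,  # deduplication fingerprint
        "second_id": True,        # marks the payload as compressed
    }
    return header, payload
```

A first target data packet would then carry the header fields together with the compressed payload, so the receiver can use `first_id` and `second_id` without recomputing either.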
In this embodiment of the application, before sending target data to the second storage device, the first storage device first checks the target data block in which the target data is located for deduplication fingerprint information and the compression identifier. If both exist, it sends the deduplication fingerprint information, the compression identifier, and the compressed target data to the second storage device as a data packet. The second storage device can then perform deduplication directly with the received fingerprint information, without recalculating it from the target data. The second storage device can also determine from the received compression identifier that the target data has already been compressed, so it can write the target data directly without compressing it again. This reduces the data processing burden on the first storage device and the second storage device, reduces the amount of data transmitted between them, and lowers the recovery point objective (RPO) in asynchronous remote copy.
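The handling of a first target data packet on the second storage device can be sketched as follows. The `BackupStore` class, its fields, and the return values are assumptions made for illustration; the patent does not prescribe these structures.

```python
class BackupStore:
    """Toy second storage device that trusts the received identifiers (illustrative only)."""

    def __init__(self):
        self.by_fingerprint = {}  # fingerprint -> payload, kept in compressed form
        self.address_map = {}     # logical volume address -> fingerprint

    def receive_first_packet(self, address, first_id, second_id, payload):
        if not second_id:
            raise ValueError("a first target data packet must carry the second identifier")
        duplicate = first_id in self.by_fingerprint
        if not duplicate:
            # Already compressed by the sender: write as-is, no recompression.
            self.by_fingerprint[first_id] = payload
        # Map the address to the (new or existing) data with this fingerprint.
        self.address_map[address] = first_id
        return "deduplicated" if duplicate else "written"
```

Neither branch hashes or compresses the payload, which is where the saving on the receiving side comes from in this sketch.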
Optionally, on the basis of the embodiment corresponding to fig. 3, in an embodiment of the second data processing method provided in the embodiment of the present application, after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, the method further includes:
if the target data does not have the corresponding first identifier and the corresponding second identifier, the first storage device sends a second target data packet to the second storage device, so that the second storage device processes the target data according to the second target data packet;
the second target data packet at least carries the target data and a third identifier, and the third identifier is used for indicating the second storage device to compress the target data.
In this embodiment, after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, if the target data has neither identifier, that is, the target data has been neither deduplicated nor compressed, the first storage device obtains the third identifier and builds a second target data packet from the third identifier and the target data. The second target data packet further includes a logical volume address of the target data. The third identifier is a newly created identifier that marks the target data as uncompressed; the first storage device creates it after detecting that the target data is uncompressed. The second storage device determines from the third identifier carried in the received second target data packet that the target data in the packet is uncompressed. Because the second target data packet does not carry the first identifier, the second storage device can choose, according to its own requirements, whether to perform deduplication processing and compression processing on the target data.
In the embodiment of the application, after the first storage device detects that the target data has neither the first identifier nor the second identifier, it sends a second target data packet to the second storage device, where the second target data packet carries the target data and the third identifier. This provides a data processing method for target data that has been neither deduplicated nor compressed, and improves the implementation flexibility of the scheme.
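How the second storage device might treat a second target data packet can be sketched as below. The `Receiver` class, its `dedup` and `compress` policy flags, and the use of SHA-256 and zlib are assumptions for illustration only; the text says merely that the second storage device decides according to its own requirements.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class SecondPacket:
    lba: int            # logical volume address of the target data
    payload: bytes      # raw, uncompressed target data
    uncompressed: bool  # third identifier: marks the payload as uncompressed


class Receiver:
    """Hypothetical second storage device with its own local policy."""

    def __init__(self, dedup: bool, compress: bool) -> None:
        self.dedup = dedup
        self.compress = compress
        self.by_fingerprint: Dict[str, int] = {}  # fingerprint -> LBA holding the data
        self.blocks: Dict[int, bytes] = {}        # LBA -> stored bytes
        self.mapping: Dict[int, int] = {}         # LBA -> LBA of identical data

    def handle_second(self, pkt: SecondPacket) -> str:
        if self.dedup:
            # No fingerprint was sent, so the receiver hashes the data itself.
            fp = hashlib.sha256(pkt.payload).hexdigest()
            if fp in self.by_fingerprint:
                self.mapping[pkt.lba] = self.by_fingerprint[fp]
                return "deduplicated"
            self.by_fingerprint[fp] = pkt.lba
        data = pkt.payload
        if self.compress and pkt.uncompressed:
            # The third identifier tells the receiver that compression is still pending.
            data = zlib.compress(data)
        self.blocks[pkt.lba] = data
        return "written"
```

A receiver created with `Receiver(dedup=False, compress=False)` would simply store the payload unchanged, which reflects the freedom the second storage device has in this case.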
Optionally, on the basis of the second data processing method provided in the embodiment of the present application, in a third data processing method provided in the embodiment of the present application, after the first storage device reads the target data, the method further includes:
if the target data is stored in the cache region of the first storage device, the first storage device sends a second target data packet to the second storage device, so that the second storage device processes the target data according to the second target data packet;
wherein the second target data packet at least carries the target data and the third identifier.
In this embodiment, after reading the target data, the first storage device may determine whether the target data is currently stored in a cache area of the first storage device. If so, the first storage device skips the step of detecting whether the first identifier and the second identifier exist and sends the second target data packet to the second storage device. The processing performed by the second storage device after receiving the second target data packet is similar to that in the embodiment of the second data processing method and is not described again here.
In this embodiment, when the target data is stored in the cache area of the first storage device, the data in the cache area has not yet been deduplicated or compressed, so the target data can be directly determined to be data that has been neither deduplicated nor compressed, and the first storage device sends the second target data packet to the second storage device. This provides a data processing method for target data stored in the cache area, simplifies the first storage device's processing flow for the target data, and improves the feasibility of the scheme.
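The cache shortcut can be sketched as follows. The `Sender` class, its `cache` and `disk` dictionaries, and the `detect_calls` counter are hypothetical names introduced for this example; the counter exists only to show that detection is skipped for cached data.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SecondPacket:
    lba: int
    payload: bytes
    uncompressed: bool = True  # third identifier


class Sender:
    """Hypothetical first storage device with a cache area in front of disk."""

    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}
        # On-disk blocks: LBA -> (payload, fingerprint or None, compressed flag)
        self.disk: Dict[int, Tuple[bytes, Optional[str], bool]] = {}
        self.detect_calls = 0

    def detect(self, lba: int) -> Tuple[bool, bool]:
        """Check an on-disk block for the first and second identifiers."""
        self.detect_calls += 1
        _, fingerprint, compressed = self.disk[lba]
        return fingerprint is not None, compressed

    def prepare(self, lba: int):
        if lba in self.cache:
            # Cached data has not been deduplicated or compressed yet, so
            # detection is skipped and a second packet is built directly.
            return SecondPacket(lba, self.cache[lba])
        has_fp, is_compressed = self.detect(lba)
        # Packet selection for on-disk data is outside the scope of this sketch.
        return ("needs-dispatch", has_fp, is_compressed)
```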
Optionally, on the basis of the embodiment corresponding to fig. 3, in an embodiment of a fourth data processing method provided in the embodiment of the present application, after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, the method further includes:
if the target data has the corresponding first identifier and does not have the corresponding second identifier, the first storage device sends a third target data packet to the second storage device, so that the second storage device processes the target data according to the third target data packet;
and the third target data packet at least carries target data, a first identifier and a third identifier.
In this embodiment, when the first storage device detects that the target data has the corresponding first identifier and does not have the corresponding second identifier, that is, the target data has been deduplicated but not compressed, the first storage device sends a third target data packet to the second storage device, where the third target data packet carries the target data, the first identifier, the third identifier, and a logical volume address of the target data. After receiving the third target data packet, the second storage device may use the first identifier in the third target data packet to check whether fingerprint information identical to the first identifier already exists in the second storage device. If so, the second storage device does not write the target data in the third target data packet; instead, it records the logical volume address of the target data and establishes a mapping between that address and the address of the existing data that has the identical fingerprint information, which completes the deduplication processing of the target data in the second storage device.
In the embodiment of the application, a method is provided for processing target data that has been deduplicated but not compressed. The second storage device performs deduplication processing on the target data according to the received third target data packet and chooses, according to its own requirements, whether to compress the target data, which improves the implementation flexibility of the scheme.
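The receiver-side deduplication step for a third target data packet can be sketched as below. The `Receiver` class and its three dictionaries are assumptions for illustration; the fingerprint value is treated as an opaque string because the text does not specify how it is computed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ThirdPacket:
    lba: int                   # logical volume address of the target data
    payload: bytes             # uncompressed target data
    fingerprint: str           # first identifier
    uncompressed: bool = True  # third identifier


class Receiver:
    """Hypothetical second storage device with a fingerprint index."""

    def __init__(self) -> None:
        self.by_fingerprint: Dict[str, int] = {}  # fingerprint -> LBA holding the data
        self.blocks: Dict[int, bytes] = {}        # LBA -> stored bytes
        self.mapping: Dict[int, int] = {}         # LBA -> LBA of identical data

    def handle_third(self, pkt: ThirdPacket) -> bool:
        """Return True if the payload was deduplicated instead of written."""
        existing = self.by_fingerprint.get(pkt.fingerprint)
        if existing is not None:
            # Same fingerprint already stored: record only the mapping.
            self.mapping[pkt.lba] = existing
            return True
        self.by_fingerprint[pkt.fingerprint] = pkt.lba
        # Whether to compress before writing is left to the receiver's own policy.
        self.blocks[pkt.lba] = pkt.payload
        return False
```

The receiver never hashes the payload here; it relies entirely on the fingerprint supplied by the first storage device.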
Optionally, on the basis of the embodiment corresponding to fig. 3, in an embodiment of a fifth data processing method provided in the embodiment of the present application, after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, the method further includes:
if the target data does not have the corresponding first identifier and has the corresponding second identifier, the first storage device sends a fourth target data packet to the second storage device, so that the second storage device processes the target data according to the fourth target data packet;
and the fourth target data packet at least carries target data and a second identifier.
In this embodiment, when the first storage device detects that the target data does not have the corresponding first identifier and has the corresponding second identifier, that is, the target data is not subjected to deduplication processing and is subjected to compression processing, the first storage device sends a fourth target data packet to the second storage device, where the fourth target data packet carries the target data, the second identifier, and a logical volume address of the target data. Since the fourth target data packet does not carry the first identifier, the second storage device may select whether to perform deduplication processing on the target data according to its own requirements. The second storage device may determine that the currently received target data is compressed data according to the second identifier in the fourth target data packet, so that it is not necessary to compress the target data.
In the embodiment of the application, a method is provided for processing target data that has been compressed but not deduplicated. The second storage device chooses, according to its own requirements, whether to perform deduplication processing on the target data, which improves the implementation flexibility of the scheme.
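Taken together, the method embodiments select one of four packet types from the two detection results, with the cache case short-circuiting to the second packet. The selection can be summarized in a small sketch; the `PacketKind` enum and the `select_packet` function are names invented for this illustration.

```python
from enum import Enum


class PacketKind(Enum):
    FIRST = "target data + first identifier + second identifier"
    SECOND = "target data + third identifier"
    THIRD = "target data + first identifier + third identifier"
    FOURTH = "target data + second identifier"


def select_packet(has_first_id: bool, has_second_id: bool,
                  in_cache: bool = False) -> PacketKind:
    """Pick the packet type the first storage device should send."""
    if in_cache:
        # Cached data is neither deduplicated nor compressed.
        return PacketKind.SECOND
    if has_first_id and has_second_id:
        return PacketKind.FIRST
    if has_first_id:
        return PacketKind.THIRD
    if has_second_id:
        return PacketKind.FOURTH
    return PacketKind.SECOND
```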
Referring to fig. 4, fig. 4 is a schematic view of an embodiment of a data processing apparatus 20 according to the present application, and in a first embodiment of the data processing apparatus 20 according to the present application, the data processing apparatus 20 includes:
a reading module 201, configured to read target data;
a detection module 202, configured to detect whether target data has a corresponding first identifier and a corresponding second identifier, where the second identifier is used to mark that the target data is compressed;
the sending module 203 is configured to send a first target data packet to the second storage device if the target data has the corresponding first identifier and the corresponding second identifier, so that the second storage device processes the target data according to the first target data packet.
In this embodiment, the reading module 201 reads target data, and the detecting module 202 detects whether the target data has a corresponding first identifier and a corresponding second identifier, where the second identifier is used to mark that the target data is compressed, and if the target data has the corresponding first identifier and the corresponding second identifier, the sending module 203 sends a first target data packet to a second storage device, so that the second storage device processes the target data according to the first target data packet.
In this embodiment of the application, before sending target data to the second storage device, the first storage device first checks the target data block in which the target data is located for deduplication fingerprint information and a compression identifier. If both are present, the first storage device sends the deduplication fingerprint information, the compression identifier, and the compressed target data to the second storage device in the form of a data packet. The second storage device can then perform deduplication processing directly with the received deduplication fingerprint information, without recalculating a fingerprint for the target data. The second storage device can also determine from the received compression identifier that the target data has already been compressed, so the target data can be written directly without being compressed again. This reduces the data processing burden of the first storage device and the second storage device, reduces the volume of data transmitted between them, and lowers the recovery point objective (RPO) in asynchronous remote copying.
Optionally, on the basis of the embodiment corresponding to fig. 4, in an embodiment of the second data processing apparatus provided in the embodiment of the present application,
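The division of the apparatus into a reading module, a detection module, and a sending module can be sketched as three small classes wired together. All class names, the dictionary-based packet, and the `transport` callback are assumptions for this example; as the description itself notes, the module division is a logical one and other divisions are possible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Block:
    payload: bytes
    fingerprint: Optional[str] = None  # first identifier, if present
    compressed: bool = False           # second identifier


class ReadingModule:
    def __init__(self, volume: Dict[int, Block]) -> None:
        self.volume = volume

    def read(self, lba: int) -> Block:
        return self.volume[lba]


class DetectionModule:
    def detect(self, block: Block) -> Tuple[bool, bool]:
        """Report whether the first and second identifiers are present."""
        return block.fingerprint is not None, block.compressed


class SendingModule:
    def __init__(self, transport: Callable[[dict], None]) -> None:
        self.transport = transport  # hypothetical link to the second storage device

    def send_first(self, lba: int, block: Block) -> None:
        self.transport({"lba": lba, "payload": block.payload,
                        "fingerprint": block.fingerprint, "compressed": True})


class DataProcessingApparatus:
    def __init__(self, volume: Dict[int, Block],
                 transport: Callable[[dict], None]) -> None:
        self.reader = ReadingModule(volume)
        self.detector = DetectionModule()
        self.sender = SendingModule(transport)

    def replicate(self, lba: int) -> bool:
        """Send a first target data packet if both identifiers are present."""
        block = self.reader.read(lba)
        has_fp, is_compressed = self.detector.detect(block)
        if has_fp and is_compressed:
            self.sender.send_first(lba, block)
            return True
        return False
```

Passing `list.append` of an ordinary list as `transport` is enough to observe which packets would be sent during a dry run.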
the sending module 203 is further configured to send a second target data packet to the second storage device if the target data does not have the corresponding first identifier and the corresponding second identifier, so that the second storage device processes the target data according to the second target data packet;
the second target data packet at least carries the target data and a third identifier, and the third identifier is used to instruct the second storage device to compress the target data.
In the embodiment of the application, after the first storage device detects that the target data has neither the first identifier nor the second identifier, it sends a second target data packet to the second storage device, and the second target data packet carries the target data and the third identifier. This provides a way to process target data that has been neither deduplicated nor compressed, and improves the implementation flexibility of the scheme.
Optionally, on the basis of the embodiment of the second data processing apparatus provided in the embodiment of the present application, in an embodiment of a third data processing apparatus provided in the embodiment of the present application,
the sending module 203 is further configured to send a second target data packet to the second storage device if the target data is stored in the cache area of the first storage device, so that the second storage device processes the target data according to the second target data packet;
wherein the second target data packet at least carries the target data and the third identifier.
In this embodiment, when the target data is stored in the cache area of the first storage device, the data in the cache area has not yet been deduplicated or compressed, so the target data can be directly determined to be data that has been neither deduplicated nor compressed, and the first storage device sends the second target data packet to the second storage device. This provides a way to process target data stored in the cache area, simplifies the first storage device's processing flow for the target data, and improves the feasibility of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 4, in an embodiment of a fourth data processing apparatus provided in the embodiment of the present application,
the sending module 203 is further configured to send a third target data packet to the second storage device if the target data has the corresponding first identifier and does not have the corresponding second identifier, so that the second storage device processes the target data according to the third target data packet;
and the third target data packet at least carries target data, a first identifier and a third identifier.
In the embodiment of the application, a method is provided for processing target data that has been deduplicated but not compressed. The second storage device performs deduplication processing on the target data according to the received third target data packet and chooses, according to its own requirements, whether to compress the target data, which improves the implementation flexibility of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 4, in an embodiment of a fifth data processing apparatus provided in this application embodiment,
the sending module 203 is further configured to send a fourth target data packet to the second storage device if the target data does not have the corresponding first identifier and has the corresponding second identifier, so that the second storage device processes the target data according to the fourth target data packet;
and the fourth target data packet at least carries target data and a second identifier.
In the embodiment of the application, a method is provided for processing target data that has been compressed but not deduplicated. The second storage device chooses, according to its own requirements, whether to perform deduplication processing on the target data, which improves the implementation flexibility of the scheme.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.

Claims (6)

1. A method of data processing, comprising:
a first storage device reads target data;
the first storage device detects whether the target data has a corresponding first identifier and a corresponding second identifier, wherein the second identifier is used for marking that the target data is compressed;
if the target data has the corresponding first identifier and the corresponding second identifier, the first storage device sends a first target data packet to a second storage device, so that the second storage device processes the target data according to the first target data packet;
the first target data packet at least carries the target data, the first identifier and the second identifier, the first identifier is used to instruct the second storage device to perform deduplication processing on the target data according to the first identifier, and the second identifier is used to instruct the second storage device to perform write processing on the target data;
if the target data does not have the corresponding first identifier and the corresponding second identifier, the first storage device sends a second target data packet to the second storage device, so that the second storage device processes the target data according to the second target data packet;
the second data packet at least carries the target data and a third identifier, and the third identifier is used for indicating the second storage device to compress the target data;
after the first storage device reads the target data, the method further comprises:
if the target data is stored in the cache region of the first storage device, the first storage device sends the second target data packet to the second storage device, so that the second storage device processes the target data according to the second target data packet;
wherein the second data packet at least carries the target data and the third identifier.
2. The method of claim 1, wherein after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, the method further comprises:
if the target data has the corresponding first identifier and does not have the corresponding second identifier, the first storage device sends a third target data packet to the second storage device, so that the second storage device processes the target data according to the third target data packet;
wherein the third target data packet at least carries the target data, the first identifier and the third identifier.
3. The method of claim 1, wherein after the first storage device detects whether the target data has the corresponding first identifier and the corresponding second identifier, the method further comprises:
if the target data does not have the corresponding first identifier and has the corresponding second identifier, the first storage device sends a fourth target data packet to the second storage device, so that the second storage device processes the target data according to the fourth target data packet;
wherein the fourth target data packet at least carries the target data and the second identifier.
4. A data processing apparatus, comprising:
the reading module is used for reading target data;
the detection module is used for detecting whether the target data has a corresponding first identifier and a corresponding second identifier, wherein the second identifier is used for marking that the target data is compressed;
a sending module, configured to send a first target data packet to a second storage device if the target data has the corresponding first identifier and the corresponding second identifier, so that the second storage device processes the target data according to the first target data packet;
the first target data packet at least carries the target data, the first identifier and the second identifier, the first identifier is used to instruct the second storage device to perform deduplication processing on the target data according to the first identifier, and the second identifier is used to instruct the second storage device to perform write processing on the target data;
the sending module is further configured to send a second target data packet to the second storage device if the target data does not have the corresponding first identifier and the corresponding second identifier, so that the second storage device processes the target data according to the second target data packet;
the second data packet at least carries the target data and a third identifier, where the third identifier is used to instruct the second storage device to perform compression processing on the target data;
the sending module is further configured to send the second target data packet to the second storage device if the target data is stored in the cache area of the first storage device, so that the second storage device processes the target data according to the second target data packet;
wherein the second data packet at least carries the target data and the third identifier.
5. The data processing apparatus of claim 4,
the sending module is further configured to send a third target data packet to the second storage device if the target data has the corresponding first identifier and does not have the corresponding second identifier, so that the second storage device processes the target data according to the third target data packet;
wherein the third target data packet at least carries the target data, the first identifier and the third identifier.
6. The data processing apparatus of claim 4,
the sending module is further configured to send a fourth target data packet to the second storage device if the target data does not have the corresponding first identifier and has the corresponding second identifier, so that the second storage device processes the target data according to the fourth target data packet;
wherein the fourth target data packet at least carries the target data and the second identifier.
CN201811108304.4A 2018-09-21 2018-09-21 Data processing method and related device Active CN109086172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811108304.4A CN109086172B (en) 2018-09-21 2018-09-21 Data processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811108304.4A CN109086172B (en) 2018-09-21 2018-09-21 Data processing method and related device

Publications (2)

Publication Number Publication Date
CN109086172A CN109086172A (en) 2018-12-25
CN109086172B true CN109086172B (en) 2022-12-06

Family

ID=64842307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811108304.4A Active CN109086172B (en) 2018-09-21 2018-09-21 Data processing method and related device

Country Status (1)

Country Link
CN (1) CN109086172B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107179878B (en) * 2016-03-11 2021-03-19 伊姆西Ip控股有限责任公司 Data storage method and device based on application optimization
CN106648469B (en) * 2016-12-29 2020-01-17 华为技术有限公司 Cache data processing method and device and storage controller
CN107025289B (en) * 2017-04-14 2018-12-11 腾讯科技(深圳)有限公司 A kind of method and relevant device of data processing
CN107229420B (en) * 2017-05-27 2020-05-26 苏州浪潮智能科技有限公司 Data storage method, reading method, deleting method and data operating system
CN107193503B (en) * 2017-05-27 2020-05-29 杭州宏杉科技股份有限公司 Data deduplication method and storage device
CN108268219B (en) * 2018-02-01 2021-02-09 杭州宏杉科技股份有限公司 Method and device for processing IO (input/output) request

Also Published As

Publication number Publication date
CN109086172A (en) 2018-12-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant